DSF: Ronan is a new member of the Data Science Foundation and has asked our members to support the research he is doing for his MSc thesis. I am sure we can help by participating in his survey and sharing our opinions…
For my thesis I am researching various statistical methods and their ability to deliver value for predictive modelling, more specifically churn prediction. To compare the various models in a complete and honest way, I want to evaluate them on six performance characteristics.
I am now looking for experts, practitioners, and end users that can rank these six dimensions using the best-worst method.
The survey consists of a few general questions about your current position, followed by four questions regarding the dimensions of performance. It should take you around 5 minutes to complete, and you would help me a lot, because I am looking for responses from people with a data science and analytics background.
In the survey, some information is provided about the six dimensions. However, it is not possible to navigate back to this information once you pass a certain point. Because of this I have supplied further information below.
The survey is accessible through the link below: https://qtrial2019q2az1.az1.qualtrics.com/jfe/form/SV_ex7dGwtjXSBBKa9
Please let me know if you have any questions or if anything is unclear. Also do let me know if you are interested in the results of my research. Thanks a lot in advance!
Supporting Information: Best-worst method Information document
In the survey, you will be asked to give your opinion on six performance characteristics of predictive modelling. This document serves as a reference to assist you in filling out the survey. As it is not possible to navigate back to the definitions of the six dimensions you can refer to this document at any moment when filling out the survey.
Information about the six dimensions of performance
First, some information about the six dimensions of performance is presented. This research conceptualizes performance as the value that the model can deliver to the various stakeholders that will be using (the results of) the model. Stakeholders include, but are not limited to, data scientists, marketing managers, C-level executives, and researchers. The dimensions below contain a quick explanation of the concepts to minimize confusion. If any of the concepts is still unclear, please reach out to me at firstname.lastname@example.org and I will try my best to explain it more thoroughly.
Accuracy refers to the ability of the model to predict the correct value. It is expressed as the percentage of examples that have been predicted correctly.
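As a minimal illustration of this definition (the labels below are made up, not taken from the research), accuracy can be computed directly from a list of observed and predicted churn labels:

```python
# Accuracy: fraction of predictions that match the observed labels.
# Toy churn data: 1 = churned, 0 = stayed (purely illustrative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy: {accuracy:.0%}")  # 6 of 8 correct -> 75%
```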
Precision refers to the closeness of the measurements with respect to the observed values. It is measured by dividing the number of correct positive predictions (true positives) by the total number of predicted positives (true positives + false positives). Precision is independent of accuracy; you could be very precise, but inaccurate.
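Using the same kind of toy labels as above (again illustrative, not from the research), the formula true positives / (true positives + false positives) works out as:

```python
# Precision: true positives / (true positives + false positives).
# 1 = churned, 0 = stayed (toy data).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
precision = tp / (tp + fp)
print(f"Precision: {precision:.2f}")  # 3 true positives, 1 false positive -> 0.75
```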
AUC is known as the Area Under the receiver operating characteristic Curve (ROC). The ROC plots the sensitivity (true positives / total observed positives) on the Y-axis and the false positive rate, i.e. 1 − specificity (false positives / total observed negatives), on the X-axis. Although the ROC curve provides more information than the AUC, the AUC is very useful if you need one number to summarize the results. This enables researchers to use multiple AUC values to create learning curves or compare the results of various models.
Expected value framework is created by assigning a value (cost or revenue) to a certain prediction (e.g. a true positive). By multiplying the frequency of each event in the confusion matrix (see picture below) with its corresponding value, the total cost or benefit of the model can be calculated. It can be used to compare the created model with benchmarks such as the hand-crafted model created by the marketing department.
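To make the calculation concrete, here is a minimal sketch with hypothetical confusion-matrix counts and assumed per-outcome values (all figures are illustrative assumptions, not taken from the research):

```python
# Expected value: weight each confusion-matrix cell by its cost/benefit.
# Hypothetical counts for a churn model on 1,000 customers:
tp, fp, fn, tn = 120, 40, 30, 810

# Assumed values per outcome: a correctly targeted churner is worth 100,
# a wasted retention offer costs 10, a missed churner costs 100.
value = {"tp": 100, "fp": -10, "fn": -100, "tn": 0}

total = tp * value["tp"] + fp * value["fp"] + fn * value["fn"] + tn * value["tn"]
n = tp + fp + fn + tn
print(f"Total value: {total}, per customer: {total / n:.2f}")
```

The same arithmetic applied to the confusion matrix of a benchmark (e.g. the marketing department's hand-crafted model) gives a direct monetary comparison between the two.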
Easy to interpret refers to the ability of the data scientist or manager to interpret the results, explain how the model came to a decision, and work with the results of the model.
Fastness of the model is about the time required to build and train the model. It also takes into account the processing power needed to run the model.