
## IEEE BigData 2020 Cup: Predicting Escalations in Customer Support

Predicting Escalations in Customer Support is a data mining challenge organized in association with the IEEE BigData 2020 conference. The task is to predict which cases in the technical support ticketing system of Information Builders, Inc. (ibi) will be escalated by customers in the near future. The competition is organized jointly by ibi (https://www.ibi.com) and QED Software (http://www.qed.pl/).

Technical Support Representatives of Information Builders, Inc. (ibi) strive to provide the highest quality level of support to their customers. At times, we may encounter situations where our support process and the needs of our customers conflict. When this occurs, an escalation will undoubtedly arise. Every escalation is very disruptive to the support process. It changes the day-to-day activities of Technical Support Representatives and, more importantly, it leaves us with an upset customer. The ability to predict when an escalation may arise will allow us to react and do what is possible to prevent it and defuse a potential problem, thus maintaining customer satisfaction. Besides predicting when an escalation will occur, it is equally important to predict why it is going to arise: is it due to a production outage, duration, technical proficiency, project deadlines, or other issues? Depending on the type of escalation, we will be able to build different support processes best suited to preventing it.

This competition aims at building models that predict whether particular customer success cases are going to escalate in the future, based on information about their history up to the present. It is an important step for ibi towards providing their customers with better services that rely on modern machine learning solutions.

More details regarding the task and the description of the challenge data set can be found in the Task description section.

Special track at IEEE BigData 2020: A special session devoted to the challenge will be held at the IEEE BigData 2020 conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.

IEEE BigData 2020 Cup: Predicting Escalations in Customer Support has finished. We are happy to announce that the winners of the competition are Peter Klimov and Vladimir Funtikov from the team Team!

This year, our challenge attracted 254 teams, which together submitted over 1050 solutions. Thank you for your contributions!

We would like to thank everyone for participating. In particular, we want to express our gratitude to all teams who decided to send us descriptions of their solutions. We will shortly be sending invitations to selected teams to extend their reports and submit conference papers for a special session at the IEEE BigData 2020 conference.

| Rank | Team Name | Is Report | Preliminary Score | Final Score | Submissions |
|---|---|---|---|---|---|
| 1 | Team | True | 0.0710 | 0.046300 | 65 |
| 2 | competition baseline | True | 0.0415 | 0.039400 | 13 |
| 3 | Debojit Mandal | True | 0.0417 | 0.035300 | 100 |
| 4 | shubham | True | 0.0330 | 0.029500 | 54 |
| 5 | sunsoul | True | 0.0239 | 0.028600 | 100 |
| 6 | Emi | True | 0.0487 | 0.028100 | 68 |
| 7 | Chopin | True | 0.0428 | 0.011300 | 63 |
| 8 | AMC_JTJ | True | 0.0318 | 0.004700 | 47 |
| 9 | victorkras2008 | True | 0.0000 | 0.000000 | 1 |
| 10 | hieuvq | True | 0.0811 | -0.019900 | 83 |
| 11 | Turing Insight | True | 0.0000 | -0.103800 | 36 |
| 12 | paranoid_android | False | 0.0396 | No report file found or report rejected. | 7 |
| 13 | PP | False | 0.0318 | No report file found or report rejected. | 63 |
| 14 | sna | False | 0.0308 | No report file found or report rejected. | 3 |
| 15 | BlackStar | False | 0.0263 | No report file found or report rejected. | 19 |
| 16 | chashanliu | False | 0.0225 | No report file found or report rejected. | 25 |
| 17 | BGU | False | 0.0113 | No report file found or report rejected. | 43 |
| 18 | tdobson | False | 0.0100 | No report file found or report rejected. | 13 |
| 19 | SoloDance | False | 0.0040 | No report file found or report rejected. | 16 |
| 20 | РЫЫЫЫЫЫЫЫЫА | False | 0.0031 | No report file found or report rejected. | 32 |
| 21 | random | False | 0.0028 | No report file found or report rejected. | 21 |
| 22 | WastedTimes | False | 0.0210 | No report file found or report rejected. | 35 |
| 23 | testJK | False | -999.0000 | No report file found or report rejected. | 9 |
| 24 | | False | 0.0004 | No report file found or report rejected. | 16 |
| 25 | Mathurin | False | 0.0000 | No report file found or report rejected. | 2 |
| 26 | Lalka | False | -0.0014 | No report file found or report rejected. | 2 |
| 27 | Kirov reporting | False | -0.0095 | No report file found or report rejected. | 14 |
| 28 | sourabhjha | False | -0.0107 | No report file found or report rejected. | 4 |
| 29 | Lukazambuca | False | -0.0146 | No report file found or report rejected. | 10 |
| 30 | mr_doppelpack | False | -0.0146 | No report file found or report rejected. | 1 |
| 31 | cbuxe | False | -0.0146 | No report file found or report rejected. | 5 |
| 32 | daniel_kaluza | False | -0.0146 | No report file found or report rejected. | 1 |
| 33 | 8 | False | -0.0186 | No report file found or report rejected. | 2 |
| 34 | Mahmoud Trigui | False | -0.0295 | No report file found or report rejected. | 15 |
| 35 | | False | -0.0338 | No report file found or report rejected. | 4 |
| 36 | riccardo1350 | False | 0.0113 | No report file found or report rejected. | 14 |
| 37 | One_n_Only | False | -0.0395 | No report file found or report rejected. | 6 |
| 38 | sssssssssssss | False | 0.0279 | No report file found or report rejected. | 18 |
| 39 | SupportHelper | False | -0.3748 | No report file found or report rejected. | 6 |
| 40 | Fluer | False | -0.3807 | No report file found or report rejected. | 9 |
| 41 | Prachi 12 | False | -127.4103 | No report file found or report rejected. | 2 |

Data sets for this competition were provided by ibi. Data is divided into four main tables which correspond to information stored by a ticketing system of the Customer Service department. Since the data contains sensitive information, it was carefully preprocessed and anonymized to guarantee the safety of ibi's customers and employees.

• IBI_case_metadata_anonymized.csv file contains basic information regarding each case from the available data (training and test examples), such as the name of a person issuing the ticket, his/her company, an ID of a group responsible for handling the case, etc. Most of this information is typically available when a new case is opened in the system. Each case is associated with a unique REFERENCEID which can be used to join records from all available data tables.
• IBI_case_milestones_anonymized.csv file contains data regarding all important events in the history of a case. It can be used to track all activities related to each REFERENCEID in the data. For test cases, however, this activity log is cut at the decision timestamp. A typical case has a one-to-many relation with entries from this table.
• IBI_case_comments_anonymized.csv contains all messages exchanged between a customer and the customer service staff. The natural-language texts were encoded to protect the privacy of ibi's customers and employees. However, to facilitate the use of NLP techniques, we provide an additional file, challenge_dictionary_info.csv, which stores basic information about the encoded words, such as a POS tag, the result of a NER model, counts in the entire data, and whether a term was present in a standard English dictionary or was a non-standard term (e.g. a link, a special name, a filename, etc.). The dictionary stores encoded terms from all available data tables. As with the milestone data table, comments have a many-to-one relation with the considered cases. For the test cases, the history of comments was cut at the decision timestamp.
• IBI_case_status_history.csv is an automatically generated status log of each case in the data. It stores information regarding case severity status, and additional information whether the case at a given timestamp is marked as escalated. For the convenience of participants of the challenge, we added an auxiliary column to this table, which expresses the inverted time to the nearest escalation for the corresponding case - this value is the prediction target for test cases (and for those cases, it is missing in the data).

Additionally, there is a file IBI_test_cases_no_target.csv, which indicates REFERENCEIDs of test cases, along with the corresponding decision timestamps (i.e. time in seconds since the opening of a case, at which a model needs to make a prediction regarding the time to the nearest escalation of the case).
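The tables described above are linked through REFERENCEID, and the histories of test cases end at the decision timestamp. The sketch below illustrates both ideas on tiny in-memory data using only the Python standard library. Only REFERENCEID is a column name taken from the description; EVENT_TIME, EVENT, and DECISION_TIME are hypothetical names chosen for illustration, and the real files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for the real CSV files. Only REFERENCEID comes
# from the task description; the other column names are hypothetical.
MILESTONES_CSV = """REFERENCEID,EVENT_TIME,EVENT
A,10,opened
A,500,comment
A,9000,closed
B,20,opened
"""
TEST_CASES_CSV = """REFERENCEID,DECISION_TIME
A,600
"""


def group_by_case(csv_text):
    """Collect all rows of a one-to-many table under their REFERENCEID."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["REFERENCEID"]].append(row)
    return grouped


def history_before_decision(events, decision_time):
    """Keep only events recorded at or before the decision timestamp."""
    return [e for e in events if float(e["EVENT_TIME"]) <= decision_time]


milestones = group_by_case(MILESTONES_CSV)
decision_times = {
    row["REFERENCEID"]: float(row["DECISION_TIME"])
    for row in csv.DictReader(io.StringIO(TEST_CASES_CSV))
}
# For each test case, the part of its history visible at decision time.
visible = {
    case_id: history_before_decision(milestones[case_id], cutoff)
    for case_id, cutoff in decision_times.items()
}
```

Applying the same kind of cutoff to training cases at a chosen timestamp is one possible way to make training examples resemble the test setup, although the description does not prescribe it.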

The task and the format of submissions: the task for participants is to create an efficient model for predicting inverted time to escalation, which is computed as: $$y = \left\{ \begin{array}{ll} 0 & \textrm{if a case was never escalated}\\ \frac{86400}{SECONDS\_TO\_NEXT\_ESCALATE + 86400} & \textrm{otherwise} \end{array} \right.$$ This transformation of the prediction target is required to keep the consistency of the predictions and facilitate the training of models (the term 86400 in the formula corresponds to the number of seconds during 24 hours and is used to scale the target values). In this way, the predicted values should always be in the $[0, 1]$ interval.
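As a minimal sketch of the target transformation defined above, the function below maps the number of seconds to the next escalation to the inverted-time target. Representing a never-escalated case as `None` is an assumption made for this example, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400  # scaling constant from the target formula


def inverted_time_to_escalation(seconds_to_next_escalate):
    """Return the target y for one case.

    `None` stands for a case that was never escalated (an assumption of
    this sketch); otherwise the argument is a non-negative number of seconds.
    """
    if seconds_to_next_escalate is None:
        return 0.0
    if seconds_to_next_escalate < 0:
        raise ValueError("time to next escalation must be non-negative")
    return SECONDS_PER_DAY / (seconds_to_next_escalate + SECONDS_PER_DAY)
```

For instance, an escalation exactly one day away gives 86400 / (86400 + 86400) = 0.5, an immediate escalation gives 1, and the value approaches 0 as the escalation moves further into the future.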

The predictions for test instances from the IBI_test_cases_no_target.csv table should be submitted to the online evaluation system as a textual file. The file should have exactly 12724 lines, and each line should contain exactly one number from the $[0, 1]$ interval. The ordering of predictions should be the same as the ordering of instances in the IBI_test_cases_no_target.csv table.
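A submission in this format could be produced along the following lines. This is only a sketch: the function name, the six-decimal formatting, and the choice to clip out-of-range values into $[0, 1]$ rather than reject them are assumptions of this example, not requirements stated by the organizers.

```python
def format_submission(predictions, expected_lines=12724):
    """Render predictions as text with exactly one number per line.

    Values are clipped into [0, 1]; the order of `predictions` is kept,
    so it must already match the order of the test case table.
    """
    preds = list(predictions)
    if len(preds) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} predictions, got {len(preds)}"
        )
    clipped = [min(1.0, max(0.0, float(p))) for p in preds]
    return "\n".join(f"{p:.6f}" for p in clipped) + "\n"
```

The returned string can then be written to a text file (for example with `open(path, "w")`, where the path is up to the participant) and uploaded to the evaluation system.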

Evaluation: the quality of submissions will be evaluated using the $R^2$ measure, i.e., for each test instance $i$, the prediction will be compared to the ground truth value, and the overall model performance will be evaluated using the formula:

$$R^2 = 1 - \frac{RSS}{TSS},$$ where RSS is the residual sum of squares: $$RSS = \sum_i (y_i - \hat{y_i})^2,$$ and TSS is the total sum of squares: $$TSS = \sum_i (y_i - \bar{y})^2,$$ $\hat{y_i}$ for $i \in \{1, \ldots, \|test\ size\|\}$ are the predictions of the model, and $\bar{y}$ is the mean value of the target variable.
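The measure above can be sketched in plain Python as follows. The function name is hypothetical, and the sketch assumes that $\bar{y}$ is the mean of the ground truth values over the evaluated instances.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - RSS / TSS for two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean target.
    tss = sum((y - mean_y) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1 - rss / tss
```

A model that always predicts the mean target scores 0, a perfect model scores 1, and a model that does worse than the constant mean gets a negative score, which is why negative values can appear on a leaderboard that uses this measure.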

Solutions will be evaluated online and the preliminary results will be published on the public leaderboard. The preliminary score will be computed on a small subset of the test instances (10%), fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation. Moreover, to be eligible for the awards in this challenge, the winning teams must exceed the score of the baseline solution by at least 10%.

In case of any questions, please post on the competition forum or write an email to contact {at} knowledgepit.ml

• April 8, 2020: web site of the challenge opens, the task is revealed,
• May 15, 2020 (originally April 30): start of the competition, data become available,
• September 28, 2020 (23:59 GMT; originally September 14): deadline for submitting the solutions,
• September 30, 2020 (23:59 GMT; originally September 18): deadline for sending the reports, end of the competition,
• October 3, 2020 (originally September 21): online publication of the final results, sending invitations for submitting papers for the special track at the IEEE BigData 2020 conference,
• October 24, 2020: deadline for submitting invited papers,
• November 1, 2020: notification of paper acceptance,
• November 15, 2020: camera-ready of accepted papers due,
• December 10-13, 2020: the IEEE BigData 2020 conference (special track date TBA).

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the sponsor:

• First Prize: 2000 USD + one free IEEE BigData 2020 conference registration,
• Second Prize: 750 USD + one free IEEE BigData 2020 conference registration,
• Third Prize: 250 USD + one free IEEE BigData 2020 conference registration.

The award ceremony will take place during the special track at the IEEE BigData 2020 conference.

• Guohua Hao, ibi
• Andrzej Janusz, QED Software & University of Warsaw
• Tony Li, ibi
• Mateusz Przyborowski, QED Software & University of Warsaw
• Eric Raab, ibi
• Dominik Ślęzak, QED Software & University of Warsaw

