Predicting Escalations in Customer Support

Technical Support Representatives of Information Builders, Inc. (ibi) strive to provide the highest quality of support to their customers. At times, we may encounter situations where our support process and the needs of our customers conflict. When this occurs, an escalation will almost inevitably arise. Every escalation is very disruptive to the support process. It changes the day-to-day activities of Technical Support Representatives and, more importantly, it leaves us with an upset customer. The ability to predict when an escalation may arise will allow us to react and do what is possible to prevent it or defuse a potential problem, thus maintaining customer satisfaction. Besides predicting when an escalation will occur, it is equally important to predict why it is going to arise: is it due to a production outage, duration, technical proficiency, project deadlines, or other issues? Depending on the type of escalation, we will be able to build different support processes that are best suited to preventing it.

This competition, which aims at building models that predict whether particular customer success cases are going to escalate in the future based on their history to date, is an important step for ibi toward providing their customers with better services that rely on modern machine learning solutions.

More details regarding the task and the description of the challenge data set can be found in the Task description section.

Special track at IEEE BigData 2020: A special session devoted to the challenge will be held at the IEEE BigData 2020 conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.

Terms & Conditions

Contest Participation Rules:

Competition organizers are Information Builders, Inc. (ibi), and QED Software sp. z o.o.
The competition is open to all interested researchers, specialists, and students. Only members of the Contest Organizing Committee and employees of Information Builders, Inc., and QED Software cannot participate.
Participants may submit solutions as teams made up of one or more persons.
The deadline for submitting the solutions is ~~September 14~~ September 28, 2020 (23:59 GMT).
Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team.
One KnowledgePit account can only be associated with a single team at a time. It is not possible to withdraw from a team, but teams can be merged.
Each team needs to be composed of a different set of persons.
A single person may enroll in the challenge with only one KnowledgePit account.
Each team is obliged to provide a short report describing their final solution. The report must contain information such as the name of the team, the names of all team members, and a brief overview of the approach used. The description should explain all data preprocessing and model construction steps. It should be submitted in PDF format using our submission system by ~~September 18~~ September 30, 2020 (23:59 GMT). Only submissions made by teams that provided reports will qualify for the final evaluation.
After the final evaluation, the three top-ranked teams will be asked to provide the source code that can be used to reproduce their final solutions, together with documentation that allows running the code. If the code has to be run within a complex environment (e.g., a distributed Hadoop cluster), a detailed setup explanation should be provided as well. The source code will be used to verify the legitimacy of the solutions. Winners of the challenge are chosen from the top-ranked teams that provide reports and legitimate source code of their solutions for verification. Only such teams are eligible for the awards in this challenge.
Additionally, in this challenge, the winners are eligible for money prizes only if their final solution improves the baseline score by at least 10%.
The fact of accepting the award is equivalent to granting the organizers a worldwide, non-exclusive, sub-licensable, transferable, royalty-free, perpetual and irrevocable right to use, reproduce, distribute, create derivative works of, publicly perform, publicly display, digitally perform, make, have made, sell, offer for sale and/or import the winning submission and the source code used to generate it, in any media now known or hereafter developed, for any purpose whatsoever, commercial or otherwise, without further approval or any payment to the participant. By accepting the award the participants also acknowledge that they have full and unrestricted rights to grant aforementioned rights.
The fact of accepting the award is equivalent to allowing for usage of participant's name, affiliation and/or prize information by competition organizers for promotional purposes in any medium without additional compensation.
Organizers hold the right to extend the deadlines for submitting solutions and/or reports. In such a case, they will inform participants about the change using the competition forum.
Organizers are not responsible for any consequences of technical issues related to the evaluation system or the competition platform.
The final ranking of the competing teams will be based on the final evaluation results. In the case of ties in the evaluation scores, the time of submission will be taken into account.
Each report, paper, and any other type of publication based on research that uses data from this competition should credit KnowledgePit, QED Software, and Information Builders, Inc. as the institutions that provided data for the study.
Organizers may reject, without providing any additional explanation, any submission if they suspect that it was produced in an unfair way (e.g., used unintended data leaks) or was submitted by a team that has broken the competition rules.
By enrolling in this competition, you grant the organizers the right to process your submissions and reports for the purpose of evaluation and post-competition research. Your data is administered by Esensei Sp. z o.o.

IEEE BigData 2020 Cup: Predicting Escalations in Customer Support has finished. We are happy to announce that the winners of the competition are Peter Klimov and Vladimir Funtikov from the team Team!

This year, our challenge attracted 254 teams, which submitted over 1050 solutions in total. Thanks for your contribution!

We would like to thank everyone for participating. In particular, we want to express our gratitude to all teams who decided to send us descriptions of their solutions. Shortly, we will be sending invitations to selected teams to extend their reports and submit conference papers for the special session at the IEEE BigData 2020 conference.

Rank	Team Name	Is Report		Preliminary Score	Final Score	Submissions
1	Team	True	True	0.0710	0.046300	65
2	competition baseline	True	True	0.0415	0.039400	13
3	Debojit Mandal	True	True	0.0417	0.035300	100
4	shubham	True	True	0.0330	0.029500	54
5	sunsoul	True	True	0.0239	0.028600	100
6	Emi	True	True	0.0487	0.028100	68
7	Chopin	True	True	0.0428	0.011300	63
8	AMC_JTJ	True	True	0.0318	0.004700	47
9	victorkras2008	True	True	0.0000	0.000000	1
10	hieuvq	True	True	0.0811	-0.019900	83
11	Turing Insight	True	True	0.0000	-0.103800	36
12	paranoid_android	False	True	0.0396	No report file found or report rejected.	7
13	PP	False	True	0.0318	No report file found or report rejected.	63
14	sna	False	True	0.0308	No report file found or report rejected.	3
15	BlackStar	False	True	0.0263	No report file found or report rejected.	19
16	chashanliu	False	True	0.0225	No report file found or report rejected.	25
17	BGU	False	True	0.0113	No report file found or report rejected.	43
18	tdobson	False	True	0.0100	No report file found or report rejected.	13
19	SoloDance	False	True	0.0040	No report file found or report rejected.	16
20	РЫЫЫЫЫЫЫЫЫА	False	True	0.0031	No report file found or report rejected.	32
21	random	False	True	0.0028	No report file found or report rejected.	21
22	WastedTimes	False	True	0.0210	No report file found or report rejected.	35
23	testJK	False	False	-999.0000	No report file found or report rejected.	9
24	VLADISLAV	False	True	0.0004	No report file found or report rejected.	16
25	Mathurin	False	True	0.0000	No report file found or report rejected.	2
26	Lalka	False	True	-0.0014	No report file found or report rejected.	2
27	Kirov reporting	False	True	-0.0095	No report file found or report rejected.	14
28	sourabhjha	False	True	-0.0107	No report file found or report rejected.	4
29	Lukazambuca	False	True	-0.0146	No report file found or report rejected.	10
30	mr_doppelpack	False	True	-0.0146	No report file found or report rejected.	1
31	cbuxe	False	True	-0.0146	No report file found or report rejected.	5
32	daniel_kaluza	False	True	-0.0146	No report file found or report rejected.	1
33	8	False	True	-0.0186	No report file found or report rejected.	2
34	Mahmoud Trigui	False	True	-0.0295	No report file found or report rejected.	15
35	admin	False	True	-0.0338	No report file found or report rejected.	4
36	riccardo1350	False	True	0.0113	No report file found or report rejected.	14
37	One_n_Only	False	True	-0.0395	No report file found or report rejected.	6
38	sssssssssssss	False	True	0.0279	No report file found or report rejected.	18
39	SupportHelper	False	True	-0.3748	No report file found or report rejected.	6
40	Fluer	False	True	-0.3807	No report file found or report rejected.	9
41	Prachi 12	False	True	-127.4103	No report file found or report rejected.	2


Data sets for this competition were provided by ibi. Data is divided into four main tables which correspond to information stored by a ticketing system of the Customer Service department. Since the data contains sensitive information, it was carefully preprocessed and anonymized to guarantee the safety of ibi's customers and employees.

IBI_case_metadata_anonymized.csv file contains basic information regarding each case from the available data (training and test examples), such as the name of a person issuing the ticket, his/her company, an ID of a group responsible for handling the case, etc. Most of this information is typically available when a new case is opened in the system. Each case is associated with a unique REFERENCEID which can be used to join records from all available data tables.
IBI_case_milestones_anonymized.csv file contains data regarding all important events in the history of a case. It can be used to track all activities related to each REFERENCEID in the data. For test cases, however, this activity log is cut at the decision timestamp. A typical case has a one-to-many relation with entries from this table.
IBI_case_comments_anonymized.csv contains all messages exchanged between a customer and the customer service staff. The natural-language texts were encoded to protect the privacy of ibi's customers and employees. However, to facilitate the use of NLP techniques, we provide an additional file, challenge_dictionary_info.csv, which stores basic information about the encoded words, such as a POS tag, the result of a NER model, counts in the entire data, and whether a term was present in a standard English dictionary or was a non-standard term (e.g., a link, a special name, a filename, etc.). The dictionary stores encoded terms from all available data tables. As with the milestone data table, the comments have a many-to-one relation with the considered cases. For the test cases, the history of comments was cut at the decision timestamp.
IBI_case_status_history.csv is an automatically generated status log of each case in the data. It stores the severity status of a case and indicates whether the case is marked as escalated at a given timestamp. For the convenience of the participants, we added an auxiliary column to this table, which expresses the inverted time to the nearest escalation for the corresponding case. This value is the prediction target for test cases (and for those cases, it is missing in the data).

Additionally, there is a file IBI_test_cases_no_target.csv, which indicates REFERENCEIDs of test cases, along with the corresponding decision timestamps (i.e. time in seconds since the opening of a case, at which a model needs to make a prediction regarding the time to the nearest escalation of the case).
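As a rough illustration of how the tables described above relate to each other, the sketch below groups milestone rows by REFERENCEID and cuts a case history at a decision timestamp. It uses toy in-memory data instead of the real files. Only REFERENCEID and the file names come from the description; the column names SECONDS_SINCE_CASE_START, EVENT, and DECISION_TIME are hypothetical placeholders and should be replaced with the actual headers of the downloaded files.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for the real files. Only REFERENCEID is a documented column;
# SECONDS_SINCE_CASE_START, EVENT and DECISION_TIME are hypothetical names.
MILESTONES_CSV = """REFERENCEID,SECONDS_SINCE_CASE_START,EVENT
1,10,opened
1,500,assigned
1,9000,updated
2,40,opened
"""
TEST_CASES_CSV = """REFERENCEID,DECISION_TIME
1,600
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_case(rows):
    """Collect rows per REFERENCEID (one case maps to many rows)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["REFERENCEID"]].append(row)
    return grouped


def history_before(rows, decision_time):
    """Keep only the rows logged at or before the decision time."""
    return [
        row for row in rows
        if float(row["SECONDS_SINCE_CASE_START"]) <= decision_time
    ]


milestones = group_by_case(read_rows(MILESTONES_CSV))
decisions = {
    row["REFERENCEID"]: float(row["DECISION_TIME"])
    for row in read_rows(TEST_CASES_CSV)
}
truncated = {
    case_id: history_before(milestones[case_id], decision_time)
    for case_id, decision_time in decisions.items()
}
```

The test-case logs are already cut by the organizers. Applying a similar cut to training cases, at timestamps of your own choosing, is one possible way to make training examples resemble the test-time setting.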

The task and the format of submissions: the task for participants is to create an efficient model for predicting inverted time to escalation, which is computed as: $$ y = \left\{ \begin{array}{ll} 0 & \textrm{if a case was never escalated}\\ \frac{86400}{SECONDS\_TO\_NEXT\_ESCALATE + 86400} & \textrm{otherwise} \end{array} \right. $$ This transformation of the prediction target is required to keep the consistency of the predictions and facilitate the training of models (the term 86400 in the formula corresponds to the number of seconds during 24 hours and is used to scale the target values). In this way, the predicted values should always be in the $[0, 1]$ interval.
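The target transformation above can be written as a small function. This is a minimal sketch: the function name and the use of None to mark a case that was never escalated are conventions chosen here for illustration, not part of the challenge data format.

```python
SECONDS_PER_DAY = 86400  # scaling term from the target formula


def inverted_time_to_escalation(seconds_to_next_escalate):
    """Return the target y for one case.

    None is used here (an illustrative convention) to mark a case
    that was never escalated, which maps to y = 0.
    """
    if seconds_to_next_escalate is None:
        return 0.0
    if seconds_to_next_escalate < 0:
        raise ValueError("time to escalation must be non-negative")
    return SECONDS_PER_DAY / (seconds_to_next_escalate + SECONDS_PER_DAY)
```

Under this formula, an escalation happening right at the decision time gives y = 1, an escalation exactly one day later gives y = 0.5, and the value approaches 0 as the escalation moves further into the future.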

The predictions for test instances from the IBI_test_cases_no_target.csv table should be submitted to the online evaluation system as a textual file. The file should have exactly 12724 lines, and each line should contain exactly one number from the $[0, 1]$ interval. The ordering of predictions should be the same as the ordering of instances in the IBI_test_cases_no_target.csv table.
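A submission file in this format could be produced along the following lines. This is only a sketch: the helper names, the output file name, and the decision to clip out-of-range values to $[0, 1]$ (rather than reject them) are assumptions made here, not requirements of the evaluation system.

```python
import os
import tempfile

EXPECTED_LINES = 12724  # number of test instances stated in the task description


def write_submission(predictions, path, expected_lines=EXPECTED_LINES):
    """Write one prediction per line, in the order of the test table.

    Values outside [0, 1] are clipped; this is a design choice of
    this sketch, not a rule of the competition.
    """
    if len(predictions) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} predictions, got {len(predictions)}"
        )
    with open(path, "w") as handle:
        for value in predictions:
            clipped = min(1.0, max(0.0, float(value)))
            handle.write(f"{clipped:.6f}\n")


def read_submission(path):
    """Read a submission file back as a list of floats."""
    with open(path) as handle:
        return [float(line) for line in handle]


# Demo with placeholder predictions written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.txt")
write_submission([1.7, -0.2] + [0.5] * (EXPECTED_LINES - 2), demo_path)
```

Checking the line count before uploading helps avoid submissions in which some test cases have no prediction.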

Evaluation: the quality of submissions will be evaluated using the $R^2$ measure, i.e., for each test instance $i$, the prediction will be compared to the ground truth value, and the overall model performance will be evaluated using the formula:

$$R^2 = 1 - \frac{RSS}{TSS},$$ where RSS is the residual sum of squares: $$RSS = \sum_i (y_i - \hat{y}_i)^2,$$ and TSS is the total sum of squares: $$TSS = \sum_i (y_i - \bar{y})^2,$$ $\hat{y}_i$ for $i \in \{1, \ldots, |\text{test size}|\}$ are the predictions of the model, and $\bar{y}$ is the mean value of the target variable.
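For local validation, the measure can be computed directly from the definitions of RSS and TSS. The function below is a plain-Python sketch. It assumes that $\bar{y}$ is the mean of the ground-truth values passed in, and it is not the organizers' evaluation code.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - RSS / TSS for two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("R^2 is undefined when all target values are equal")
    return 1.0 - rss / tss
```

A perfect model scores 1, and always predicting the mean of the target scores 0. Predictions that fit worse than that constant yield negative values, so the score has no lower bound.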

Solutions will be evaluated online and the preliminary results will be published on the public leaderboard. The preliminary score will be computed on a small subset of the test instances (10%), fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation. Moreover, to be eligible for the awards in this challenge, the winning teams must exceed the score of the baseline solution by at least 10%.
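The award threshold can be illustrated with a short check. The rules do not say whether "by at least 10%" is relative or absolute. The sketch below assumes a relative improvement over a positive baseline score, which is an interpretation made here and may differ from the one applied by the organizers. The scores in the example are made up.

```python
def exceeds_baseline(score, baseline, margin=0.10):
    """Check whether a score beats the baseline by a relative margin.

    Assumes a positive baseline score and a relative reading of
    "by at least 10%"; both are assumptions of this sketch.
    """
    if baseline <= 0:
        raise ValueError("relative margin is only meaningful for a positive baseline")
    return score >= baseline * (1.0 + margin)


# Made-up scores: with a baseline of 0.040, the threshold is about 0.044.
qualifies = exceeds_baseline(0.050, 0.040)
falls_short = exceeds_baseline(0.042, 0.040)
```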

In case of any questions, please post on the competition forum or write an email to contact {at} knowledgepit.ml

April 8, 2020: web site of the challenge opens, the task is revealed,
~~April 30~~ May 15, 2020: start of the competition, data become available,
~~September 14~~ September 28, 2020 (23:59 GMT): deadline for submitting the solutions,
~~September 18~~ September 30, 2020 (23:59 GMT): deadline for sending the reports, end of the competition,
~~September 21~~ October 3, 2020: online publication of the final results, sending invitations for submitting papers for the special track at the IEEE BigData 2020 conference,
October 24, 2020: deadline for submitting invited papers,
November 1, 2020: notification of paper acceptance,
November 15, 2020: camera-ready of accepted papers due,
December 10-13, 2020: the IEEE BigData 2020 conference (special track date TBA).

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the sponsor:

First Prize: 2000 USD + one free IEEE BigData 2020 conference registration,
Second Prize: 750 USD + one free IEEE BigData 2020 conference registration,
Third Prize: 250 USD + one free IEEE BigData 2020 conference registration.

The award ceremony will take place during the special track at the IEEE BigData 2020 conference.

Guohua Hao, ibi
Andrzej Janusz, QED Software & University of Warsaw
Tony Li, ibi
Mateusz Przyborowski, QED Software & University of Warsaw
Eric Raab, ibi
Dominik Ślęzak, QED Software & University of Warsaw


This forum is for all users to discuss matters related to the competition. Good manners apply!

Discussion	Author	Replies	Last post
announcement of the competition results	Andrzej	0	by Andrzej Saturday, October 03, 2020, 10:55:22
the end of the competition	Andrzej	2	by Andrzej Wednesday, September 30, 2020, 09:22:30
new baseline and the extension of the competition deadline!	Andrzej	6	by Andrzej Monday, September 28, 2020, 11:39:26
"Duplicate file in Your Team. "	Man Hing	1	by Andrzej Monday, September 21, 2020, 13:03:42
The meaning of IBI_case_milestones_anonymized. CSV		1	by Andrzej Wednesday, September 16, 2020, 08:40:24
Submission format clarification	Timothy	3	by Timothy Friday, September 04, 2020, 18:54:39
Submission format clarification	Timothy	0	by Timothy Thursday, September 03, 2020, 14:43:11
submission cols	Debojit	3	by Debojit Wednesday, September 02, 2020, 16:59:10
Early baseline source code published	Daniel	2	by Anuj Saturday, August 29, 2020, 18:48:36
New dictionary with translations of some ids	Daniel	0	by Daniel Monday, August 17, 2020, 09:36:59
Submission file status results in 'test cases with no predictions'	Malsha	1	by Daniel Monday, August 17, 2020, 09:20:40
who can participate	anish	1	by Mateusz Thursday, August 13, 2020, 13:45:25
The baseline problem	FANG JYUN	1	by Mateusz Thursday, August 13, 2020, 13:08:56
Unable to reset the password using forgot password link.	raviteja	1	by Andrzej Thursday, July 30, 2020, 07:27:22
Time period	Luka	1	by Andrzej Sunday, July 12, 2020, 20:12:23
Conflict between ISESCALATE and INV_TIME_TO_NEXT_ESCALATION in the status table	Dymitr	1	by Andrzej Wednesday, June 03, 2020, 08:41:53
I can't download some data files.	지선	1	by Andrzej Monday, June 01, 2020, 10:28:30
The submission system is online!	Andrzej	1	by Andrzej Monday, May 25, 2020, 18:05:35
Evaluation metric	Henry	4	by Andrzej Saturday, May 23, 2020, 11:37:03
baseline problem	Man Hing	1	by Andrzej Thursday, May 21, 2020, 09:52:24
cannot download some data files	Man Hing	1	by Andrzej Monday, May 18, 2020, 13:56:26
The competition has started!	Andrzej	2	by Andrzej Monday, May 18, 2020, 13:55:22
Start of the competition is postponed	Andrzej	0	by Andrzej Thursday, April 30, 2020, 19:13:22