ISMIS'17 Data Mining Competition: Trading Based on Recommendations
ISMIS 2017 Data Mining Competition is a challenge organized using the KnowledgePit platform at the 23rd International Symposium on Methodologies and Intelligent Systems, held at Warsaw University of Technology, Poland, on June 26-29, 2017. The task is to come up with a strategy for investing in a stock market based on recommendations provided by different experts. The competition is kindly sponsored by mBank S.A. and Tipranks, with support from the ISMIS 2017 organizers.
Topic outline: Predicting financial markets is not an easy task. Many researchers and practitioners have devoted considerable time and effort to finding a method that would consistently generate profits for investors. Many of them claim to have succeeded and publish their recommendations for different types of assets. The main goal of this competition is to determine whether such recommendations have predictive power. We narrow the problem to a selected set of stocks and analysts. The task is to devise an algorithm that most accurately predicts the class of return from an investment in a stock over the next quarter, based on historical recommendations related to that stock. Classification seems adequate here because we believe that predicting whether the return will be positive or negative is a more appropriate (and perhaps easier) task than trying to guess its exact value.
More details regarding the task and a description of the competition data can be found in the Data description section.
Special Session at ISMIS 2017: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them and submit them for further review, conducted in the same way as for other special sessions at ISMIS 2017. The invited teams will be chosen based on their final rank, the innovativeness of their approach and the quality of the submitted reports. The accepted papers, depending on their scope and the review results, will be included in the main ISMIS 2017 proceedings or in the Industrial Session proceedings.
In case of any questions please post on the competition forum or write us an email at ismis2017-competition@ii.pw.edu.pl
ISMIS'17 Data Mining Competition: Trading Based on Recommendations is over. We would like to sincerely thank all participants for their contribution and support!
We are happy to announce that the competition attracted a total of 159 teams, of which 73 were active and submitted at least one solution to the leaderboard. The total number of submissions was 2570. Unfortunately, none of the solutions marked as final exceeded our baseline (which in itself tells us a lot about the problem). Thank you for your effort!
The official Winners:
- Mathurin Ache (team mathurin), France
- Łukasz Siemaszko (team bongod), Poland
We have already sent invitations to the 10 teams that submitted the most interesting descriptions of their approach, asking them to extend their reports into papers for a special session at ISMIS 2017. Those submissions will go through a special peer-review track and all accepted papers will be published in the conference proceedings. All other teams are also welcome to submit extended descriptions of their approach to ISMIS 2017; however, their papers will undergo the regular review process by members of the conference Program Committee and might be moved to a different session (e.g. the industrial track). The regular paper submission system is available here: https://easychair.org/conferences/?conf=ismis2017
For all of you who would like to continue research related to the competition, we plan to release all of the data used (including the labels for the test set). You will be able to find it here within a few weeks.
Data description and format: The data sets for this competition are provided in a tabular format. The training data set, ismis17_trainingData.csv, contains 12,234 records, one per line, that correspond to recommendations for stock symbols at different points in time. These time points will be referred to as decision dates. Each data record is composed of three columns, separated by semicolons. The first column gives an internal identifier of a stock symbol (true symbols are hidden). The second column stores an ordered list of recommendations issued by experts for a given stock during the two months before the decision date. The third column gives the true return class of the stock, computed over the period of three months after the decision date. It may take one of three values: ‘Buy’, ‘Hold’, ‘Sell’, which correspond to considerably positive, close to zero, and considerably negative returns, respectively.
In each record, the list of recommendations consists of one or more tips from financial experts. Each recommendation is expressed using four values and enclosed in ‘{}’ brackets. The first value is an identifier of an expert. The second value gives the class of the stock predicted by the expert (‘Buy’, ‘Hold’ or ‘Sell’), and the third value expresses the expert’s expectation regarding the future return rate of the stock. It needs to be stressed that the expected return rates may sometimes be inconsistent and are generally less reliable than the predicted rating, due to different interpretations of stock quotes by experts (e.g. not considering splits and/or dividends). Moreover, some experts do not share their expectations about the returns. Such situations are denoted by NA values in the data. The fourth value in each recommendation gives the time distance to the decision date (in days), e.g. a value of 5 means that the recommendation was published five days before the decision date. The list of recommendations in each record is sorted by the time distances, so it can be regarded as a time series.
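The record layout described above could be parsed roughly as follows. This is a minimal sketch, not the organizers' loader: the description fixes the semicolon between columns and the ‘{}’ brackets around each recommendation, but the separator between the four values inside the brackets is not stated, so the comma used here (`value_sep`) is an assumption, as are the names `Recommendation` and `parse_record` and the sample lines in the usage note.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Recommendation(NamedTuple):
    expert_id: str
    rating: str                       # 'Buy', 'Hold' or 'Sell'
    expected_return: Optional[float]  # None when the data says NA
    days_before: int                  # distance to the decision date, in days


# Captures the text between one pair of curly brackets.
_BRACES = re.compile(r"\{([^{}]*)\}")


def parse_record(line: str, value_sep: str = ",") -> Tuple[str, List[Recommendation], Optional[str]]:
    """Split one data line into (symbol id, recommendations, label or None).

    value_sep is an ASSUMPTION: the separator inside the brackets is not
    given in the data description.
    """
    columns = line.rstrip("\r\n").split(";")
    symbol, raw_recommendations = columns[0], columns[1]
    # Test records have no third column, so the label may be absent.
    label = columns[2] if len(columns) > 2 and columns[2] else None

    recommendations = []
    for body in _BRACES.findall(raw_recommendations):
        expert_id, rating, expected, days = [v.strip() for v in body.split(value_sep)]
        recommendations.append(
            Recommendation(
                expert_id,
                rating,
                None if expected == "NA" else float(expected),
                int(days),
            )
        )
    return symbol, recommendations, label
```

For instance, under the assumed layout, `parse_record("S1;{17,Buy,0.12,5}{42,Hold,NA,30};Buy")` yields the symbol `S1`, two recommendations (the second with no expected return) and the label `Buy`; a test-set line without the third column yields `None` as the label.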
In order to additionally enrich the competition data, we provide a table that groups experts by companies for which they work (the file named company_expert.csv). In total, the data consist of recommendations from 2,832 experts who are employed in 228 different financial institutions.
The test data file, ismis17_testData.csv, consists of 7,555 records. Its format is similar to that of the training data; however, it does not contain the third column with true return classes. The task for participants is to predict the labels for the test cases. It is important to note that the training and test data sets correspond to different time periods and the records in both sets are given in a random order.
The format of submissions: The participants of the competition are asked to predict return classes of the records from the test set and send us their predictions using the submission system. Each solution should be sent in a single text file containing exactly 7,555 lines (files with an additional empty last line will also be accepted). Each line of this file should contain exactly one class label from the set {‘Buy’, ‘Hold’, ‘Sell’}. Solutions containing any other labels or a different number of lines will result in an evaluation error.
Evaluation of results: The submitted solutions will be evaluated online and preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a subset of the test set consisting of 1000 records, fixed for all participants. The final evaluation will be performed after the competition ends, using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a short report describing their approach before the end of the contest will qualify for the final evaluation. Moreover, in order to claim the awards, winners will have to provide source code (in any programming language) that allows their final solution to be reproduced. All the winners will be officially announced during a special session devoted to this competition, which will be organized at the ISMIS’17 conference (http://ismis2017.ii.pw.edu.pl/).
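Before uploading, a solution file can be checked locally against the format rules above. The sketch below is only a sanity check under those stated rules; it is not the platform's validator, and the function name `check_submission` and its messages are illustrative.

```python
from typing import List

VALID_LABELS = {"Buy", "Hold", "Sell"}
EXPECTED_LINES = 7555  # number of test records given in the description


def check_submission(text: str, expected_lines: int = EXPECTED_LINES) -> List[str]:
    """Return a list of format problems; an empty list means the file looks fine."""
    lines = text.split("\n")
    # One additional empty last line is explicitly allowed.
    if lines and lines[-1] == "":
        lines.pop()

    problems = []
    if len(lines) != expected_lines:
        problems.append(f"expected {expected_lines} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        # strip() tolerates Windows line endings and stray spaces; whether
        # the real checker is equally tolerant is not known.
        if line.strip() not in VALID_LABELS:
            problems.append(f"line {number}: unexpected label {line!r}")
    return problems
```

A typical use would be `check_submission(open("my_solution.csv").read())`, where `my_solution.csv` is a hypothetical file name.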
The assessment of solutions will be done using the accuracy (ACC) measure with an additional cost/reward matrix. For a confusion matrix X, obtained from a vector of predictions preds, and the cost matrix C displayed below, the accuracy is computed as: \[ACC(preds) = \frac{\sum_{i = 1..3} \left( X_{i,i} \cdot C_{i,i} \right) }{\sum_{i = 1..3}\sum_{j = 1..3} \left( X_{i,j} \cdot C_{i,j} \right) }.\]
preds\truth | Buy | Hold | Sell |
---|---|---|---|
Buy | 8 | 4 | 8 |
Hold | 1 | 1 | 1 |
Sell | 8 | 4 | 8 |
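The measure can be reproduced offline with a few lines of code. The following is a sketch written directly from the formula and the cost matrix above, not the organizers' reference implementation; rows of the matrix are taken as predicted classes and columns as true classes, as the table header indicates, and the function name `weighted_accuracy` is illustrative.

```python
from collections import Counter
from typing import Sequence

LABELS = ("Buy", "Hold", "Sell")

# COST[predicted][true], copied from the cost matrix C.
COST = {
    "Buy": {"Buy": 8, "Hold": 4, "Sell": 8},
    "Hold": {"Buy": 1, "Hold": 1, "Sell": 1},
    "Sell": {"Buy": 8, "Hold": 4, "Sell": 8},
}


def weighted_accuracy(preds: Sequence[str], truth: Sequence[str]) -> float:
    """Cost-weighted accuracy: weighted diagonal over weighted total."""
    if len(preds) != len(truth):
        raise ValueError("preds and truth must have the same length")
    # Confusion matrix X stored sparsely: (predicted, true) -> count.
    confusion = Counter(zip(preds, truth))
    numerator = sum(confusion[(label, label)] * COST[label][label] for label in LABELS)
    denominator = sum(count * COST[p][t] for (p, t), count in confusion.items())
    return numerator / denominator if denominator else 0.0
```

As a worked example, predictions `["Buy", "Hold", "Sell", "Buy"]` against true classes `["Buy", "Buy", "Sell", "Hold"]` give weights 8, 1, 8 and 4; the two correct cases contribute 16 to the numerator and all four contribute 21 to the denominator, so the score is 16/21 ≈ 0.762.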
For the convenience of participants, we provide an example solution file, exemplary_solution.csv, as a reference.
Rank | Team Name | Is Report | | Preliminary Score | Final Score | Submissions |
---|---|---|---|---|---|---|
1 | mathurin | True | True | 0.4202 | 0.437507 | 2 |
2 | bongod | True | True | 0.4282 | 0.437461 | 2 |
3 | pwbluehorizon1 | True | True | 0.4196 | 0.429882 | 2 |
4 | boocheck | True | True | 0.4515 | 0.427162 | 2 |
5 | a.ruta | True | True | 0.5348 | 0.424225 | 2 |
6 | mkoz | True | True | 0.4517 | 0.423419 | 2 |
7 | amy | True | True | 0.4614 | 0.417240 | 2 |
8 | grzegorzkozlowski | True | True | 0.4289 | 0.416458 | 2 |
9 | vinh | True | True | 0.4130 | 0.415950 | 2 |
10 | ptrsbck | True | True | 0.4341 | 0.412528 | 2 |
11 | duxuhao | True | True | 0.4568 | 0.409194 | 2 |
12 | koziejka | True | True | 0.4286 | 0.408538 | 2 |
13 | zdravevski | True | True | 0.4829 | 0.407256 | 2 |
14 | podcheck | True | True | 0.4463 | 0.407203 | 2 |
15 | optymista | True | True | 0.5178 | 0.405360 | 2 |
16 | dotjabber | True | True | 0.4266 | 0.402121 | 2 |
17 | podludek | True | True | 0.4385 | 0.399513 | 2 |
18 | katarzyna | True | True | 0.4451 | 0.397411 | 2 |
19 | atanask | True | True | 0.4289 | 0.395662 | 2 |
20 | dk | True | True | 0.4452 | 0.391991 | 2 |
21 | 0bartek | True | True | 0.4164 | 0.388849 | 2 |
22 | mkadlof | True | True | 0.4114 | 0.388371 | 2 |
23 | dymitrruta | True | True | 0.6015 | 0.386997 | 2 |
24 | lusiek | True | True | 0.4266 | 0.381812 | 2 |
25 | guppi | True | True | 0.4003 | 0.000000 | 2 |
26 | hieuvq | False | True | 0.4684 | No report file found or report rejected. | 2 |
27 | parcel | False | True | 0.4639 | No report file found or report rejected. | 2 |
28 | ps319383 | False | True | 0.4824 | No report file found or report rejected. | 2 |
29 | butter | False | True | 0.4501 | No report file found or report rejected. | 2 |
30 | adampap | False | True | 0.4546 | No report file found or report rejected. | 2 |
31 | mg320637 | False | True | 0.4397 | No report file found or report rejected. | 2 |
32 | ckarats | False | True | 0.4376 | No report file found or report rejected. | 2 |
33 | nitekna | False | True | 0.4611 | No report file found or report rejected. | 2 |
34 | juggler | False | True | 0.4301 | No report file found or report rejected. | 2 |
35 | kneefer1 | False | True | 0.4300 | No report file found or report rejected. | 2 |
36 | michalm | False | True | 0.4282 | No report file found or report rejected. | 2 |
37 | baseline_solution | False | True | 0.4266 | No report file found or report rejected. | 2 |
38 | mkal | False | True | 0.4266 | No report file found or report rejected. | 2 |
39 | wniemkowski | False | True | 0.4241 | No report file found or report rejected. | 2 |
40 | btw | False | True | 0.4310 | No report file found or report rejected. | 2 |
41 | hj | False | True | 0.4173 | No report file found or report rejected. | 2 |
42 | kneefer | False | True | 0.4157 | No report file found or report rejected. | 2 |
43 | contestant1 | False | True | 0.4141 | No report file found or report rejected. | 2 |
44 | minio7 | False | True | 0.4359 | No report file found or report rejected. | 2 |
45 | saikatroy | False | True | 0.4131 | No report file found or report rejected. | 2 |
46 | krishnateja614 | False | True | 0.4126 | No report file found or report rejected. | 2 |
47 | marek1991 | False | True | 0.4107 | No report file found or report rejected. | 2 |
48 | sslim | False | True | 0.4155 | No report file found or report rejected. | 2 |
49 | mateuszk | False | True | 0.4085 | No report file found or report rejected. | 2 |
50 | akar | False | True | 0.4371 | No report file found or report rejected. | 2 |
51 | obus | False | True | 0.4266 | No report file found or report rejected. | 2 |
52 | mkozlow | False | True | 0.4068 | No report file found or report rejected. | 2 |
53 | mifdal84 | False | True | 0.4065 | No report file found or report rejected. | 2 |
54 | fajri91 | False | True | 0.4060 | No report file found or report rejected. | 2 |
55 | ternaus | False | True | 0.4368 | No report file found or report rejected. | 2 |
56 | lameski | False | True | 0.4019 | No report file found or report rejected. | 2 |
57 | kp | False | True | 0.4095 | No report file found or report rejected. | 2 |
58 | marugari | False | True | 0.3984 | No report file found or report rejected. | 2 |
59 | lupus | False | True | 0.3984 | No report file found or report rejected. | 2 |
60 | janismdhanbad | False | True | 0.4085 | No report file found or report rejected. | 2 |
61 | basakesin | False | True | 0.3972 | No report file found or report rejected. | 2 |
62 | arcane27 | False | True | 0.3950 | No report file found or report rejected. | 2 |
63 | pameladt | False | True | 0.4071 | No report file found or report rejected. | 2 |
64 | nxgtr | False | True | 0.3937 | No report file found or report rejected. | 2 |
65 | flac | False | True | 0.3936 | No report file found or report rejected. | 2 |
66 | sebastianmusial | False | True | 0.3936 | No report file found or report rejected. | 2 |
67 | bearstrikesback | False | True | 0.3923 | No report file found or report rejected. | 2 |
68 | zagorecki | False | True | 0.3981 | No report file found or report rejected. | 2 |
69 | vnu_jaist2015 | False | True | 0.0000 | No report file found or report rejected. | 2 |
70 | vinayakumar | False | True | 0.3956 | No report file found or report rejected. | 2 |
71 | datageek | False | True | 0.3984 | No report file found or report rejected. | 2 |
72 | saschapojot | False | True | 0.0000 | No report file found or report rejected. | 2 |
73 | jakub | False | True | 0.3959 | No report file found or report rejected. | 2 |
- November 22, 2016: start of the competition; data sets become available,
- January 22, 2017 (23:59 GMT): deadline for submitting the predictions and the reports (the deadline for submitting reports has been extended until January 27),
- February 1, 2017 (originally January 29): end of the challenge, online publication of final results, sending invitations for submitting papers for the special session at ISMIS 2017,
- late February 2017: deadline for submitting papers describing the selected solutions to the special session at ISMIS 2017.
The teams behind the two top-ranked solutions (based on the final evaluation scores, and considering only solutions that satisfy the terms and conditions of ISMIS 2017 Data Mining Competition) will be awarded prizes funded by our sponsors:
- First Prize: 1000 USD + one free ISMIS 2017 conference registration,
- Second Prize: 500 USD + one free ISMIS 2017 conference registration.
The award ceremony will take place during the ISMIS 2017 conference (June 26-29, 2017, Warsaw, Poland).
Andrzej Janusz, University of Warsaw - Chair
Kamil Żbikowski, Warsaw University of Technology & mBank S.A. - Chair
Piotr Gawrysiak, Warsaw University of Technology & mBank S.A.
Marzena Kryszkiewicz, Warsaw University of Technology
Henryk Rybiński, Warsaw University of Technology
Dominik Ślęzak, University of Warsaw & Infobright
Discussion | Author | Replies | Last post |
---|---|---|---|
publication test set labels | Andrzej | 0 | by Andrzej Sunday, February 26, 2017, 11:39:18 |
For the future ( continued ) | Marek | 0 | by Marek Thursday, February 02, 2017, 14:03:54 |
The final results were published | Andrzej | 0 | by Andrzej Wednesday, February 01, 2017, 20:49:31 |
The final results were published | Andrzej | 1 | by Dymitr Wednesday, February 01, 2017, 21:22:08 |
For the future | Marek | 1 | by Andrzej Wednesday, February 01, 2017, 20:45:06 |
Extended deadline | Łukasz | 0 | by Łukasz Wednesday, January 25, 2017, 09:07:42 |
Report sending deadline extended | Andrzej | 0 | by Andrzej Tuesday, January 24, 2017, 16:13:47 |
problems with the evaluation system are fixed | Andrzej | 0 | by Andrzej Friday, January 20, 2017, 08:39:46 |
Did not see score appears in the submission board | Marek | 2 | by Andrzej Friday, January 20, 2017, 08:33:47 |
start of the last week | Andrzej | 0 | by Andrzej Monday, January 16, 2017, 15:40:37 |
the final week | Andrzej | 0 | by Andrzej Monday, January 16, 2017, 14:54:32 |
Confuxion Matrix and Cost Matrix | Raji | 1 | by Andrzej Monday, January 16, 2017, 12:06:40 |
ACC reference implementation | Alexey | 0 | by Alexey Friday, January 13, 2017, 14:39:01 |
Where is my submission | Quang-Vinh | 1 | by Quang-Vinh Friday, January 13, 2017, 11:35:11 |
terms and schedule | Przemek | 1 | by Przemek Thursday, January 12, 2017, 08:15:18 |
Happy New Year 2017! | Marek | 1 | by Andrzej Sunday, January 01, 2017, 14:43:32 |
Third column of the training set | Christos | 1 | by Andrzej Thursday, December 29, 2016, 08:47:27 |
How to form a team | janpreet | 3 | by Andrzej Wednesday, December 21, 2016, 08:31:19 |
How to delete the submitted results | vinayakumar | 1 | by vinayakumar Saturday, December 03, 2016, 05:06:04 |
Third value of a recommendation | Christos | 1 | by Andrzej Thursday, December 01, 2016, 09:19:11 |
Timing of Training and Test Sets | Dymitr | 1 | by Dymitr Sunday, November 27, 2016, 05:25:02 |
Welcome to ISMIS 2017 Data Mining Competition! | Andrzej | 0 | by Andrzej Tuesday, November 22, 2016, 17:47:16 |
ISMIS 2017 Data Mining Competition | Andrzej | 0 | by Andrzej Tuesday, November 22, 2016, 14:46:46 |