
IJCRS'15 Data Challenge: Mining Data from Coal Mines

IJCRS'15 Data Challenge: Mining Data from Coal Mines is a competition organized within the framework of the 2015 International Joint Conference on Rough Sets (IJCRS'15). It continues the tradition of data mining challenges associated with rough set conferences. This time, the task concerns monitoring and predicting dangerous methane concentrations in the longwalls of a Polish coal mine. The competition is sponsored by the Research and Development Centre EMAG (https://ibemag.pl) with support from the International Rough Set Society.

Overview

Coal mining requires working in hazardous conditions. Miners in an underground coal mine face several threats, such as methane explosions or rock bursts. To protect people working underground, systems for active monitoring of production processes are typically used. One of their fundamental applications is screening for dangerous gas concentrations (methane in particular) in order to prevent spontaneous explosions [1]. For that purpose, the ability to predict dangerous gas concentrations in the near future can be even more important than monitoring the current sensor readings [2].

We address this particular problem in IJCRS'15 Data Challenge: Mining Data from Coal Mines. More details regarding the task and a description of the competition data can be found in the Task Description section.

Special session at IJCRS'15: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be treated as regular papers. The invited teams will be chosen based on their final rank, innovativeness of their approach and quality of the submitted report.

If you have any questions, please post on the competition forum or email us at webmaster@knowledgepit.fedcsis.org

References:

  1. M. Kozielski, A. Skowron, Ł. Wróbel, M. Sikora: “Regression Rule Learning for Methane Forecasting in Coal Mines”, Beyond Databases, Architectures, and Structures, CCIS, Vol. 521, Springer International Publishing, pp. 495-504, 2015
  2. A. Krasuski, A. Jankowski, A. Skowron, and D. Ślęzak: “From sensory data to decision making: A perspective on supporting a fire commander”, in Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on, vol. 3. IEEE, 2013, pp. 229–236
  3. H.S. Nguyen: “On Efficient Handling of Continuous Attributes in Large Data Bases”, Fundamenta Informaticae, 48(1):61–81, 2001
  4. J.W. Grzymala-Busse: “A New Version of the Rule Induction System LERS”, Fundamenta Informaticae, 31, pp. 27-39, 1997
  5. L.S. Riza, A. Janusz, C. Bergmeir, C. Cornelis, F. Herrera, D. Ślęzak, and J.M. Benítez: “Implementing Algorithms of Rough Set Theory and Fuzzy Rough Set Theory in the R Package ’RoughSets’”, Information Sciences, 287(0):68–89, 2014

IJCRS'15 Data Challenge: Mining Data from Coal Mines has ended and we are proud to announce the winners:

  1. Adam Zagorecki (team zagorecki) from Cranfield University, United Kingdom
  2. Marc Boulle (team marcb) from Orange Labs, France
  3. Dymitr Ruta (team dymitrruta) from EBTIC, Khalifa University, United Arab Emirates

Congratulations!

All competition data have been made available in the Data files folder (including the labels for the test set and the indexes of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes. If you use it in your post-competition research, please cite the related papers describing the scope of the challenge and include a link to the Knowledge Pit platform. Below is a sample list of related papers (it will be extended):

  • Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
  • Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE (In print September 2016)

To access the competition data you need to be logged in. If you still haven't registered at Knowledge Pit, you may create an account using this link: https://knowledgepit.fedcsis.org/login/signup.php?

| Rank | Team Name | Is Report | | Preliminary Score | Final Score | Submissions |
|---|---|---|---|---|---|---|
| 1 | zagorecki | True | True | 0.9666 | 0.959267 | 2 |
| 2 | mgrzegorowski | True | True | 0.9327 | 0.947334 | 2 |
| 3 | marcb | True | True | 0.9479 | 0.943929 | 2 |
| 4 | dymitrruta | True | True | 0.9487 | 0.943699 | 2 |
| 5 | kkurach_kp7 | True | True | 0.9685 | 0.940024 | 2 |
| 6 | max25 | True | True | 0.9603 | 0.937775 | 2 |
| 7 | kkurach | True | True | 0.9591 | 0.936714 | 2 |
| 8 | nitekna | True | True | 0.9255 | 0.935814 | 2 |
| 9 | seba91 | True | True | 0.9495 | 0.934909 | 2 |
| 10 | archie2 | True | True | 0.9350 | 0.931817 | 2 |
| 11 | wds | True | True | 0.9460 | 0.931256 | 2 |
| 12 | tdziopa | True | True | 0.9288 | 0.925846 | 2 |
| 13 | katarzynki | True | True | 0.9269 | 0.917891 | 2 |
| 14 | fzero | True | True | 0.6469 | 0.581214 | 2 |
| 15 | ayerdi | True | True | 0.5954 | 0.580494 | 2 |
| 16 | toczacypaczek | False | True | 0.9484 | No report file found or report rejected. | 2 |
| 17 | trzewior | False | True | 0.9469 | No report file found or report rejected. | 2 |
| 18 | weczer | False | True | 0.9460 | No report file found or report rejected. | 2 |
| 19 | artyr | False | True | 0.9430 | No report file found or report rejected. | 2 |
| 20 | auroree | False | True | 0.9417 | No report file found or report rejected. | 2 |
| 21 | tmonq | False | True | 0.9395 | No report file found or report rejected. | 2 |
| 22 | krecik | False | True | 0.9395 | No report file found or report rejected. | 2 |
| 23 | lab | False | True | 0.9386 | No report file found or report rejected. | 2 |
| 24 | krecik1 | False | True | 0.9377 | No report file found or report rejected. | 2 |
| 25 | tomabar728 | False | True | 0.9367 | No report file found or report rejected. | 2 |
| 26 | mavax | False | True | 0.9363 | No report file found or report rejected. | 2 |
| 27 | zaggy | False | True | 0.9361 | No report file found or report rejected. | 2 |
| 28 | adrzeniek | False | True | 0.9356 | No report file found or report rejected. | 2 |
| 29 | grzywna | False | True | 0.9351 | No report file found or report rejected. | 2 |
| 30 | archie | False | True | 0.9347 | No report file found or report rejected. | 2 |
| 31 | kebab48 | False | True | 0.9327 | No report file found or report rejected. | 2 |
| 32 | annaokon | False | True | 0.9319 | No report file found or report rejected. | 2 |
| 33 | pbombik | False | True | 0.9318 | No report file found or report rejected. | 2 |
| 34 | mateusz | False | True | 0.9314 | No report file found or report rejected. | 2 |
| 35 | krzysiek91 | False | True | 0.9313 | No report file found or report rejected. | 2 |
| 36 | nikusiaczek | False | True | 0.9306 | No report file found or report rejected. | 2 |
| 37 | leo | False | True | 0.9301 | No report file found or report rejected. | 2 |
| 38 | buf | False | True | 0.9299 | No report file found or report rejected. | 2 |
| 39 | moomean | False | True | 0.9286 | No report file found or report rejected. | 2 |
| 40 | artukoz021 | False | True | 0.9316 | No report file found or report rejected. | 2 |
| 41 | pat_sc | False | True | 0.9233 | No report file found or report rejected. | 2 |
| 42 | alesew8368 | False | True | 0.9192 | No report file found or report rejected. | 2 |
| 43 | agabrys | False | True | 0.9039 | No report file found or report rejected. | 2 |
| 44 | mateo081 | False | True | 0.9007 | No report file found or report rejected. | 2 |
| 45 | bfrackowiak | False | True | 0.9001 | No report file found or report rejected. | 2 |
| 46 | baseline_solution | False | True | 0.8930 | No report file found or report rejected. | 2 |
| 47 | kp7 | False | True | 0.8631 | No report file found or report rejected. | 2 |
| 48 | seba92 | False | True | 0.8499 | No report file found or report rejected. | 2 |
| 49 | sohrab | False | True | 0.8119 | No report file found or report rejected. | 2 |
| 50 | fzero2 | False | True | 0.6458 | No report file found or report rejected. | 2 |
| 51 | reksio | False | True | 0.5002 | No report file found or report rejected. | 2 |

Data format: The time series data sets for this competition are provided in a tabular format. For the convenience of participants, the training data set was divided into five smaller chunks, namely trainingData1.csv, ..., trainingData5.csv. Those files were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after successful enrollment in the competition. In total, the files contain sensor readings for 51,700 time periods, each 10 minutes long, with measurements taken every second (600 values for every sensor in a single series). Values for each time period are stored in a separate row of the data. The data include readings from 28 different sensors, so every row consists of 28 × 600 = 16,800 values stored in consecutive columns and separated by commas. Names of the data columns, which make it possible to identify the sensors, are provided in a separate file, column_names.txt. Descriptions of the types of sensors used for monitoring the mining process are given in sensor_descriptions.txt, and their placement in the corridors of the mine is indicated on the provided mining process scheme (mining_process_scheme.png). The time periods in the training data overlap and are given in chronological order.
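As a minimal sketch of working with this layout, the Python snippet below splits one data row into a 600-value series per sensor. It assumes that the 16,800 values are grouped sensor by sensor, in the order of the names read from column_names.txt; that grouping is an assumption and should be checked against the column names before use. The function name split_row is hypothetical.

```python
from typing import Dict, List

N_SENSORS = 28    # sensors per row, as stated in the data description
SERIES_LEN = 600  # one reading per second over a 10-minute period


def split_row(row: str, sensor_names: List[str]) -> Dict[str, List[float]]:
    """Split one comma-separated data row into one series per sensor.

    Assumption: values are stored sensor by sensor, i.e. the first 600
    values belong to the first sensor, the next 600 to the second, etc.
    """
    values = [float(v) for v in row.strip().split(",")]
    expected = N_SENSORS * SERIES_LEN
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    if len(sensor_names) != N_SENSORS:
        raise ValueError(f"expected {N_SENSORS} sensor names, got {len(sensor_names)}")
    return {
        name: values[i * SERIES_LEN:(i + 1) * SERIES_LEN]
        for i, name in enumerate(sensor_names)
    }
```

Calling split_row on each line of a trainingData chunk yields a dictionary keyed by sensor name, which keeps later feature extraction independent of the raw column positions.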

Labels in the data indicate whether a warning threshold has been reached in the period between three and six minutes after the end of the training period, for three methane meters: MM263, MM264 and MM256. In particular, if a given row corresponds to a period between \(t_{-599}\) and \(t_{0}\), then the label for a methane meter MM in this row is 'warning' if and only if \(\max(MM(t_{181}), \ldots, MM(t_{360})) \geq 1.0\). The labels for the training data are provided in a separate archive, trainingLabels.7z. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that time periods in the test data do not overlap and are given in a randomized order.
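The labeling rule can be illustrated with a short Python sketch. It assumes that the per-second readings of one methane meter after the end of the period are available as a list in which index 0 holds \(MM(t_{1})\); the function name is_warning is hypothetical, and the published labels remain the authoritative source.

```python
from typing import Sequence

WARNING_THRESHOLD = 1.0  # threshold from the label definition


def is_warning(future_readings: Sequence[float]) -> bool:
    """Return True if the 'warning' condition holds for one methane meter.

    Assumption: future_readings[k] is MM(t_{k+1}), the reading k+1 seconds
    after the end of the observed 10-minute period.
    """
    window = future_readings[180:360]  # MM(t_181), ..., MM(t_360)
    if len(window) != 180:
        raise ValueError("need at least 360 readings after the period ends")
    return max(window) >= WARNING_THRESHOLD
```

Readings in the first three minutes after the period (indices 0 to 179) do not affect the result, which matches the three-to-six-minute window in the definition.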

Format of submissions: The participants of the competition are asked to predict the likelihood of the label 'warning' for each time series from the test set and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 5,076 lines (files with an additional empty last line will also be accepted). Each line should contain exactly three real numbers, one for each target methane meter sensor, separated by commas. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of the label 'warning'.
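A submission file in this format could be produced and checked along the following lines. This is only a sketch: the function names are hypothetical, the six-decimal formatting is an arbitrary choice, and the order of the three values on a line (assumed here to be MM263, MM264, MM256) is an assumption, since the text above does not state it explicitly.

```python
from typing import Iterable, Tuple

N_TEST_SERIES = 5076  # required number of lines in a submission


def format_submission(scores: Iterable[Tuple[float, float, float]]) -> str:
    """Render one line per test series: three comma-separated scores.

    Assumption: each tuple is ordered (MM263, MM264, MM256).
    """
    return "\n".join(",".join(f"{v:.6f}" for v in row) for row in scores) + "\n"


def validate_submission(text: str, n_rows: int = N_TEST_SERIES) -> int:
    """Check the line count and that every line holds three real numbers."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]  # one additional empty last line is accepted
    if len(lines) != n_rows:
        raise ValueError(f"expected {n_rows} lines, got {len(lines)}")
    for number, line in enumerate(lines, start=1):
        parts = line.split(",")
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 values, got {len(parts)}")
        for part in parts:
            float(part)  # raises ValueError if the field is not a number
    return len(lines)
```

Running validate_submission on the generated text before uploading catches a wrong number of lines or a malformed field early.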

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a random subset of the test set, fixed for all participants. It will correspond to approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the IJCRS'15 conference.

The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure. It will be computed separately for each of the three target sensors. The final score in the competition will correspond to the average AUC for those three sets of predictions. Namely, if for a submitted solution \(s\) we denote by: $$ \begin{array}{ccl} AUC_{MM263}(s) & - & \textrm{AUC of predictions for the sensor MM263}, \\ AUC_{MM264}(s) & - & \textrm{AUC of predictions for the sensor MM264}, \\ AUC_{MM256}(s) & - & \textrm{AUC of predictions for the sensor MM256}, \end{array} $$ then the final score in the competition for a solution s will be computed as: \[score(s) = \left(AUC_{MM263}(s) + AUC_{MM264}(s) + AUC_{MM256}(s)\right)/3\hspace{0.2cm}.\]
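For local validation, the score can be approximated with the sketch below. It computes AUC through the rank-sum (Mann-Whitney) formulation with average ranks for tied scores and then averages the three per-sensor values. The function names and the dictionary-based interface are assumptions made for illustration; this is not the organizers' evaluation code.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of objects tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def competition_score(labels: Dict[str, Sequence[bool]],
                      scores: Dict[str, Sequence[float]]) -> float:
    """Average the per-sensor AUC over MM263, MM264 and MM256."""
    sensors = ("MM263", "MM264", "MM256")
    return sum(auc(labels[s], scores[s]) for s in sensors) / len(sensors)
```

Because AUC depends only on the ordering of the scores, any monotone rescaling of a sensor's predictions leaves the result unchanged, which is why submitted values need not lie in a particular range.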

The baseline solution: We prepared a sample solution as a reference for participants. It is displayed on the leaderboard as the baseline_solution score. This solution was obtained using two popular algorithms derived from the theory of rough sets. Namely, a discretization method based on the maximum discernibility heuristic [3] was used in combination with the LEM2 algorithm [4] for decision rule induction. Both algorithms are implemented in the RoughSets package for R [5].

  • April 13, 2015: start of the competition, data sets become available,
  • June 20, 2015: deadline for submitting the predictions,
  • June 25, 2015: deadline for sending the reports, end of the challenge,
  • June 29, 2015: on-line publication of final results, sending invitations for submitting short papers for the special session at IJCRS'15,
  • July 12, 2015: deadline for submissions of papers describing the selected solutions,
  • July 19, 2015: deadline for submissions of camera-ready papers selected for presentation at IJCRS'15.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

  • First Prize: 1000 USD + one free IJCRS'15 conference registration,
  • Second Prize: 500 USD + one free IJCRS'15 conference registration,
  • Third Prize: one free IJCRS'15 conference registration.

The award ceremony will take place during the IJCRS'15 conference (Oct 20-23, 2015, Jeju Island, Korea).

Andrzej Janusz, University of Warsaw

Marek Sikora, Silesian University of Technology

Łukasz Wróbel, Institute of Innovative Technologies EMAG

Sebastian Stawicki, University of Warsaw

Marek Grzegorowski, University of Warsaw

Dominik Ślęzak, University of Warsaw & Infobright Inc.

| Discussion | Author | Replies | Last post |
|---|---|---|---|
| datasets | 芳雪 | 0 | by 芳雪, Tuesday, November 10, 2020, 06:02:42 |
| datasets | Guitong | 0 | by Guitong, Wednesday, September 02, 2020, 05:00:03 |
| The deadline for submitting competition reports has been postponed! | Andrzej | 0 | by Andrzej, Tuesday, June 23, 2015, 21:02:56 |
| The last few days of IJCRS’15 Data Challenge | Andrzej | 0 | by Andrzej, Monday, June 15, 2015, 12:49:41 |
| Limit of 100 submissions. | Adam | 2 | by Andrzej, Monday, May 18, 2015, 10:55:55 |
| Welcome to IJCRS'15 Data Challenge | Andrzej | 0 | by Andrzej, Monday, April 13, 2015, 11:00:20 |