Re: Coal mine structure and additional data
by janek - Thursday, January 07, 2016, 11:23:30

Dear Dymitr,

Unfortunately, we currently do not have any data regarding the exact distances between the main working sites. However, the region name ID and the seam name ID, which are available in the metadata table, can be used to roughly group the working sites by their proximity.
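A minimal sketch of that rough proximity grouping: sites that share both a region ID and a seam ID end up in the same bucket. It assumes the metadata can be read as records with the hypothetical keys `main_working_id`, `region_name_id` and `seam_name_id`; the values below are toy data, not taken from the competition.

```python
from collections import defaultdict


def group_sites_by_location(metadata):
    """Group main working site IDs by their (region ID, seam ID) pair.

    `metadata` is assumed to be an iterable of dicts with the hypothetical
    keys 'main_working_id', 'region_name_id' and 'seam_name_id'.
    """
    groups = defaultdict(set)
    for row in metadata:
        key = (row["region_name_id"], row["seam_name_id"])
        groups[key].add(row["main_working_id"])
    # Sorted lists keep the output deterministic.
    return {key: sorted(sites) for key, sites in groups.items()}


# Toy metadata (hypothetical IDs, for illustration only).
metadata = [
    {"main_working_id": 101, "region_name_id": 1, "seam_name_id": 3},
    {"main_working_id": 102, "region_name_id": 1, "seam_name_id": 3},
    {"main_working_id": 103, "region_name_id": 2, "seam_name_id": 5},
]
print(group_sites_by_location(metadata))  # {(1, 3): [101, 102], (2, 5): [103]}
```

Sites in the same bucket are only presumed to be close; without actual distances, this is a coarse proxy at best.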

Regarding the second part of your post, data for 13 (out of 21) main working sites that appear in the test data are also present in the training data. These 13 sites account for approximately 70% of the test data. Of those 13 sites, 9 appear only in the additional training data sets. Moreover, the additional data can be used not only to train your models better, but also to evaluate their performance more accurately. Such an evaluation could be far more useful than the one provided by the preliminary score.
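The two figures above measure different things: "13 out of 21" counts sites, while "approximately 70%" is a share of test records. A minimal sketch of how both could be computed, assuming each test record is tagged with its site ID and that the training site set is the union of the main and additional training sets (toy values only):

```python
def site_coverage(train_sites, test_site_per_row):
    """Return (number of distinct test sites also seen in training,
    number of distinct test sites,
    fraction of test rows whose site also appears in training)."""
    if not test_site_per_row:
        raise ValueError("test_site_per_row must not be empty")
    seen = set(train_sites)
    test_sites = set(test_site_per_row)
    overlap = test_sites & seen
    covered_rows = sum(1 for site in test_site_per_row if site in seen)
    return len(overlap), len(test_sites), covered_rows / len(test_site_per_row)


# Toy example: training covers sites a, b, c; one entry per test record.
train_sites = {"a", "b", "c"}
test_rows = ["a", "a", "b", "d", "d"]
print(site_coverage(train_sites, test_rows))  # (2, 3, 0.6)
```

A site-level overlap can look low while the row-level coverage is high (or vice versa), depending on how many records each site contributes.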

Best regards and good luck!
Andrzej Janusz


Re: Coal mine structure and additional data
by dymitrruta - Thursday, January 07, 2016, 22:02:36

Dear Andrzej,

Many thanks for the explanation, I fully agree. However, the most eventful sites, 264, 373 and 437, which account for 53% of all the training data seen so far, have no instances in the preliminary test set. This more or less allows us to say that we are testing on different sites than those presented in the training data :). Again, this does not have to be a problem, but it may explain the large discrepancies between different parts of the data in this competition. I guess such is the reality of this data.
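If the test set really covers sites that are rare or absent in training, a random row-level validation split may give optimistic estimates. One alternative is to keep whole sites together within a fold, so that validation always happens on unseen sites. The sketch below is a plain-Python stand-in, similar in spirit to scikit-learn's `GroupKFold` but not that implementation, and it assumes each training record carries a site ID:

```python
from collections import Counter


def site_disjoint_folds(site_per_row, k):
    """Assign every row to one of k folds so that all rows of a site share a
    fold (no site is split between training and validation).

    Sites are placed largest first, each into the currently smallest fold,
    to keep fold sizes roughly balanced.
    """
    sizes = Counter(site_per_row)
    fold_of_site = {}
    fold_sizes = [0] * k
    # Sort by descending size; break ties by site name for determinism.
    for site, n in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of_site[site] = fold
        fold_sizes[fold] += n
    return [fold_of_site[site] for site in site_per_row]


# Toy data: site "a" dominates, much like a few very eventful sites might.
rows = ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]
folds = site_disjoint_folds(rows, 2)
print(folds)  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
```

With a few dominant sites, the folds can still end up unbalanced; that imbalance is itself a hint of how much the results may vary between sites.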

Once again, many thanks,
Dymitr