Second semester project for the Interactive Machine Learning 2021/2022 course
This is the second semester project for the Interactive Machine Learning 2021/2022 course. The task is to choose the optimal first batch of queries for training a simple classification model.
The goal of this competition is to choose three subsets of samples from the data pool such that they compose the best possible initial data batch for active learning of a simple prediction model (logistic regression).
Competition rules are given in Terms and Conditions.
The description of the task, data, and evaluation metric is in the Task description section.
Rank | Team Name | Is Report | | Preliminary Score | Final Score | Submissions
---|---|---|---|---|---|---
1 | anythinggoesinmorocco | True | True | 0.4320 | 0.425300 | 13
2 | I_got_lost_in_random_forest | True | True | 0.4188 | 0.407400 | 76
3 | Michał Siennicki | True | True | 0.3999 | 0.388900 | 21
4 | Paweł | True | True | 0.3834 | 0.377500 | 42
5 | wolololo | True | True | 0.3741 | 0.372100 | 10
6 | Wielki Szu | True | True | 0.3675 | 0.364500 | 15
7 | tbd | True | True | 0.3670 | 0.357400 | 6
8 | Tieru | True | True | 0.3760 | 0.357300 | 50
9 | Długopis | True | True | 0.3639 | 0.355200 | 34
10 | baseline | True | True | 0.3531 | 0.351200 | 4
11 | ksztenderski | True | True | 0.3526 | 0.349100 | 31
12 | mrgr | True | True | 0.3634 | 0.348500 | 50
13 | Maciej Pióro | True | True | 0.3419 | 0.341100 | 6
14 | aleksandra | True | True | 0.3327 | 0.341100 | 13
15 | mpacek | True | True | 0.3540 | 0.325800 | 27
16 | krzywicki_piotr | True | True | 0.2681 | 0.259900 | 9
The task in this project is to choose three subsets of samples from the data pool that allow constructing the most accurate logistic regression model. The subsets should contain 100, 200, and 500 samples, respectively.
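One common cold-start heuristic for this kind of problem (not necessarily the approach of any team on the leaderboard) is to pick samples that are spread out over the feature space. The sketch below implements greedy farthest-first traversal (the greedy k-center heuristic) in pure Python. The helper name `farthest_first`, the use of Euclidean distance on raw feature vectors, and the fixed starting index are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def farthest_first(points: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Greedy farthest-first traversal (hypothetical helper, not an official baseline).

    Starts from index `start` and repeatedly adds the point whose Euclidean
    distance to its nearest already-selected point is largest.
    Returns 0-based indices in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # Distance from each point to its nearest selected point; -1 marks "already selected".
    nearest = [math.dist(p, points[start]) for p in points]
    nearest[start] = -1.0
    while len(selected) < k:
        # Choose the point that is currently covered worst.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        # Shrink the nearest-selected distances using the newly added point.
        for i, p in enumerate(points):
            if nearest[i] >= 0.0:
                nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
        nearest[nxt] = -1.0
    return selected
```

Because the selection order is greedy, `s = farthest_first(features, 500)` would give nested candidates `s[:100]`, `s[:200]`, and `s`, assuming nested subsets are acceptable. On the full 20000 × 1680 pool this pure-Python loop is slow; a vectorized version of the same distance update would be the practical choice, and whether to standardize or reduce the features first is left open.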
The data pool is given as a single CSV file. Each line of the file is a description (a feature vector) characterizing approximately 1.8 seconds of readings from inertial sensors placed on seven different body parts of firefighters during exercises. In total, the data pool has 20000 samples described by 1680 features. The labels correspond to the classes of actions performed by the firefighters.
Each sample can be assigned to one of 16 classes: "ladder_going_down", "ladder_going_up", "manipulating", "no_action", "nozzle_usage", "running", "searching", "signal_hose_pullback", "signal_water_first", "signal_water_main", "signal_water_stop", "stairs_going_down", "stairs_going_up", "striking", "throwing_hose", "walking".
No labels are available for the samples in the data pool.
Format of submissions: solutions should be submitted as text files with three lines. The first line should contain exactly 100 integers, the indices of samples from the data pool (samples are indexed starting from 1), separated by commas. The second and third lines should contain analogous indices for the second and third sets of samples, with sizes 200 and 500, respectively.
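A minimal sketch of producing the three-line submission text, assuming the selected indices come from Python or NumPy code and are therefore 0-based. The helper name `format_submission` and the constants are hypothetical; the checks cover only the sizes and the 1-based index range stated in the task.

```python
from typing import Sequence

POOL_SIZE = 20000               # number of samples in the data pool (from the task description)
SUBSET_SIZES = (100, 200, 500)  # required number of indices on lines 1, 2, and 3


def format_submission(subsets: Sequence[Sequence[int]]) -> str:
    """Build the three-line submission text from 0-based indices (hypothetical helper).

    Each subset is shifted to the 1-based indexing required by the task
    and written as one comma-separated line.
    """
    if len(subsets) != len(SUBSET_SIZES):
        raise ValueError(f"expected {len(SUBSET_SIZES)} subsets, got {len(subsets)}")
    lines = []
    for subset, size in zip(subsets, SUBSET_SIZES):
        if len(subset) != size:
            raise ValueError(f"expected {size} indices, got {len(subset)}")
        one_based = [int(i) + 1 for i in subset]  # pool samples are numbered from 1
        if not all(1 <= i <= POOL_SIZE for i in one_based):
            raise ValueError("index out of range for the data pool")
        lines.append(",".join(str(i) for i in one_based))
    return "\n".join(lines)
```

The returned string can then be written to a text file as-is; whether the checker requires a trailing newline is not stated, so none is added here.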
Evaluation: submitted solutions will be evaluated using a LASSO logistic regression model, trained independently on each of the three sets of samples added to the initial data batch that was made available to participants. Each model will be evaluated on a separate test set (hidden from participants). The quality metric is balanced accuracy (BAC): the balanced accuracies of the three models trained on the selected data batches will be averaged with weights 5, 2.5, and 1 for the subset sizes 100, 200, and 500, respectively.
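The scoring arithmetic can be sketched as follows, assuming balanced accuracy has its usual definition (the mean of per-class recalls over the classes present in the true labels) and that the three results are combined as a normalized weighted mean, i.e. divided by 5 + 2.5 + 1 = 8.5. The function names are hypothetical; the official evaluation code is not published here.

```python
from typing import Dict, Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes that occur in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # Test positions whose true label is c, and how many were predicted as c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


WEIGHTS = {100: 5.0, 200: 2.5, 500: 1.0}  # subset size -> weight, from the task description


def weighted_score(bac_by_size: Dict[int, float]) -> float:
    """Normalized weighted mean of the three balanced accuracies, keyed by subset size."""
    total = sum(WEIGHTS[size] * bac_by_size[size] for size in WEIGHTS)
    return total / sum(WEIGHTS.values())
```

Under this assumption the 100-sample model alone accounts for 5 / 8.5, or about 59%, of the final score, so the smallest subset matters most.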
During the challenge, your solutions will be evaluated on a small fraction of the test set (10%), and your best preliminary score will be displayed on the public Leaderboard. After the end of the competition, the selected solutions will be evaluated on the remaining part of the test data and this result will be used for the evaluation of the project.
The LASSO logistic regression model will be trained with the regularization parameter lambda set to 0.01.
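The organizers' exact training code is not given here; in particular, whether the model is multinomial or one-vs-rest, and how lambda is scaled, are unknown. The sketch below assumes a multinomial (softmax) logistic regression whose objective is the mean cross-entropy plus lambda times the L1 norm of the weights (intercepts unpenalized), minimized by proximal gradient descent (ISTA). The function names, learning rate, and iteration count are hypothetical. It only illustrates what the lambda = 0.01 penalty does (shrinks weights and sets small ones exactly to zero), not how the official scores are computed.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_l1_logreg(X: Sequence[Sequence[float]], y: Sequence[int], n_classes: int,
                  lam: float = 0.01, lr: float = 0.1, n_iter: int = 500
                  ) -> Tuple[List[List[float]], List[float]]:
    """Multinomial logistic regression with an L1 penalty, fitted by ISTA.

    Minimizes mean cross-entropy + lam * sum|W| (intercepts b are not penalized).
    Labels y must be integers in range(n_classes).
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        # Gradient of the mean cross-entropy with respect to W and b.
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[k], xi)) + b[k]
                         for k in range(n_classes)])
            for k in range(n_classes):
                err = (p[k] - (1.0 if k == yi else 0.0)) / n
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * xi[j]
        # Plain gradient step on b; gradient step followed by soft-thresholding on W.
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            for j in range(d):
                W[k][j] = soft_threshold(W[k][j] - lr * gW[k][j], lr * lam)
    return W, b


def predict(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> int:
    """Return the class with the highest linear score."""
    scores = [sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For comparison, scikit-learn's `LogisticRegression` with an L1 penalty (e.g. `solver="saga"`) sets the strength through `C` and sums rather than averages the per-sample losses, so under the objective above lambda = 0.01 would correspond roughly to `C = 1 / (n * 0.01)` for n training samples.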