DS2022_second

Second semester project for Decision Systems Course 2022/2023

This is the second project for students enrolled in the Decision Systems 2022/2023 course at the Faculty of Mathematics, Informatics, and Mechanics at the University of Warsaw.

Overview

The goal of this competition is to find informative subsets of genes that make it possible to efficiently solve classification problems defined for a number of microarray data sets.

More detailed competition rules are given in the Terms and Conditions.

The description of the data and evaluation metric is in the Task description section.

The deadline for sending submissions and reports is January 27, 2023.

Terms & Conditions

Participants of the challenge are obliged to follow the competition rules:

This challenge is organized by Andrzej Janusz (the Organizer) for students enrolled in the Decision Systems 2022/2023 course at the Faculty of Mathematics, Informatics, and Mechanics at the University of Warsaw.
The provided data sets are the property of the Organizer and the KnowledgePit platform. It is forbidden to share or redistribute provided data sets to any third party without explicit consent from the Organizer.
Participants can work individually or in teams of at most two people. Teams must be formed at the beginning of the challenge. Participants cannot change their teams.
Each team may make at most 100 submissions in total.
The number of submissions per day is limited to 10.
Participants may use the data made available in the challenge. Using any external resources is possible only after receiving explicit consent from the Organizer. Queries regarding external resources need to be issued through the competition forum.
It is strictly forbidden to hack the provided data or to exploit any unfair and unintended data leaks that can improve the solution score. All attempts at making predictions for any test instance using information extracted from other test instances will result in disqualification.
The deadline for submitting the solutions is January 27, 2023 (23:59 GMT). Late submissions will not be accepted.
Each team is obliged to provide a short report describing their final solution. The report must contain information such as the name of the team, the names of all team members, and a brief overview of the approach used. The description should explain all data preprocessing steps and model construction steps. It should be submitted in the KnowledgePit submission system by January 27, 2023 (23:59 GMT).
By enrolling in this competition, you grant the Organizer the right to process your submissions and reports for the purpose of evaluation and post-competition research.
The final project score will depend on the quality of the solution (the score obtained in the final evaluation), and on the quality of the submitted report.

Task description

The provided data consist of ten microarray data sets with varying numbers of instances and attributes. Microarray data is a typical example of a problem called "few-samples-many-attributes".

The data tables are provided as CSV files with a comma (',') as the separator. In each set, the last column is called "target" and contains class labels for samples. The data sets can be downloaded after registering for the competition. You only have access to the training parts of the data sets. Your task is to identify, for each set, the optimal subset of attributes for an SVM classifier with a linear kernel and the cost parameter set to 1. No additional regularization will be used for the model. For each data set, you may indicate between 2 and 102 attributes, but a small penalty is applied to your score for each attribute used beyond the first two.
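The file layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name `load_dataset`, the toy attribute names `g1`..`g3`, and the assumption that all attribute values are numeric are illustrative and not part of the competition materials. It also assumes that column numbers are counted from 1 starting at the first column of the file.

```python
import csv
import io


def load_dataset(handle):
    """Read a comma-separated table whose last column is named 'target'."""
    reader = csv.reader(handle)
    header = next(reader)
    if header[-1] != "target":
        raise ValueError("expected the last column to be named 'target'")
    attribute_names = header[:-1]
    rows, labels = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        # Assumption: every attribute value is numeric.
        rows.append([float(value) for value in record[:-1]])
        labels.append(record[-1])
    # Assumption: column numbers are 1-based, counted from the first column.
    column_number = {name: i + 1 for i, name in enumerate(attribute_names)}
    return attribute_names, rows, labels, column_number


# Toy stand-in for one of the provided files (hypothetical content).
sample = io.StringIO("g1,g2,g3,target\n0.1,2.0,-1.5,A\n0.4,1.8,-0.9,B\n")
names, X, y, col = load_dataset(sample)
```

Keeping the name-to-number mapping around makes it easier to turn a selected set of attribute names into the column numbers expected in a submission.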

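The evaluation metric is balanced accuracy (BAC) adjusted by a penalty for using many attributes. For each data set, the penalized result is BAC - (k-2)/5000, where k is the number of attributes used for that data set. Your score is the average of these values over the evaluated data sets. For example, using the maximum of 102 attributes costs (102-2)/5000 = 0.02.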
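The scoring rule can be sketched in Python as follows. This is an illustration, not the Organizer's evaluation code: it assumes that BAC is the usual mean of per-class recalls, and the function names `balanced_accuracy`, `penalized_score`, and `competition_score` are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of BAC)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def penalized_score(bac, k):
    """BAC - (k-2)/5000 for a single data set with k selected attributes."""
    if not 2 <= k <= 102:
        raise ValueError("k must be between 2 and 102")
    return bac - (k - 2) / 5000


def competition_score(results):
    """Average penalized score over (bac, k) pairs, one pair per data set."""
    return sum(penalized_score(bac, k) for bac, k in results) / len(results)


# Two hypothetical data sets: BAC 0.80 with 2 attributes, BAC 0.70 with 52.
overall = competition_score([(0.80, 2), (0.70, 52)])
```

Since each extra attribute costs only 0.0002, adding an attribute pays off whenever it raises BAC by more than that amount on the evaluation data.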

The SVM model used for the evaluation is fitted with the code below:

model <- e1071::svm(
  dt_tr[, feats, with = FALSE],
  dt_tr[, factor(target)],
  type = "C-classification",
  kernel = "linear",
  cost = 1,
  gamma = 1/length(feats),
  scale = TRUE
)

During the competition, your solutions will be evaluated on five of the data sets, and your best preliminary score will be displayed on the public Leaderboard. The final score of each team will be computed on the remaining data sets.

The submission format: solutions need to be submitted as text files listing the selected attribute sets. Each file should have exactly 10 rows. Each row should contain integers between 1 and the number of columns in the corresponding data set. These integers indicate the attributes (column numbers) to be used by the evaluation model. The ordering of rows should correspond to the numbers indicated in the names of the provided data sets.
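A submission file in this format could be assembled and checked with a small helper like the one below. This is a sketch under stated assumptions: the function name `format_submission` is hypothetical, and the separator between integers within a row is not specified in this description, so the comma used here is only a guess that should be confirmed on the forum.

```python
def format_submission(selections, n_columns, separator=","):
    """Build the text of a 10-row submission from lists of column numbers.

    selections: one list of 1-based column numbers per data set, in the
        order given by the numbers in the data set file names.
    n_columns: the number of columns of each data set, in the same order.
    separator: assumption, the task description does not specify it.
    """
    if len(selections) != 10 or len(n_columns) != 10:
        raise ValueError("expected exactly 10 data sets")
    lines = []
    for i, (columns, limit) in enumerate(zip(selections, n_columns), start=1):
        if len(set(columns)) != len(columns):
            raise ValueError(f"data set {i}: duplicate column numbers")
        if not 2 <= len(columns) <= 102:
            raise ValueError(f"data set {i}: need between 2 and 102 attributes")
        if any(not 1 <= c <= limit for c in columns):
            raise ValueError(f"data set {i}: column number out of range")
        lines.append(separator.join(str(c) for c in columns))
    return "\n".join(lines) + "\n"


# Hypothetical selection: the same three columns for every data set.
example = format_submission([[1, 5, 9]] * 10, [2000] * 10)
```

Validating the row count and the 2 to 102 attribute limit locally avoids spending one of the limited submissions on a malformed file.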

The deadline for sending submissions and reports is January 27, 2023.

Final results

Rank	Team Name	Report Submitted		Preliminary Score	Final Score	Submissions
1	Drużyna bez fajnego skrótu	True	True	0.7063	0.6700	33
2	CakeTeam	True	True	0.7087	0.6673	10
3	team	True	True	0.6750	0.6654	14
4	baseline	True	True	0.6743	0.6583	2
5	Team name	True	True	0.7297	0.6546	44
6	NWJ	True	True	0.6692	0.6494	14
7	na pewno nie rynek w Zabrzu	True	True	0.7144	0.6429	32
8	Kulka błota	True	True	0.6919	0.6387	4
9	Łukasz	True	True	0.6843	0.6382	10
10	cotopaxi	True	True	0.6694	0.6368	7
11	OS	True	True	0.7220	0.6353	26
12	ff	True	True	0.6940	0.6342	28
13	T	True	True	0.6568	0.6328	23
14	quozz	True	True	0.7193	0.6320	20
15	419328	True	True	0.5928	0.6143	11
16	Kl	True	True	0.6822	0.6115	30
17	drużyna1	True	True	0.6500	0.6039	17
18	teamteam	True	True	0.7011	0.5743	20
19	spinach	False	True	0.5228	No report file found or report rejected.	7

Forum

This forum is for all users to discuss matters related to the competition. Good manners apply!

	Discussion	Author	Replies	Last post
	Column order in the solution		1	by Andrzej Thursday, January 05, 2023, 08:34:24
	Submission format	Tomasz	1	by Andrzej Sunday, January 01, 2023, 17:41:34