PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data
PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data is our first competition organized within the framework of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). We would like to challenge participants with the task of devising effective algorithms for recognizing the gender of e-store clients. The data set for the competition was provided by FPT Group, which is also the main sponsor of the awards.
FPT has always been a leading information and communication technology enterprise in Vietnam. By 2014, its revenue was about 1.65 billion US dollars and it had created more than 22 thousand full-time jobs. FPT has operations in 17 countries: Vietnam, Laos, Cambodia, America, Japan, Singapore, Germany, Myanmar, France, Malaysia, Australia, Thailand, United Kingdom, Philippines, Kuwait, Bangladesh and Indonesia. Its main businesses are Software Development, System Integration, Information Technology Services, Distribution and Manufacturing of Information Technology products, Retail, Internet Service Provision and Data Center Services, Online News and Advertising, e-Commerce, Educational Services, and Financing Services.
In e-Commerce, FPT runs several B2B2C (business-to-business-to-customer) services that provide online shopping sites and mobile applications for small and medium sellers. Transaction data from buyers, such as product browsing and purchasing activities, and product portfolio data from sellers can be aggregated to provide more efficient buying and selling experiences. For example, statistical machine learning techniques can be applied to predict the organization and display of products that maximizes the chance of bringing useful information to the user and facilitates online purchases. Perhaps one of the vital insights, especially for fashion-related products, is understanding how relevant a product is to the gender of the user. In the PAKDD'15 Data Mining Competition we would like to address this particular problem. More details regarding the task and a description of the competition data can be found in the Task Description section.
If you have any questions, please post on the forum or write us an email: son@mimuw.edu.pl
PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data has ended after over a month of intense rivalry!
The competition attracted 330 teams, of which 149 participated actively by submitting at least one solution to the leaderboard. The total number of submissions was nearly 3000. Of the active teams, 28 provided us with a brief report describing their approach.
- The winner: Team FRDC,
Members: Ruiyu Fang (the leader), Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen,
Affiliation: Fujitsu Research and Development Center, Beijing, China
- The runner up: Team newolfy,
Members: Yingju Xia (the leader), Shuangyong Song, Qingliang Miao, Zhongquang Zheng,
Affiliation: Fujitsu Research and Development Center, Beijing, China
Invitations to oral presentation at PAKDD-2015 Contest Workshop:
- Team FRDC, Ruiyu Fang, Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen (PAKDD'15 presentation)
- Team newolfy: Yingju Xia, Shuangyong Song, Qingliang Miao, Zhongquang Zheng (PAKDD'15 presentation)
- Team ngocan211: Pham Ngoc An, FPT University, Hanoi, Viet Nam (PAKDD'15 presentation)
- Team ws: Wojciech Świeboda, University of Warsaw, Poland (PAKDD'15 presentation)
- Team sohrab: Mohammad Golam Sohrab, Makoto Miwa, Yutaka Sasaki, Toyota Technological Institute, Japan (PAKDD'15 presentation)
- Team kimiyoung: Zhilin Yang, Yutao Zhang, Jie Tang, Tsinghua University, Beijing, China
- Team ibayer: Immanuel Bayer, University of Konstanz, Germany (PAKDD'15 presentation)
- Team gambi: Maria Brbic, Dragan Gamberger, Matej Mihelcic, Matija Piskorec, Tomislav Smuc, Rudjer Boskovic Institute, Zagreb, Croatia (PAKDD'15 presentation)
I would like to thank all participants for their hard work. Congratulations on your excellent results!
In order to stimulate future research on the topic of the competition, we have released all competition data (including the labels for test cases) in the data files folder.
- The competition is open for all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
- Participants may submit solutions as teams made up of one or more persons.
- Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team.
- One person may be a member of at most 3 teams.
- Each team needs to be composed of a different set of persons.
- The total number of submissions for any single team is limited to 100 solutions.
- The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of the submission will be taken into account.
- Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by May 4, 2015. Only submissions made by teams that provided reports will qualify for the final evaluation.
- By enrolling in this competition you grant the organizers the right to process your submissions for the purpose of evaluation and post-competition research.
In case of questions related to the competition please contact us via email: webmaster@knowledgepit.fedcsis.org or through the competition forum.
Data format: The data for participants were divided into separate training and test sets: trainingData.csv and testData.csv, respectively. Each of these files contains 15,000 records which correspond to product viewing logs. A single log is composed of four columns, separated by commas. The first one is a session ID. The second and the third columns correspond to the session start time and session end time, respectively. The last column contains a list of product IDs which were viewed during the session (the order of viewing is preserved). Consecutive product IDs are separated by semicolons. A trainingLabels.csv file is also available; it contains labels identifying the true gender of the users whose sessions are described in the training data set.
Since the distribution of unique product IDs in the data is very sparse, the IDs contain additional information regarding the product category hierarchy. Each product ID can be decomposed into four different IDs which are separated by slashes. The IDs starting with the letter ‘A’ are the most general categories and those starting with ‘D’ correspond to individual products. The IDs which start with ‘B’ and ‘C’ are associated with subcategories and sub-subcategories, respectively.
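The record layout described above could be parsed along the following lines. This is only a minimal sketch: the names `Product`, `Session`, `parse_product` and `parse_session` are hypothetical, the sample line is made up for illustration, and the timestamps are kept as plain strings because their exact format is not specified here. Stripping a possible trailing slash from a product ID is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Product:
    category: str         # most general category, ID starts with 'A'
    subcategory: str      # ID starts with 'B'
    subsubcategory: str   # ID starts with 'C'
    product: str          # individual product, ID starts with 'D'


@dataclass
class Session:
    session_id: str
    start_time: str       # kept as text; the timestamp format is not assumed
    end_time: str
    products: List[Product]  # in the order they were viewed


def parse_product(product_id: str) -> Product:
    """Split one slash-separated product ID into its four hierarchy levels."""
    # Assumption: tolerate surrounding whitespace and a trailing slash.
    parts = product_id.strip().strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 slash-separated IDs, got {product_id!r}")
    for part, letter in zip(parts, "ABCD"):
        if not part.startswith(letter):
            raise ValueError(f"expected an ID starting with {letter!r}: {part!r}")
    return Product(*parts)


def parse_session(line: str) -> Session:
    """Parse one comma-separated log record into a Session."""
    session_id, start, end, product_list = line.rstrip("\n").split(",", 3)
    products = [parse_product(p) for p in product_list.split(";") if p.strip()]
    return Session(session_id, start, end, products)


# Made-up record, for illustration only.
sample = "u1,2014-11-14 00:02:14,2014-11-14 00:02:20,A1/B2/C3/D4;A1/B2/C5/D6"
session = parse_session(sample)
```

Because individual product IDs are sparse, features built from the `category`, `subcategory` and `subsubcategory` fields are likely to generalize better across sessions than features built from `product` alone.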
Format of submissions: The participants of the competition are asked to predict the gender of users from the test data and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 15,000 lines. The format of submitted files should follow the format of trainingLabels.csv. Each consecutive line of this file should contain a single label which identifies the gender of the user who generated the corresponding session log in the test set.
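A submission file of this shape could be produced with a small helper like the one below. It is a sketch under stated assumptions: the function name `format_submission` is hypothetical, and the label strings `male` and `female` are placeholders, since the actual label values should be copied from trainingLabels.csv.

```python
from typing import Iterable


def format_submission(labels: Iterable[str], expected_lines: int = 15000) -> str:
    """Return the submission text: one label per line, in test-set order."""
    labels = list(labels)
    if len(labels) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} labels, got {len(labels)}"
        )
    for i, label in enumerate(labels):
        # Each line must hold exactly one non-empty label.
        if not label.strip() or "\n" in label:
            raise ValueError(f"invalid label at position {i}: {label!r}")
    return "\n".join(labels) + "\n"


# Tiny illustration with 2 lines instead of 15,000.
text = format_submission(["male", "female"], expected_lines=2)
```

The resulting string can then be written to a text file and uploaded; checking the line count before uploading avoids spending one of the limited submissions on a malformed file.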
Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will correspond to approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session at the PAKDD'15 conference (http://www.pakdd2015.jvn.edu.vn/).
Since the distribution of labels in the data is not balanced, solutions will be assessed using the balanced accuracy measure, which is defined as the average accuracy within the decision classes. Namely, for a vector of predictions preds and a vector of true gender labels genders we define the balanced accuracy as:
\[ACC_{m}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = male\}|}{|\{j : genders_{j} = male\}|}\]
\[ACC_{f}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = female\}|}{|\{j : genders_{j} = female\}|}\]
\[BAC(preds, genders) = \frac{ACC_{f}(preds, genders) + ACC_{m}(preds, genders)}{2}\]
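The measure defined above can be computed directly from the two label vectors. The following is a minimal sketch, not the organizers' evaluation code: the function name `balanced_accuracy` and the label strings `male` and `female` are assumptions made for illustration.

```python
from typing import Sequence


def balanced_accuracy(
    preds: Sequence[str],
    genders: Sequence[str],
    classes: Sequence[str] = ("male", "female"),
) -> float:
    """Average of the per-class accuracies ACC_m and ACC_f."""
    if len(preds) != len(genders):
        raise ValueError("preds and genders must have the same length")
    per_class = []
    for cls in classes:
        # Indices j with genders_j == cls (the denominator set).
        members = [j for j, g in enumerate(genders) if g == cls]
        if not members:
            raise ValueError(f"no true labels of class {cls!r}")
        # Of those, how many were predicted as cls (the numerator set).
        correct = sum(1 for j in members if preds[j] == cls)
        per_class.append(correct / len(members))
    return sum(per_class) / len(per_class)


genders = ["male", "female", "female", "female"]
preds = ["male", "female", "female", "male"]
# ACC_m = 1/1 and ACC_f = 2/3, so BAC = (1 + 2/3) / 2 = 5/6.
bac = balanced_accuracy(preds, genders)

# Predicting the majority class everywhere gives ACC_m = 0 and ACC_f = 1.
majority_only = balanced_accuracy(["female"] * 4, genders)
```

The second call shows why the measure suits unbalanced labels: a constant majority-class prediction reaches a plain accuracy of 0.75 on this toy data, but a balanced accuracy of only 0.5.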
Rank | Team Name | Is Report | Preliminary Score | Final Score | Submissions | |
1 | ngocan211 | True | True | 0.8878 | 0.890432 | 2 |
2 | frdc | True | True | 0.8875 | 0.878928 | 2 |
3 | newolfy | True | True | 0.8874 | 0.877889 | 2 |
4 | ws | True | True | 0.8614 | 0.851119 | 2 |
5 | sohrab | True | True | 0.8546 | 0.851032 | 2 |
6 | kimiyoung | True | True | 0.8508 | 0.849046 | 2 |
7 | dymitrruta | True | True | 0.8474 | 0.847979 | 2 |
8 | ibayer | True | True | 0.8415 | 0.840673 | 2 |
9 | songshuangyong | True | True | 0.8534 | 0.830603 | 2 |
10 | stderr | True | True | 0.8170 | 0.811541 | 2 |
11 | amy | True | True | 0.8183 | 0.810673 | 2 |
12 | gambi | True | True | 0.8235 | 0.810181 | 2 |
13 | siyu | True | True | 0.8217 | 0.805283 | 2 |
14 | ahwangyuwei | True | True | 0.8111 | 0.801450 | 2 |
15 | kkurach | True | True | 0.8053 | 0.798821 | 2 |
16 | duongtranduc | True | True | 0.8100 | 0.797838 | 2 |
17 | hnt | True | True | 0.7949 | 0.797127 | 2 |
18 | vinhn | True | True | 0.7975 | 0.795656 | 2 |
19 | zhu_ark | True | True | 0.7948 | 0.791138 | 2 |
20 | mrw19138 | True | True | 0.7947 | 0.790483 | 2 |
21 | xiexingbo | True | True | 0.7894 | 0.783441 | 2 |
22 | pdoviet | True | True | 0.7894 | 0.783204 | 2 |
23 | jankralj | True | True | 0.7849 | 0.780010 | 2 |
24 | zhijin | True | True | 0.7787 | 0.772296 | 2 |
25 | ikttan | True | True | 0.7824 | 0.771237 | 2 |
26 | ds_tcy | True | True | 0.7688 | 0.770413 | 2 |
27 | thetuxedo | True | True | 0.7764 | 0.766886 | 2 |
28 | langochai | True | True | 0.7624 | 0.765252 | 2 |
29 | neozhangthe1 | False | True | 0.8530 | No report file found or report rejected. | 2 |
30 | linegroup | False | True | 0.8358 | No report file found or report rejected. | 2 |
31 | zm | False | True | 0.8216 | No report file found or report rejected. | 2 |
32 | antry | False | True | 0.8163 | No report file found or report rejected. | 2 |
33 | mamy | False | True | 0.8137 | No report file found or report rejected. | 2 |
34 | tund | False | True | 0.8136 | No report file found or report rejected. | 2 |
35 | ggvspp | False | True | 0.8135 | No report file found or report rejected. | 2 |
36 | khuongnd | False | True | 0.8130 | No report file found or report rejected. | 2 |
37 | rjo2909 | False | True | 0.8115 | No report file found or report rejected. | 2 |
38 | wyw | False | True | 0.8114 | No report file found or report rejected. | 2 |
39 | yanghaisong | False | True | 0.8104 | No report file found or report rejected. | 2 |
40 | thaidang | False | True | 0.8091 | No report file found or report rejected. | 2 |
41 | thaidt | False | True | 0.8085 | No report file found or report rejected. | 2 |
42 | ttd | False | True | 0.8084 | No report file found or report rejected. | 2 |
43 | orange | False | True | 0.8046 | No report file found or report rejected. | 2 |
44 | ziom | False | True | 0.8040 | No report file found or report rejected. | 2 |
45 | xspring | False | True | 0.8066 | No report file found or report rejected. | 2 |
46 | dirichlet.process | False | True | 0.8020 | No report file found or report rejected. | 2 |
47 | nhatuan | False | True | 0.8008 | No report file found or report rejected. | 2 |
48 | ramesh.krnt | False | True | 0.7995 | No report file found or report rejected. | 2 |
49 | ntienvu | False | True | 0.8013 | No report file found or report rejected. | 2 |
50 | cwhuang | False | True | 0.7980 | No report file found or report rejected. | 2 |
51 | fancyspeed | False | True | 0.7966 | No report file found or report rejected. | 2 |
52 | zhangzhongxia | False | True | 0.7992 | No report file found or report rejected. | 2 |
53 | derivedbydata | False | True | 0.7959 | No report file found or report rejected. | 2 |
54 | hnguyen | False | True | 0.7957 | No report file found or report rejected. | 2 |
55 | dat_phuoc | False | True | 0.7952 | No report file found or report rejected. | 2 |
56 | lihang00 | False | True | 0.7933 | No report file found or report rejected. | 2 |
57 | 0000 | False | True | 0.7930 | No report file found or report rejected. | 2 |
58 | yzan | False | True | 0.7920 | No report file found or report rejected. | 2 |
59 | wttool | False | True | 0.7919 | No report file found or report rejected. | 2 |
60 | sasnzy | False | True | 0.7901 | No report file found or report rejected. | 2 |
61 | eric | False | True | 0.7887 | No report file found or report rejected. | 2 |
62 | lab213 | False | True | 0.7868 | No report file found or report rejected. | 2 |
63 | zhili | False | True | 0.7858 | No report file found or report rejected. | 2 |
64 | huuphuc2609 | False | True | 0.7856 | No report file found or report rejected. | 2 |
65 | muye5 | False | True | 0.7837 | No report file found or report rejected. | 2 |
66 | whatif | False | True | 0.7879 | No report file found or report rejected. | 2 |
67 | pth1993 | False | True | 0.7816 | No report file found or report rejected. | 2 |
68 | sslim | False | True | 0.7813 | No report file found or report rejected. | 2 |
69 | iran-amin | False | True | 0.7801 | No report file found or report rejected. | 2 |
70 | pikachust8811 | False | True | 0.7801 | No report file found or report rejected. | 2 |
71 | binh | False | True | 0.7794 | No report file found or report rejected. | 2 |
72 | huong2 | False | True | 0.7790 | No report file found or report rejected. | 2 |
73 | saj | False | True | 0.7790 | No report file found or report rejected. | 2 |
74 | etw | False | True | 0.7788 | No report file found or report rejected. | 2 |
75 | bati | False | True | 0.7787 | No report file found or report rejected. | 2 |
76 | ketanatnmims | False | True | 0.7824 | No report file found or report rejected. | 2 |
77 | p2trieu | False | True | 0.7778 | No report file found or report rejected. | 2 |
78 | yin520liang | False | True | 0.7773 | No report file found or report rejected. | 2 |
79 | tchang | False | True | 0.7773 | No report file found or report rejected. | 2 |
80 | baseline_solution | False | True | 0.7755 | No report file found or report rejected. | 2 |
81 | neurons | False | True | 0.7751 | No report file found or report rejected. | 2 |
82 | vutm | False | True | 0.7743 | No report file found or report rejected. | 2 |
83 | spiritoflinz | False | True | 0.8094 | No report file found or report rejected. | 2 |
84 | qjt | False | True | 0.7741 | No report file found or report rejected. | 2 |
85 | abhgh | False | True | 0.7732 | No report file found or report rejected. | 2 |
86 | wst_casd | False | True | 0.7725 | No report file found or report rejected. | 2 |
87 | h3p | False | True | 0.7757 | No report file found or report rejected. | 2 |
88 | billguess | False | True | 0.7951 | No report file found or report rejected. | 2 |
89 | xuxiaofeng | False | True | 0.7710 | No report file found or report rejected. | 2 |
90 | violet_zct | False | True | 0.7765 | No report file found or report rejected. | 2 |
91 | marcb | False | True | 0.7693 | No report file found or report rejected. | 2 |
92 | nightfury | False | True | 0.7853 | No report file found or report rejected. | 2 |
93 | d10207305 | False | True | 0.7688 | No report file found or report rejected. | 2 |
94 | fajri91 | False | True | 0.7683 | No report file found or report rejected. | 2 |
95 | mautoan11 | False | True | 0.7710 | No report file found or report rejected. | 2 |
96 | jyclin | False | True | 0.7672 | No report file found or report rejected. | 2 |
97 | tidom | False | True | 0.7657 | No report file found or report rejected. | 2 |
98 | little_number | False | True | 0.7707 | No report file found or report rejected. | 2 |
99 | vpodpecan | False | True | 0.7644 | No report file found or report rejected. | 2 |
100 | tuandinh | False | True | 0.8049 | No report file found or report rejected. | 2 |
101 | gentaiscool | False | True | 0.7774 | No report file found or report rejected. | 2 |
102 | bruincui | False | True | 0.7629 | No report file found or report rejected. | 2 |
103 | starryc | False | True | 0.7603 | No report file found or report rejected. | 2 |
104 | jackson13 | False | True | 0.7602 | No report file found or report rejected. | 2 |
105 | hamylinh | False | True | 0.7591 | No report file found or report rejected. | 2 |
106 | nampham | False | True | 0.7573 | No report file found or report rejected. | 2 |
107 | hoangphan | False | True | 0.7653 | No report file found or report rejected. | 2 |
108 | helen | False | True | 0.7662 | No report file found or report rejected. | 2 |
109 | kuanhoong | False | True | 0.7524 | No report file found or report rejected. | 2 |
110 | seeyouhere | False | True | 0.7487 | No report file found or report rejected. | 2 |
111 | pin | False | True | 0.7455 | No report file found or report rejected. | 2 |
112 | gmustafa | False | True | 0.7545 | No report file found or report rejected. | 2 |
113 | mayankkejriwal | False | True | 0.7370 | No report file found or report rejected. | 2 |
114 | mrboring | False | True | 0.7365 | No report file found or report rejected. | 2 |
115 | lingdian618 | False | True | 0.7328 | No report file found or report rejected. | 2 |
116 | rembern | False | True | 0.7614 | No report file found or report rejected. | 2 |
117 | strnam | False | True | 0.7313 | No report file found or report rejected. | 2 |
118 | zagorecki | False | True | 0.7844 | No report file found or report rejected. | 2 |
119 | sudarsun | False | True | 0.7175 | No report file found or report rejected. | 2 |
120 | zgbdsg | False | True | 0.7846 | No report file found or report rejected. | 2 |
121 | sg.qq | False | True | 0.7090 | No report file found or report rejected. | 2 |
122 | sdx0112 | False | True | 0.7089 | No report file found or report rejected. | 2 |
123 | test | False | True | 0.7044 | No report file found or report rejected. | 2 |
124 | blah | False | True | 0.7716 | No report file found or report rejected. | 2 |
125 | wilson891226 | False | True | 0.7897 | No report file found or report rejected. | 2 |
126 | exploit | False | True | 0.6976 | No report file found or report rejected. | 2 |
127 | janezkranjc | False | True | 0.6948 | No report file found or report rejected. | 2 |
128 | chaitu516 | False | True | 0.6868 | No report file found or report rejected. | 2 |
129 | fengqi | False | True | 0.7112 | No report file found or report rejected. | 2 |
130 | deagle9413 | False | True | 0.6796 | No report file found or report rejected. | 2 |
131 | khanhlh | False | True | 0.6660 | No report file found or report rejected. | 2 |
132 | sathik | False | True | 0.6620 | No report file found or report rejected. | 2 |
133 | wolfinlove | False | True | 0.6615 | No report file found or report rejected. | 2 |
134 | huongtt | False | True | 0.6691 | No report file found or report rejected. | 2 |
135 | f-ken1010 | False | True | 0.6555 | No report file found or report rejected. | 2 |
136 | hnt1 | False | True | 0.6796 | No report file found or report rejected. | 2 |
137 | nghiemduc | False | True | 0.7085 | No report file found or report rejected. | 2 |
138 | statgeek | False | True | 0.7500 | No report file found or report rejected. | 2 |
139 | franksnail | False | True | 0.6159 | No report file found or report rejected. | 2 |
140 | meerkat | False | True | 0.5641 | No report file found or report rejected. | 2 |
141 | oahcil | False | True | 0.5633 | No report file found or report rejected. | 2 |
142 | huwenp | False | True | 0.5577 | No report file found or report rejected. | 2 |
143 | linjie_zhu | False | True | 0.5565 | No report file found or report rejected. | 2 |
144 | abhijit | False | True | 0.5473 | No report file found or report rejected. | 2 |
145 | ssssqd | False | True | 0.5085 | No report file found or report rejected. | 2 |
146 | kloud | False | True | 0.5058 | No report file found or report rejected. | 2 |
147 | yupbank | False | True | 0.5006 | No report file found or report rejected. | 2 |
148 | thnhu | False | True | 0.7996 | No report file found or report rejected. | 2 |
149 | customs | False | True | 0.6881 | No report file found or report rejected. | 2 |
150 | cllab | False | True | 0.7930 | No report file found or report rejected. | 2 |
151 | sink | False | True | 0.5969 | No report file found or report rejected. | 2 |
152 | mathimohanraj | False | True | 0.0000 | No report file found or report rejected. | 2 |
153 | clapika2010 | False | True | 0.7870 | No report file found or report rejected. | 2 |
154 | pg7799 | False | True | 0.7766 | No report file found or report rejected. | 2 |
- March 23, 2015: start of the competition, data sets become available,
- May 1, 2015: deadline for submitting the predictions,
- May 3, 2015: deadline for sending the reports, end of the challenge,
- May 19, 2015: beginning of the PAKDD'15 conference, official announcement of the winners.
Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes sponsored by FPT Group:
- First Prize: Apple MacBook Air + one free PAKDD'15 conference registration,
- Second Prize: new FPT smartphone (to be defined) + one free PAKDD'15 conference registration.
The award ceremony will take place during the PAKDD'15 conference (May 19 - 22, Ho Chi Minh City, Vietnam).
Hung Son Nguyen, University of Warsaw
Tran The Trung, FPT University
Tu Bao Ho, Japan Advanced Institute of Science and Technology
Duc Dung Nguyen, Vietnam Academy of Science and Technology
Andrzej Janusz, University of Warsaw
Discussion | Author | Replies | Last post | |
whether top10 reports are available sometime? | muye5 | 2 | by Andrzej Thursday, May 21, 2015, 09:42:59 |
Complete test data: Gender Prediction Based on E-commerce Data | Ramesh | 1 | by Andrzej Thursday, May 21, 2015, 09:52:19 |
Postpone the deadline | Sajjad | 1 | by Sajjad Saturday, May 02, 2015, 00:53:18 |
The last week of PAKDD'15 Data Mining Competition | Andrzej | 0 | by Andrzej Monday, April 27, 2015, 13:06:02 |
Training data VS Test data | Amin | 1 | by Amin Sunday, April 26, 2015, 01:05:08 |
New competition at Knowledge Pit: IJCRS’15 Data Challenge | Andrzej | 0 | by Andrzej Monday, April 13, 2015, 11:22:57 |
A problem with KnowledgePit server | Andrzej | 0 | by Andrzej Sunday, April 12, 2015, 14:58:27 |
Final submission | Tu | 2 | by Andrzej Wednesday, April 01, 2015, 09:17:09 |
small inconsistency in test data | Vid | 2 | by Andrzej Monday, March 30, 2015, 10:50:10 |
Conference participation | Eftim | 1 | by Andrzej Monday, March 30, 2015, 14:03:01 |
AAIA'15 Data Mining Competition: Activity Recognition Based on Body Sensor Networks | Andrzej | 0 | by Andrzej Friday, March 27, 2015, 17:18:35 |
Multiple Participants in single team? | Abhay | 2 | by Abhay Monday, March 30, 2015, 19:36:20 |