Recruitment Challenge @ QED Software
This is an internal data mining challenge at QED Software, aimed at assessing the ML skills of new employees.
At QED Software, we use the KnowledgePit platform to challenge data science community members to solve real business problems and test their skills, knowledge, and, most importantly, their creative thinking. This challenge is not a competition but, first and foremost, a place for self-evaluation. If you have been invited to take part in it, you are likely a motivated data scientist. But what can you do with what we have prepared? Enter the challenge and check your skills!
Want to know more about what is ahead of you? The task is to distinguish truly suspicious events from false alarms within the network traffic alert data that Security Operations Center (SOC) Team members have to analyze daily. An efficient classification model could help the SOC Team optimize their operations significantly. Technical details are in the Task Description section.
We wish you good luck and satisfaction with your solution(s)!
This challenge is based on a data mining competition (Suspicious Network Event Recognition) organized in association with the IEEE BigData 2019 conference. Check the original competition here; you may find some reference results there.
Challenge Participation Rules:
The Invitation e-mail you received contains your Schedule specifying your submission time limits and Organizer’s response times. Please check your deadlines and stick to them.
- The challenge is organized by QED Software Sp. z o. o. (the Organizer) and is not open to the public.
- Participants must have reached the age of majority (in Poland, 18 years of age).
- The challenge is open exclusively to individuals invited at the sole discretion of the Organizer.
- Invitation to Challenge is sent by email and contains a Challenge schedule (timing and deadlines).
- To take part in the Challenge, each Participant must register to the KnowledgePit platform (available at knowledgepit.ai or knowledgepit.ml).
- A single individual may enroll in the challenge with only one KnowledgePit account.
- The KnowledgePit platform treats each Participant as a Team of one person (so don't be alarmed: you are your own Team, and its Team Leader).
- The Participant is expected to solve the task individually and submit his/her own solution through the Submissions tab on the Challenge site. The Participant may submit up to 10 solutions within the submission deadline. The last submitted solution will be treated as the final solution.
- Each Participant is obliged to provide the source code of the final solution and a short report (up to 2000 characters) describing this solution. The report must contain a brief overview of the approach used and explain all data pre-processing and model construction steps. It should be submitted in PDF format via the KnowledgePit submission tab by the deadline specified in the Invitation email.
- The duration of the challenge is set by the Organizer and announced to each Participant in the invitation email. Each Participant has no less than 2 weeks to submit their solutions and 1 week to submit the code and report (description). The Organizer will provide feedback to the Participant within 2 weeks of the report submission deadline (note that submitting your work earlier than the specified deadline will not affect your results and does not obligate the Organizer to give you feedback earlier than scheduled).
- The Organizer holds the right to extend the deadlines for submitting solutions and/or reports. In such a case, Participants will be informed about the change by email.
- The Challenge is organized for the purpose of skill-testing and education. The participation is fully voluntary and the Organizer will not select any winners.
- There is no prize in this Challenge.
- Under no circumstances will the entry in this challenge, place in the ranking, or anything in these Rules be construed as an offer or contract of employment with the Organizer. You acknowledge that you have submitted your solutions voluntarily. You acknowledge that no confidential, fiduciary, agency, employment or other similar relationship is created between you and the Organizer by your acceptance of these Rules or your entry of your solution.
- The Leaderboard is a table viewable in the Leaderboard tab of the Challenge website that shows the best solutions submitted in all editions of the challenge for the specified task, ranked according to their initial evaluation.
- The final ranking of the Participants will be done based on the final evaluation results. In the case of draws in the evaluation scores, the time of the submission will be taken into account.
- The Participant can see how his/her final solution is ranked vis-à-vis other solutions, and there is no prize, penalty, credit or other forms of valuation resulting from the position in the ranking.
- The Organizer is not responsible for any consequences of technical issues affecting a Participant's ability to access the KnowledgePit platform or submit a solution, including, without limitation, issues relating to internet connectivity. If an internal technical problem with the KnowledgePit platform causes delays or temporary unavailability that is likely to impact a Participant's ability to submit his/her solution on time, the Organizer will extend the challenge deadline by the duration of such unavailability and inform Participants of the extension by email.
- Each report, paper, and any other type of publication based on the participant’s research where data from this challenge is used should be consulted with QED Software Sp. z o.o. before publication in order to clear rights, references, and notices regarding both KnowledgePit, QED Software, and Security On-Demand as the institutions that provided data for the study.
- The Organizer may at its discretion, and without providing any additional explanation, reject any submission or disqualify any Participant if the Organizer reasonably believes that the submission was, respectively, produced in an unfair or illegitimate way or submitted by a person who was cheating, using deception, or otherwise breaking the challenge rules.
- If you think you were excluded without reason, or have any questions, please contact the Organizer at firstname.lastname@example.org.
- By enrolling in this competition, the Participant grants his/her consent to the processing of his/her registration data, submissions, and reports, and grants the Organizer the right to use such data and submissions for the purpose of evaluation.
Task Description:
In this challenge, the task is to detect truly suspicious events and false alarms within the set of so-called network traffic alerts that Security Operations Center (SOC) Team members have to analyze daily. An efficient classification model could help the SOC Team to optimize their operations significantly. This data set comes from the IEEE BigData 2019 Cup: Suspicious Network Event Recognition challenge.
The data set available in the challenge consists of alerts investigated by a SOC team at the Security On-Demand company (SoD); we call such signals 'investigated'. Each record is described by various statistics selected based on experts' knowledge and by a hierarchy of associated (anonymized) IP addresses, called assets. For each alert in the 'investigated alerts' data tables, there is a history of related log events (a detailed set of network operations acquired by SoD, anonymized to ensure the safety of SoD's clients).
The data sets cover the half year between October 1, 2018, and March 31, 2019. You can find the description of the columns in the 'investigated alerts' data in a separate file called column_descriptions.txt. We divided the main data into a training set and a test set based on alert timestamps: the training set (the file cybersecurity_training.csv) covers approximately the first four months, and the remaining part constitutes the test set (the file cybersecurity_test.csv). The format of the two files is the same: columns are separated by the vertical line '|' sign. However, the target column, called 'notified', is missing from the test data.
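Since the files use '|' as the column separator rather than a comma, a standard CSV reader needs the delimiter set explicitly. The sketch below illustrates this with a hypothetical two-row excerpt; the column names shown here are illustrative, as the real ones are documented in column_descriptions.txt.

```python
import csv
import io

# Hypothetical excerpt mimicking the pipe-separated layout of
# cybersecurity_training.csv; column names are illustrative only.
sample = io.StringIO(
    "alert_ids|categoryname|notified\n"
    "a1|Attack|1\n"
    "a2|Recon|0\n"
)

# The key detail: delimiter='|' instead of the default comma.
reader = csv.DictReader(sample, delimiter="|")
rows = list(reader)
print(rows[0]["notified"])  # prints "1"
```

The same applies to any other reader: for example, pandas users would pass sep='|' to read_csv.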
The task and the format of submissions: the job is to predict which of the investigated alerts were considered truly suspicious by the SOC team and led to issuing a notification to SoD's clients. In the training data, this information is indicated by the column 'notified'. A submission should take the form of scores assigned to every record from the test data, with each score in a separate line of a text file. You can find an example of a correctly formatted submission file in the Data files section.
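Producing the submission file is then a matter of writing one score per line, in the same order as the records in the test file. A minimal sketch, with placeholder scores standing in for actual model outputs:

```python
# Placeholder scores standing in for model predictions (e.g., estimated
# probabilities that 'notified' = 1), in test-set record order.
scores = [0.87, 0.12, 0.55]

# One score per line, nothing else in the file.
with open("submission.txt", "w") as f:
    for s in scores:
        f.write(f"{s}\n")
```

Note that the scores need not be probabilities; since evaluation uses AUC, only their relative ordering matters.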
Evaluation: we evaluate the quality of submissions using the AUC measure. The assessment is automatic, and the preliminary results are published in the Submission section. The preliminary results are computed on a representatively selected subset (10%) of the test data; however, only the score obtained on the remaining 90% of the test data counts in the final assessment.
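For intuition about the metric, AUC can be computed directly from its rank-based definition: the fraction of (positive, negative) pairs that the scores order correctly, with ties counting half. A from-scratch sketch (in practice you would use a library routine such as scikit-learn's roc_auc_score):

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random
    negative (the Mann-Whitney U formulation of AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # perfect ranking -> 1.0
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # one inverted pair -> 0.75
```

This also makes it clear why only the ordering of the submitted scores matters: any monotone rescaling of the scores leaves the AUC unchanged.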
You may find it useful to explore the publications linked to the original competition:
A. Janusz, D. Kałuża, A. Chądzyńska-Krasowska, B. Konarski, J. Holland, and D. Ślęzak, "IEEE BigData 2019 Cup: Suspicious Network Event Recognition," in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019.
Q. H. Vu, D. Ruta, and L. Cen, “Gradient Boosting Decision Trees for Cyber Security Threats Detection Based on Network Events Logs,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019, 2019.
C. Dongy, Y. Chen, Y. Zhang, B. Jiang, S. Liu, D. Han, and B. Liu, “An Approach For Scale Suspicious Network Events Detection,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019, 2019.
T. Wang, C. Zhang, Z. Lu, D. Du, and Y. Han, “Identifying Truly Suspicious Events and False Alarms Based on Alert Graph,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019, 2019.