Privacy-preserving Matching of Images

IEEE BigData 2022 Cup: Privacy-preserving Matching of Encrypted Images

IEEE BigData 2022 Cup: Privacy-preserving Matching of Encrypted Images is a data mining competition organized in association with the 2022 IEEE International Conference on Big Data (IEEE BigData 2022, https://bigdataieee.org/BigData2022/index.html). The task is to verify the image anonymization mechanisms of smart monitoring devices developed jointly by MyLED and QED Software as a part of the AraHUB technology (https://arahub.ai/). The challenge is sponsored by MyLED (https://myled.pl/).

Overview

This challenge is associated with the idea of building systems for collecting and analyzing data from the environment. With the use of proprietary devices consisting of a set of optical sensors, it is possible to collect information about people moving nearby, including: movement direction, interest analysis, and demographic profile. Then, with the use of AI/ML algorithms, it is possible to interpret the collected information and provide the profile of an audience within the range of the devices.

The scope of this challenge is closely related to the construction of the next generation of "smart" monitoring devices and systems. Such systems are being developed by MyLED and QED Software as a part of the AraHUB initiative (arahub.ai). One of the most important challenges in this project is preserving the privacy and protecting the personal information (e.g., name, gender, age, location) of the persons being monitored. To meet legal requirements, such as those established in the EU's General Data Protection Regulation (GDPR), our monitoring systems (platforms) are intended to use strong encryption and anonymization techniques. These anonymization and privacy-preservation methods have to be robust and safe. The challenge is intended as a mechanism for verifying that the encryption used at the various levels of data processing in the monitoring systems is sufficient.

Special session at IEEE Big Data 2022: As in previous years, a special session devoted to the competition will be held at the conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The papers will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.

Terms & Conditions

Competition Participation Rules for "IEEE BigData 2022 Cup: Privacy-preserving Matching of Encrypted Images"

By entering this Competition you accept these official Competition rules.

1. Organizer

The Competition is organized by MyLED (https://myled.pl/), in association with the 2022 IEEE International Conference on Big Data (IEEE BigData 2022, https://bigdataieee.org/BigData2022/index.html) and QED Software sp. z o. o. The Organizer in the meaning of art. 919 and art. 921 of the Polish Civil Code is MyLED (the Organiser), registered at ul. W. Łokietka 14/2, 30-016 Kraków.
The Competition is sponsored by MyLED (the Sponsor).
The Competition is organized via the KnowledgePit platform (available at knowledgepit.ai or knowledgepit.ml) and any submissions made outside of the platform will not be admissible.

2. Entry

The Competition is open to all interested researchers, specialists, and students.
Members of the Contest Organizing Committee (see the dedicated section on the Competition page) and employees of the Organizer, the Sponsor, and their affiliated entities are not allowed to participate.
Persons who are residents of, or are affiliated with or employed by or otherwise contractually or legally tied to an organization, educational institution, company, or other entity of the Russian Federation or another state or territory that falls under the scope of international sanctions or controls for reasons of war, terrorism or otherwise, are not allowed to participate. The current list of such legal restrictions in force in the European Union is available at https://sanctionsmap.eu/#/main. If there is any doubt about whether such a restriction applies to you, the Organiser reserves the right to verify and adjudicate your eligibility at any time.
Participants must be of legal age of majority (check your country’s law; in Poland and in most countries it is 18 years of age).
Registration for the Competition is done via the KnowledgePit platform (knowledgepit.ai or knowledgepit.ml).

3. Timeline

A detailed Competition time schedule is available in the Schedule section (below) of the Competition page.

4. Participants and Teams

A person may enroll in the Competition with only one KnowledgePit user account. Using multiple accounts constitutes grounds for exclusion from the Competition.
Participants submit their solutions as members of teams made up of one or more persons.
Each participant may be a member of only one Team enrolled in the challenge.
Each Team must designate one of the Team Members as the Team Leader responsible for communication with the Organizer.
A single KnowledgePit account can only be associated with one Team in a given competition. It is not possible to withdraw from a Team, but Teams can be merged.
Merging is done between Teams by their respective Team Leaders through the KnowledgePit platform and requires the consent of both Teams.
Participation in the Competition is free and voluntary.
Privately sharing elements of the solution, such as code or data, outside of Teams is not permitted and may result in disqualification of the persons or Teams involved. However, it is permitted to share remarks and ideas with all participants on the Knowledge Pit Competition Forums.
Participants must use their own resources, particularly software and other necessary tools and equipment needed to prepare a solution to be submitted.

5. Solutions

Participants submit their solutions through the Knowledge Pit platform.
The submitted solution is checked for errors, and if none is found, the solution is immediately included on the public Leaderboard.
There is a strict limit on the total number of solutions that a Team can submit during the Competition, and also on the number of solutions that can be submitted daily.
For this Competition, those limits are set to 400 and 10, respectively.
The daily submission limit resets at 11:59 PM GMT.
When two Teams merge, their solutions accumulate, but the limits for the resulting Team remain unchanged.
If a Team exceeds the limit for the total number of submissions, the Team will be unable to submit new solutions until the Competition ends.
A Team can select up to three solutions for the final evaluation, and the best of them will constitute the Team’s final score (Final Solution).
If a Team fails to select its solutions for the final evaluation, the Team’s solution with the highest score on the public Leaderboard is automatically selected as the Team’s Final Solution.

6. Evaluation

Each Team is obliged to provide a short Report describing their Final Solution. The Report must contain information such as the name of the Team, the names of all Team Members, and a brief overview of the approach used in the solution. The description should explain all data pre-processing steps and model construction steps. It should be submitted in the KnowledgePit platform submission system by the Report Submission Date specified in the Schedule section of the Competition page (below).
The final evaluation takes place after the expiry of the Report Submission Date.
After the Final Evaluation, the three top-ranked Teams will be asked to provide the source code that can be used to reproduce their Final Solutions, together with Documentation that allows running the code. If the code must be run within a complex environment (e.g., a distributed Hadoop cluster), a detailed setup explanation should be provided as well. The source code will be used to verify the legitimacy of the solutions and will be shared with the Competition Sponsor in accordance with the provisions of Section 9 below.
For a Team to be eligible for an Award in this Competition, a Team must be one of the three top-ranked teams. Moreover, a Team must provide the report and source codes upon request for the verification of their legitimacy.
In this Competition, the teams are eligible for monetary prizes only if their final solution improves the baseline.
If the Contest Organising Committee refrains from indicating such a solution in its announcement of winners, the additional prize will not be awarded. This additional prize may be awarded to any team, including one of the winners of the main prizes.
The Organizer holds the right to extend the deadlines for submitting solutions and/or reports. In such a case, participants will be informed about such an extension through the KnowledgePit platform Competition Forum.
The Organizer is not responsible for any consequences of technical issues related to the participant’s ability to access or submit to the KnowledgePit platform, especially for issues relating to Internet connectivity. If due to technical problems within the Organizer’s control, the resulting delays or temporary unavailability of the platform or its components are likely to impact the participant’s ability to timely submit the Team’s solution, the challenge deadline will be extended by the Organizer by the time of such unavailability, and participants will be informed of such extension by email.
The final ranking of the competing Teams will be published based on the final evaluation results.
In the case of draws in the evaluation scores, the time of the submission will be taken into account.

7. Prizes

The Prizes are listed in the dedicated section of the Competition page.
Each prize comprises two components: a monetary amount and the ability for a selected Team Member to participate in the IEEE BigData 2022 conference. The Organiser shall cover a maximum of one registration fee per winning team. The registration fee is a prize component that is non-monetary, and cannot be converted into a monetary value.
Each winning Team must select their representative Team Member to participate in the 2022 IEEE International Conference on Big Data and notify the Organizer of their selection. If no Team Member is selected by his or her Team to participate in the conference, the registration fee shall be deemed forfeited and not constitute part of the prize.
Following the announcement of the winners, the Organiser shall pay the monetary prizes to the winning teams. Each winning team shall notify the Organizer within 21 days following the announcement of how the Team wishes the monetary prize to be paid out (single payment, split payments, etc.).
In order to receive the Prize, each prize recipient is required to submit to the Organizer all documents necessary for the remittance of such prizes, such as their personal details, including bank account number, their certificate of tax residency issued by the tax authority of their place of residence, or other documents required by law. If such necessary documents are not submitted within the time limit indicated in the notification of the award, the monetary prize cannot be paid out and is deemed to have been forfeited by the winner.
The winning participant who fails or refuses to provide the necessary legal documentation such as defined above will be deemed to have forfeited the monetary prize, and such prize will not be paid out. In such a case, the winning Team may select another of its members as the recipient of such a prize.
In accordance with the law, the Organizer shall cover the tax payments required on the awarded prizes at the rates and on terms prescribed by law in the country of the Organizer (Poland).
The Organizer is not responsible in any way for any duties, levies, fees, or taxes that the winning participant may be subject to in his or her country of residence. It is the sole responsibility of the winning participant to inquire at the appropriate authorities about applicable documentation and/or tax payments that she or he may be required to submit.

8. Claims

Submissions are not admissible if they are in whole or part illegible, incomplete, damaged, altered, counterfeit, obtained through fraud, or late, or made or submitted in breach of these Rules.
A participant may be disqualified if the Organizer reasonably believes that the participant has attempted to undermine the legitimate operation of the Competition by using multiple accounts, cheating, deception, or other unfair playing practices or abuses, threats, or harassment towards other participants or the Organizer.
The Organizer may, at its discretion and without providing any additional explanation, reject any submission or disqualify any participant if the Organizer reasonably believes that, respectively, the submission was produced in an unfair or illegitimate way or the participant has broken the challenge rules.
If you think you were excluded without reason, or have any questions, please contact the Organizer at contact@knowledgepit.ai.

9. Data security, Privacy and Copyright

Each participant agrees to use reasonable and suitable measures to prevent persons who have not formally agreed to these Rules from gaining access to the Competition data.
You agree not to transmit, duplicate, publish, redistribute or otherwise provide or make available the Competition data to any party not participating in the Competition. You agree to notify the Organizer immediately upon learning of any possible unauthorized transmission of or unauthorized access to the Competition data and agree to work with the Organizer to rectify any unauthorized transmission or access.
Each report, paper, and any other type of publication based on the team’s or participant’s research where data from this Competition is used should credit KnowledgePit, the Organiser, and the Sponsors as the institutions that provided data for the study.
The fact of accepting the award is equivalent to granting to the Organizers and Sponsors a worldwide, non-exclusive, sub-licensable, transferable, royalty-free, perpetual, and irrevocable right to use, reproduce, distribute, create derivative works of, publicly perform, publicly display, digitally perform, make, have made, sell, offer for sale and/or import, the winning solution submitted and the source code used to generate it, in any media now known or hereafter developed, for any purpose whatsoever, commercial or otherwise, without further approval, and without any payment to the participant or participants who authored or co-authored it. By accepting the award the participants also acknowledge that they have full and unrestricted rights to grant the aforementioned rights.
By enrolling in this Competition, the participant grants his/her consent for the processing of her/his registration data and the submissions and reports, and grants the Organizer the rights to use such data and submissions for the purpose of evaluation of solutions, competition administrative purposes and in post-competition research.
By accepting the award the winning participant grants the Organiser the right to process his or her personal data such as name, address, personal identification or security number, bank account or credit card number, and other necessary details provided for the purposes of prize processing and payment, including the payment of applicable taxes.
By accepting the award the participant grants the Organizer the right to use the participant's name, affiliation, and/or prize information for informational and promotional purposes of the Competition and the KnowledgePit platform in any medium without additional compensation.
Participants’ data is administered by the Organiser and shall be processed in accordance with the European data protection and privacy rules (the General Data Protection Regulation (EU) 2016/679). For more information, see the Organizer’s Privacy Policy.

10. Final arrangements

The Organizer reserves the right to modify the Rules of this Competition, including without limitation for the purpose of clarification, correcting obvious editing mistakes, extending the deadlines for the benefit of participants, or other minor amendments. In the event of any change to the Rules, participants will be informed of them via the Competition forum.
Unless otherwise provided in the Competition Rules above, all claims arising out of or relating to these Rules will be governed by Polish law and will be litigated in Poland. Participants consent to personal jurisdiction in those courts.
If any provision of these Rules is held to be invalid or unenforceable, all remaining provisions of the Rules will remain in full force and effect.

Final results

Rank	Team Name	Is Report		Preliminary Score	Final Score	Submissions
1	v	True	True	0.7204	0.694600	188
2	kubapok	True	True	0.6919	0.691500	41
3	Stan	True	True	0.7086	0.689400	244
4	dragon	True	True	0.8035	0.689200	285
5	bottomline	True	True	0.5933	0.593400	12
6	baseline	True	True	0.5593	0.564400	4
7	HBKU AI	True	True	0.5514	0.548200	1
8	AMFAD	True	True	0.5380	0.540900	11
9	ML	True	True	0.5165	0.515300	29
10	amy	True	True	0.5117	0.511600	20
11	xyzxyzxyzxyz	True	True	0.5047	0.498800	1

Task description

The task of the competition is aimed at checking whether the encoding/scrambling techniques that we employ in the new generation of “smart” monitoring systems are sufficiently secure and robust. We provide the challenge participants with a (training) set of examples consisting of the original image (taken from the public domain) and its encoded (scrambled) version. Then, on a separate set of pairs (test/verification set), the participants have to decide (Yes/No) whether the scrambled file contains the visible image or not. If it is possible to train a model that would make such a decision with high accuracy, it will be a proof-by-example that the encoding techniques used need strengthening.

The task is composed of three subtasks S1, S2, and S3, differentiated by the encoding method used and, in the case of S3, the set of images used. Subtasks S1 and S2 correspond to the encryption that takes place inside the monitoring system. This type of data would not be possible to obtain without physical access to the monitoring device. Subtask S3 corresponds to the situation in which the device sends a more computationally laborious task, such as gender or age recognition from a face image, via the Internet to the computational cloud. As this kind of communication can be eavesdropped on, the encryption used must be stronger.

Subtask S1: Given a pair of images, one original and one after encoding, determine (Yes/No - 0/1) whether they are the same or not. The encoded image is the result of applying an image obfuscation scheme based on chaotic system theory. In these algorithms, the original image is the argument of a parameterized chaotic mapping, in this case Arnold’s cat map (cf. [1]). Chaotic systems are characterized by significant changes in output caused by even tiny changes in their input, hence their application in image content protection (cf. [2]). The original images collected from the public domain are first adjusted, using cropping, padding, and scaling, to a square measuring 512 by 512 pixels. Then, the picture is divided into 32-by-32-pixel tiles. The tiles are individually encoded with the use of Arnold’s cat map and form the final encoded image.
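
The tile-wise scrambling described for S1 can be sketched as follows. This is a minimal illustration, not the organizers' encoder: it assumes the classic form of Arnold's cat map, (x, y) -> ((x + y) mod n, (x + 2y) mod n), and a hypothetical `iterations` parameter, since the task description does not state the parameters actually used. The function names `arnold_cat_map` and `encode_tiles` are illustrative.

```python
import numpy as np


def arnold_cat_map(square: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Scramble a square array with the classic Arnold's cat map.

    The pixel at (x, y) moves to ((x + y) mod n, (x + 2y) mod n).
    The map variant and iteration count are assumptions for illustration.
    """
    n = square.shape[0]
    if square.shape[1] != n:
        raise ValueError("input must be square")
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    new_x = (x + y) % n
    new_y = (x + 2 * y) % n
    out = square.copy()
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        # The map is a bijection on the grid, so every target is written once.
        scrambled[new_x, new_y] = out[x, y]
        out = scrambled
    return out


def encode_tiles(image: np.ndarray, tile: int = 32, iterations: int = 1) -> np.ndarray:
    """Apply the cat map independently to each tile-by-tile block of a square image."""
    height, width = image.shape[:2]
    if height != width or height % tile != 0:
        raise ValueError("image must be square with a side divisible by the tile size")
    out = np.empty_like(image)
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            out[r:r + tile, c:c + tile] = arnold_cat_map(
                image[r:r + tile, c:c + tile], iterations
            )
    return out
```

Because each tile is only permuted internally, the coarse layout of the picture survives at tile resolution, which is consistent with S1 carrying the lowest weight in the evaluation.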

An example of the original and encoded picture from S1:

Subtask S2: Given the pair of images, one original and one after encoding, determine (Yes/No - 0/1) whether they are the same or not. The encoded image is a result of applying the same image obfuscation scheme based on the chaotic system theory as in subtask S1. The crucial difference is that the entire original images, measuring 512 by 512 pixels, are encoded with the use of Arnold’s cat map. This results in scrambled images that are visibly harder for the human eye to identify with the original.
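
For the whole-image case, the sketch below applies the same assumed map to a full 512 by 512 image and also shows its inverse. It is a hedged illustration only: the map variant and the `iterations` value are assumptions, and the names `scramble` and `unscramble` are hypothetical. The point it makes is that the cat map is a pure permutation of pixel positions, so it can be undone exactly once its parameters are known.

```python
import numpy as np


def _cat_map_targets(n: int):
    """Return, for every (x, y), the assumed target ((x + y) mod n, (x + 2y) mod n)."""
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return (x + y) % n, (x + 2 * y) % n


def scramble(image: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Scramble an entire square image with the assumed Arnold's cat map."""
    n = image.shape[0]
    if image.shape[1] != n:
        raise ValueError("image must be square")
    new_x, new_y = _cat_map_targets(n)
    out = image.copy()
    for _ in range(iterations):
        nxt = np.empty_like(out)
        nxt[new_x, new_y] = out  # scatter each pixel to its target position
        out = nxt
    return out


def unscramble(image: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Invert `scramble` by gathering each pixel back from its target position."""
    n = image.shape[0]
    if image.shape[1] != n:
        raise ValueError("image must be square")
    new_x, new_y = _cat_map_targets(n)
    out = image.copy()
    for _ in range(iterations):
        out = out[new_x, new_y]
    return out
```

A round trip such as `unscramble(scramble(img, 5), 5)` returns the input unchanged, and the multiset of pixel values is the same before and after scrambling.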

An example of the original and encoded picture from S2:

Subtask S3: Given a pair consisting of an original image and a (binary sequence) encoding, determine (Yes/No - 0/1) whether they correspond to the same image or not. The original images collected from the public domain are first cropped so that each image is a square that only includes a single human face. Then, the image is scaled to a square measuring 52 by 52 pixels. This small face image is finally encoded. The encoded image (binary file) is the result of applying an image obfuscation scheme based on the Brakerski-Fan-Vercauteren (BFV) Homomorphic Encryption (HE) scheme. The BFV scheme allows computations directly on encrypted data. In particular, it allows only for the computation of additions and multiplications between ciphertexts and plaintexts (or ciphertexts and ciphertexts). The result of the operations, once decrypted, is the same as if they had been applied to the corresponding plaintexts. Schemes like BFV or CKKS are based on a hard computational problem called Ring Learning With Errors (more in [3-4]).
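
To make the homomorphic property concrete, here is a toy sketch of additive homomorphism in a Ring-LWE setting. It is emphatically not the BFV scheme used in the competition: it is a simplified symmetric-key variant with tiny, insecure parameters, it supports ciphertext addition only, and all names and constants (`N`, `T`, `Q`, `keygen`, `encrypt`, `decrypt`, `add`) are chosen for illustration.

```python
import numpy as np

N = 16          # ring dimension: polynomials are taken modulo x^N + 1 (toy size)
T = 16          # plaintext modulus
Q = 1 << 20     # ciphertext modulus, chosen as an exact multiple of T
DELTA = Q // T  # scaling factor that lifts the message above the noise


def polymul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply two polynomials modulo x^N + 1 (negacyclic convolution)."""
    full = np.convolve(a, b)
    result = full[:N].copy()
    result[: N - 1] -= full[N:]  # x^N wraps around to -1
    return result


def keygen(rng: np.random.Generator) -> np.ndarray:
    """Sample a small secret polynomial with coefficients in {-1, 0, 1}."""
    return rng.integers(-1, 2, size=N)


def encrypt(message: np.ndarray, secret: np.ndarray, rng: np.random.Generator):
    """Encrypt a polynomial with coefficients in [0, T) as a pair (c0, c1)."""
    a = rng.integers(0, Q, size=N)       # uniformly random mask
    noise = rng.integers(-1, 2, size=N)  # small error term
    c0 = (-polymul(a, secret) + noise + DELTA * message) % Q
    return c0, a


def decrypt(ciphertext, secret: np.ndarray) -> np.ndarray:
    """Remove the mask, then round away the noise."""
    c0, c1 = ciphertext
    noisy = (c0 + polymul(c1, secret)) % Q
    return np.rint(noisy * T / Q).astype(np.int64) % T


def add(ct_a, ct_b):
    """Homomorphic addition: add the ciphertext components modulo Q."""
    return (ct_a[0] + ct_b[0]) % Q, (ct_a[1] + ct_b[1]) % Q
```

Decrypting `add(encrypt(m1, s, rng), encrypt(m2, s, rng))` yields `(m1 + m2) % T` as long as the accumulated noise stays below half of `DELTA`, which is the property the paragraph above describes for additions.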

Submission format: The participants are required to submit a text file containing exactly 30000 lines. The first 10000 lines should contain binary predictions for the subtask S1 (in the order corresponding to the image identifiers). The following 10000 lines should contain predictions for the subtask S2, and the final 10000 lines should contain predictions for the subtask S3. The prediction values should be either 0 (not the same) or 1 (the same).
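
A submission file in this format could be assembled as in the sketch below. The helper name `write_submission` and the choice of one prediction per line with a trailing newline are assumptions; the description only fixes the line count, the subtask order, and the 0/1 values.

```python
from pathlib import Path
from typing import Sequence

SUBTASK_SIZE = 10_000  # predictions per subtask, as stated in the description


def write_submission(path: str, s1: Sequence[int], s2: Sequence[int], s3: Sequence[int]) -> None:
    """Write S1, S2, and S3 predictions, in that order, one value per line."""
    lines = []
    for name, predictions in (("S1", s1), ("S2", s2), ("S3", s3)):
        if len(predictions) != SUBTASK_SIZE:
            raise ValueError(f"{name}: expected {SUBTASK_SIZE} predictions, got {len(predictions)}")
        for value in predictions:
            if value not in (0, 1):
                raise ValueError(f"{name}: predictions must be 0 or 1, got {value!r}")
            lines.append(str(int(value)))
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")
```

Validating the lengths and values before writing avoids spending one of the limited daily submissions on a malformed file.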

Evaluation: Since the answers (predictions) are binary (0/1), the evaluation will be done as a weighted piece-wise accuracy measure.
The number presented on the leaderboard will be calculated as:

Acc = w1⋅Acc1 + w2⋅Acc2 + w3⋅Acc3

where (Acc1, Acc2, Acc3) and (w1, w2, w3) are the accuracies (proportions of correct predictions) and weights for subtasks S1, S2, and S3, respectively. As we want to promote solutions that work well on the harder subtasks, the weights w1, w2, and w3 are set so that it pays off to deal with the more complicated parts of the whole task. At the same time, the weights are constrained so that w1+w2+w3=1 and Acc ∈ [0,1]. The weight values are:

For subtask S1: w1= 1/10
For subtask S2: w2= 3/10
For subtask S3: w3= 6/10
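
The leaderboard score defined above can be reproduced locally with a few lines of code. This is a sketch under the stated weights; the function names are illustrative, and the split of the test set into preliminary and final parts is not modeled.

```python
from typing import Sequence

WEIGHTS = (0.1, 0.3, 0.6)  # w1, w2, w3 for subtasks S1, S2, S3


def accuracy(predictions: Sequence[int], truth: Sequence[int]) -> float:
    """Proportion of predictions that match the ground truth."""
    if len(predictions) != len(truth) or not truth:
        raise ValueError("predictions and truth must be non-empty and equally long")
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def weighted_accuracy(predictions, truth, weights=WEIGHTS) -> float:
    """Compute Acc = w1*Acc1 + w2*Acc2 + w3*Acc3 over the three subtasks."""
    return sum(w * accuracy(p, t) for w, p, t in zip(weights, predictions, truth))
```

For example, a solution that is perfect on S1, right half the time on S2, and always wrong on S3 scores 0.1⋅1 + 0.3⋅0.5 + 0.6⋅0 = 0.25.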

The preliminary evaluation is done on a small fraction of the test cases. For the final evaluation, the accuracy will be calculated on the remaining part of the test data set.

References to resources mentioned in the task description:

[1] Wikipedia - the free Encyclopedia: Arnold’s cat map. https://en.wikipedia.org/wiki/Arnold's_cat_map
[2] Z.-H. Guan, F. Huang, W. Guan: Chaos-based image encryption algorithm. Physics Letters A, Vol. 346, Issues 1-3, 2005, pp. 153-157. https://doi.org/10.1016/j.physleta.2005.08.006
[3] J. Fan, F. Vercauteren: Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive: Report 2012/144. https://eprint.iacr.org/2012/144
[4] Wikipedia - the free Encyclopedia: Homomorphic Encryption. https://en.wikipedia.org/wiki/Homomorphic_encryption

Schedule

May 1, 2022: competition page goes online, solicitation of participants commences,
May 9, 2022: start of the competition, datasets become available, leaderboard goes live,
October 7, 2022: deadline for submitting the solutions,
October 9, 2022: deadline for sending the reports, end of the competition,
October 10, 2022: online publication of the final results, sending invitations for submitting papers to the associated workshop at the IEEE Big Data 2022 conference,
October 31, 2022: deadline for submitting invited papers,
November 7, 2022: notification of paper acceptance,
November 15, 2022: camera-ready of accepted papers due.

Awards

MyLED will cover the costs of three registration fees for the competition participants with the top 3 solutions. MyLED will also sponsor the cash prizes:

2000 USD for the winning solution (+ the cost of IEEE Big Data 2022 registration)
1000 USD for the 2nd place solution (+ the cost of IEEE Big Data 2022 registration)
500 USD for the 3rd place solution (+ the cost of IEEE Big Data 2022 registration)

Organizing committee

The competition organizing committee is led by representatives of MyLED and QED Software:

Dominik Ślęzak - Organizing Committee Chair (QED Software & University of Warsaw)

Marcin Szczuka (MyLED & University of Warsaw)
Andrzej Janusz (QED Software & University of Warsaw)

The committee also includes (in alphabetical order):

Andrzej Bukała (QED Software)
Bogusław Cyganek (MyLED & AGH University of Science and Technology)
Jakub Grabek (MyLED & AGH University of Science and Technology)
Łukasz Przebinda (MyLED)
Tomasz Tajmajer (MyLED & University of Warsaw)
Andżelika Zalewska (QED Software)