
IEEE BigData 2022 Cup: Privacy-preserving Matching of Encrypted Images

IEEE BigData 2022 Cup: Privacy-preserving Matching of Encrypted Images is a data mining competition organized in association with the 2022 IEEE International Conference on Big Data (IEEE BigData 2022, https://bigdataieee.org/BigData2022/index.html). The task is to verify the image anonymization mechanisms of smart monitoring devices developed jointly by MyLED and QED Software as a part of the AraHUB technology (https://arahub.ai/). The challenge is sponsored by MyLED (https://myled.pl/).


This challenge is associated with the idea of building systems that collect and analyze data from the environment. Proprietary devices equipped with a set of optical sensors can collect information about people moving nearby, such as their movement direction, their interests, and their demographic profile. AI/ML algorithms can then interpret the collected information and provide a profile of the audience within range of the devices.

The scope of this challenge is closely related to the construction of the next generation of "smart" monitoring devices and systems. Such systems are being developed by MyLED and QED Software as a part of the AraHUB initiative (arahub.ai). One of the most important challenges in this project is preserving privacy and protecting the personal information (e.g., name, gender, age, location) of the people being monitored. To meet legal requirements, such as those established in the EU's General Data Protection Regulation (GDPR), our monitoring systems (platforms) are intended to use strong encryption and anonymization techniques, and these methods have to be robust and safe. The challenge is intended to verify whether the level of encryption used at the various stages of data processing in the monitoring systems is sufficient.

Special session at IEEE Big Data 2022: As in previous years, a special session devoted to the competition will be held at the conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The papers will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.

Rank  Report  Preliminary score  Final score  Submissions
1     True    0.7204             0.6946       188
2     True    0.6919             0.6915       41
3     True    0.7086             0.6894       244
4     True    0.8035             0.6892       285
5     True    0.5933             0.5934       12
6     True    0.5593             0.5644       4
7     True    0.5514             0.5482       1
8     True    0.5380             0.5409       11
9     True    0.5165             0.5153       29
10    True    0.5117             0.5116       20
11    True    0.5047             0.4988       1

The task of the competition is to check whether the encoding/scrambling techniques that we employ in the new generation of “smart” monitoring systems are sufficiently secure and robust. We provide the challenge participants with a (training) set of examples, each consisting of an original image (taken from the public domain) and its encoded (scrambled) version. Then, on a separate set of pairs (the test/verification set), the participants have to decide (Yes/No) whether the scrambled file is an encoding of the given original image. If it is possible to train a model that makes this decision with high accuracy, this will be a proof-by-example that the encoding techniques need strengthening.

The task is composed of three subtasks, S1, S2, and S3, which differ in the encoding method used and, in the case of S3, in the set of images. Subtasks S1 and S2 correspond to the encryption that takes place inside the monitoring system; this type of data could not be obtained without physical access to the monitoring device. Subtask S3 corresponds to the situation in which the device sends a more computationally demanding task, such as gender or age recognition from a face image, over the Internet to a computational cloud. Since this kind of communication can be eavesdropped on, the encryption used there must be stronger.

Subtask S1: Given a pair of images, one original and one after encoding, determine (Yes/No, submitted as 1/0) whether they show the same image. The encoded image is the result of applying an image obfuscation scheme based on chaotic system theory. In such algorithms, the original image is the input to a parameterized chaotic mapping, in this case Arnold’s cat map (cf. [1]). Chaotic systems are characterized by large changes in output caused by even tiny changes in their input, hence their application in image content protection (cf. [2]). The original images collected from the public domain are first adjusted to a square of 512 by 512 pixels using cropping, padding, and scaling. The picture is then divided into 32-by-32-pixel tiles, which are individually encoded with Arnold’s cat map and together form the final encoded image.
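The tile-wise encoding described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the organizers' encoder: it uses the standard form of the cat map, (x, y) → (x + y, x + 2y) mod n, from [1], and the number of cat-map iterations per tile (`iterations`) is a hypothetical key parameter, since the exact parameters of the mapping are not given here. The 512-by-512 preprocessing is assumed to have been done already.

```python
import numpy as np


def cat_map(block: np.ndarray) -> np.ndarray:
    """One iteration of Arnold's cat map on a square n x n (x channels) array.

    The pixel at (x, y) moves to ((x + y) mod n, (x + 2y) mod n). The map has
    determinant 1, so it only permutes pixel positions; values are unchanged.
    """
    n = block.shape[0]
    if block.shape[1] != n:
        raise ValueError("the cat map needs a square block")
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = np.empty_like(block)
    out[(x + y) % n, (x + 2 * y) % n] = block[x, y]
    return out


def encode_tiles(image: np.ndarray, tile: int = 32, iterations: int = 1) -> np.ndarray:
    """Scramble every tile x tile block independently (S1-style sketch).

    `iterations` is a hypothetical key: how many times the cat map is applied
    to each tile.
    """
    h, w = image.shape[:2]
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    out = image.copy()
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            block = image[r:r + tile, c:c + tile]
            for _ in range(iterations):
                block = cat_map(block)
            out[r:r + tile, c:c + tile] = block
    return out


# Usage: a random 512 x 512 RGB "image", i.e. 16 x 16 = 256 tiles of 32 x 32 pixels.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
encoded = encode_tiles(original, tile=32, iterations=5)
```

Because each tile is scrambled in isolation, the pixel values of a tile stay inside that tile; only their positions change.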

Figure: example of an original picture and its encoded version from subtask S1.


Subtask S2: Given a pair of images, one original and one after encoding, determine (Yes/No, submitted as 1/0) whether they show the same image. The encoded image is the result of applying the same chaotic image obfuscation scheme as in subtask S1. The crucial difference is that the entire 512-by-512-pixel original image is encoded with Arnold’s cat map at once, rather than tile by tile. The resulting scrambled images are visibly much harder for the human eye to relate to the original.
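Two properties of the cat map matter for S2 and can be checked with a short NumPy sketch. First, the map is periodic: for an n-by-n image there is a smallest k > 0 with M^k ≡ I (mod n), where M = [[1, 1], [1, 2]] is the matrix of the map, so applying it k times restores the original. Second, it only moves pixels, so the multiset of pixel values (the global histogram) is unchanged. The sketch assumes the encoder is exactly the pure cat map of [1]; any extra step in the organizers' pipeline (for example, lossy compression of the encoded file) would break exact histogram equality. The helper `same_histogram` is a hypothetical matching heuristic, not a method prescribed by the competition, and the iteration count used below is likewise hypothetical.

```python
import numpy as np


def cat_map(img: np.ndarray) -> np.ndarray:
    """One iteration of Arnold's cat map, (x, y) -> (x + y, x + 2y) mod n."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = np.empty_like(img)
    out[(x + y) % n, (x + 2 * y) % n] = img[x, y]
    return out


def cat_map_period(n: int) -> int:
    """Smallest k > 0 with M^k = I (mod n) for M = [[1, 1], [1, 2]], n >= 2."""
    a, b, c, d = 1 % n, 1 % n, 1 % n, 2 % n  # entries of M^1 reduced mod n
    k = 1
    while (a, b, c, d) != (1, 0, 0, 1):
        # M^(k+1) = M^k @ M, computed entry by entry mod n
        a, b, c, d = (a + b) % n, (a + 2 * b) % n, (c + d) % n, (c + 2 * d) % n
        k += 1
    return k


def same_histogram(original: np.ndarray, encoded: np.ndarray) -> bool:
    """Hypothetical heuristic: a pure pixel permutation keeps the sorted values."""
    return original.shape == encoded.shape and np.array_equal(
        np.sort(original, axis=None), np.sort(encoded, axis=None)
    )


rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
period = cat_map_period(64)

scrambled = image
for _ in range(7):  # hypothetical number of iterations acting as the "key"
    scrambled = cat_map(scrambled)
```

Under this assumption, whole-image scrambling preserves only the global histogram, whereas tile-wise scrambling as in S1 preserves a separate histogram for every 32-by-32 tile.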

Figure: example of an original picture and its encoded version from subtask S2.


Subtask S3: Given a pair consisting of an original image and an encoding (a binary sequence), determine (Yes/No, submitted as 1/0) whether the encoding corresponds to the original image. The original images collected from the public domain are first cropped to a square that includes only a single human face. Each crop is then scaled to 52 by 52 pixels, and this small face image is finally encoded. The encoded image (a binary file) is the result of applying an image obfuscation scheme based on the Brakerski-Fan-Vercauteren (BFV) homomorphic encryption (HE) scheme. The BFV scheme allows computations to be performed directly on encrypted data; in particular, it supports only additions and multiplications between ciphertexts and plaintexts (or between two ciphertexts). The result of these operations, once decrypted, is the same as if they had been applied to the corresponding plaintexts. Schemes such as BFV and CKKS are based on a hard computational problem called Ring Learning With Errors (RLWE; more in [3-4]).
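To make the homomorphic property concrete, the following pure-Python sketch implements a toy, secret-key variant of BFV-style encryption over the ring Z_q[x]/(x^N + 1), following the scaling-by-Δ = ⌊q/t⌋ idea of [3]. It is for illustration only: the parameters `N`, `T` and `Q` are hypothetical and far too small to be secure, there is no relinearization (so only ciphertext + ciphertext addition and multiplication by a small plaintext constant are shown), and nothing here reflects the library or parameters actually used to produce the S3 files. The plaintext coefficients could hold, for instance, grey-level pixel values of the 52-by-52 face image.

```python
import random

# Hypothetical toy parameters: far too small to be secure.
N = 16          # ring degree: polynomials live in Z_q[x]/(x^N + 1)
T = 1024        # plaintext modulus t (large enough for sums of 8-bit pixels)
Q = 2 ** 40     # ciphertext modulus q, chosen as an exact multiple of T
DELTA = Q // T  # BFV scaling factor floor(q / t)


def polymul(a, b):
    """Negacyclic product of two coefficient lists modulo (x^N + 1, Q)."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                res[i + j] += a[i] * b[j]
            else:
                res[i + j - N] -= a[i] * b[j]  # x^N = -1 in this ring
    return [c % Q for c in res]


def small_poly(rng):
    """Coefficients drawn from {-1, 0, 1}, used for the secret key and noise."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def keygen(rng):
    return small_poly(rng)


def encrypt(m, s, rng):
    """Secret-key encryption of plaintext coefficients m (each in [0, T))."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = small_poly(rng)
    a_s = polymul(a, s)
    c0 = [(-x + ei + DELTA * mi) % Q for x, ei, mi in zip(a_s, e, m)]
    return (c0, a)


def decrypt(ct, s):
    c0, c1 = ct
    v = [(x + y) % Q for x, y in zip(c0, polymul(c1, s))]  # DELTA*m + noise (mod Q)
    return [((T * x + Q // 2) // Q) % T for x in v]        # round(t*v/q) mod t


def add(ct1, ct2):
    """Ciphertext + ciphertext: decrypts to the sum of the plaintexts mod T."""
    return tuple([(x + y) % Q for x, y in zip(u, w)] for u, w in zip(ct1, ct2))


def mul_plain_scalar(ct, k):
    """Ciphertext * small plaintext constant k (noise grows by a factor of k)."""
    return tuple([(k * x) % Q for x in u] for u in ct)


rng = random.Random(0)
s = keygen(rng)
pixels_a = [rng.randrange(256) for _ in range(N)]  # e.g. 16 grey-level pixels
pixels_b = [rng.randrange(256) for _ in range(N)]
ct_sum = add(encrypt(pixels_a, s, rng), encrypt(pixels_b, s, rng))
ct_scaled = mul_plain_scalar(encrypt(pixels_a, s, rng), 3)
```

Decrypting `ct_sum` yields the coefficient-wise sum of the two pixel vectors and decrypting `ct_scaled` yields three times `pixels_a` (both mod `T`), even though the operations were performed on ciphertexts only. Encryption is randomized, so the same pixels encrypt to different ciphertexts each time.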

Submission format: Participants are required to submit a text file containing exactly 30000 lines. The first 10000 lines should contain binary predictions for subtask S1 (in the order of the image identifiers), the next 10000 lines should contain predictions for subtask S2, and the final 10000 lines should contain predictions for subtask S3. Each prediction should be either 0 (not the same) or 1 (the same).
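A file in this format can be assembled and sanity-checked with a few lines of Python. This is a sketch: the function names and the default filename `submission.txt` are hypothetical, each input list is assumed to be already ordered by image identifier, and ending the file with a single trailing newline is an assumption about what the platform accepts.

```python
from pathlib import Path
from typing import List, Sequence

LINES_PER_SUBTASK = 10_000


def write_submission(s1: Sequence[int], s2: Sequence[int], s3: Sequence[int],
                     path: str = "submission.txt") -> None:
    """Write S1, S2, S3 predictions (0 = not the same, 1 = the same), one per line."""
    for name, preds in (("S1", s1), ("S2", s2), ("S3", s3)):
        if len(preds) != LINES_PER_SUBTASK:
            raise ValueError(f"{name}: expected {LINES_PER_SUBTASK} predictions, got {len(preds)}")
        if any(p not in (0, 1) for p in preds):
            raise ValueError(f"{name}: predictions must be 0 or 1")
    lines = [str(int(p)) for preds in (s1, s2, s3) for p in preds]
    Path(path).write_text("\n".join(lines) + "\n")  # assumes a trailing newline is fine


def read_submission(path: str = "submission.txt") -> List[List[int]]:
    """Split a 30000-line submission back into its S1, S2 and S3 blocks."""
    values = [int(line) for line in Path(path).read_text().splitlines()]
    if len(values) != 3 * LINES_PER_SUBTASK:
        raise ValueError(f"expected {3 * LINES_PER_SUBTASK} lines, got {len(values)}")
    return [values[i * LINES_PER_SUBTASK:(i + 1) * LINES_PER_SUBTASK] for i in range(3)]
```

Validating lengths and values before writing catches the most common mistake, a block that is shorter than 10000 lines, which would silently shift all later predictions into the wrong subtask.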

Evaluation: Since the predictions are binary (0/1), the evaluation uses a weighted accuracy measure in which accuracy is computed separately for each subtask.
The number presented on the leaderboard is calculated as:

Acc = w1⋅Acc1 + w2⋅Acc2 + w3⋅Acc3

where (Acc1, Acc2, Acc3) and (w1, w2, w3) are the accuracies (proportions of correct predictions) and the weights for subtasks S1, S2, and S3, respectively. Since we want to promote solutions that work well on the harder subtasks, the weights are set so that it pays off to deal with the more complicated parts of the task. At the same time, the weights sum to one (w1 + w2 + w3 = 1), so that Acc ∈ [0,1]. The weight values are:

For subtask S1: w1 = 1/10
For subtask S2: w2 = 3/10
For subtask S3: w3 = 6/10
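The score can be reproduced locally with a short sketch, assuming ground-truth labels are available for a held-out part of the training pairs (the test labels are not public). The helper names are hypothetical; the weights are the ones listed above.

```python
from typing import Sequence

WEIGHTS = (0.1, 0.3, 0.6)  # w1, w2, w3 for S1, S2, S3


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Proportion of correct 0/1 predictions."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def weighted_score(preds, truths, weights=WEIGHTS) -> float:
    """Acc = w1*Acc1 + w2*Acc2 + w3*Acc3 over the three subtasks."""
    return sum(w * accuracy(p, t) for w, p, t in zip(weights, preds, truths))
```

As a worked example of how the weights reward the harder subtasks: a solution that is perfect on S1 but at chance level (0.5) on S2 and S3 scores 0.1⋅1 + 0.3⋅0.5 + 0.6⋅0.5 = 0.55, while one that is perfect on S3 and at chance on S1 and S2 scores 0.1⋅0.5 + 0.3⋅0.5 + 0.6⋅1 = 0.8. Guessing at chance on all three subtasks gives 0.5 in expectation.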

The preliminary evaluation is done on a small fraction of the test cases. For the final evaluation, the accuracy will be calculated on the remaining part of the test data set.

References to resources mentioned in the task description:

  1. Wikipedia - the free Encyclopedia: Arnold’s cat map. https://en.wikipedia.org/wiki/Arnold's_cat_map
  2. Z.-H. Guan, F. Huang, W. Guan: Chaos-based image encryption algorithm. Physics Letters A, Vol. 346, Issues 1-3, 2005, pp. 153-157. https://doi.org/10.1016/j.physleta.2005.08.006
  3. J. Fan, F. Vercauteren: Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive, Report 2012/144. https://eprint.iacr.org/2012/144
  4. Wikipedia - the free Encyclopedia: Homomorphic encryption. https://en.wikipedia.org/wiki/Homomorphic_encryption
Key dates:

  • May 1, 2022: competition page goes online; solicitation of participants commences.
  • May 9, 2022: start of the competition; datasets become available and the leaderboard goes live.
  • October 7, 2022: deadline for submitting solutions.
  • October 9, 2022: deadline for sending reports; end of the competition.
  • October 10, 2022: online publication of the final results; invitations sent for submitting papers to the associated workshop at the IEEE Big Data 2022 conference.
  • October 31, 2022: deadline for submitting invited papers.
  • November 7, 2022: notification of paper acceptance.
  • November 15, 2022: camera-ready versions of accepted papers due.

MyLED will cover three IEEE Big Data 2022 registration fees for the participants with the top 3 solutions. MyLED will also sponsor the following cash prizes:

  • 2000 USD for the winning solution (+ the cost of IEEE Big Data 2022 registration)
  • 1000 USD for the 2nd place solution (+ the cost of IEEE Big Data 2022 registration)
  • 500 USD for the 3rd place solution (+ the cost of IEEE Big Data 2022 registration)

The competition organizing committee is led by representatives of MyLED and QED Software:

  • Dominik Ślęzak - Organizing Committee Chair (QED Software & University of Warsaw)
  • Marcin Szczuka (MyLED & University of Warsaw)
  • Andrzej Janusz (QED Software & University of Warsaw)

The committee also includes (in alphabetic order):

  • Andrzej Bukała (QED Software)
  • Bogusław Cyganek (MyLED & AGH University of Science and Technology)
  • Jakub Grabek (MyLED & AGH University of Science and Technology)
  • Łukasz Przebinda (MyLED)
  • Tomasz Tajmajer (MyLED & University of Warsaw)
  • Andżelika Zalewska (QED Software)

The competition will be organized jointly by:

MyLED is an advertising agency specializing in the preparation, execution, and monitoring of DOOH (Digital Out-Of-Home) advertising campaigns using digital LED media. It is part of the Jet Line Group, one of the most experienced and longest-operating OOH companies in Poland. MyLED has created an innovative set of management, software, and hardware tools for delivering advertisements on a network of LED displays in Poland and other EU countries.

QED Software is an AI products company that supports pioneers in their growth. Based in Poland, the company specializes in artificial intelligence and machine learning technologies that can be applied in various fields. Moreover, by combining academic experience with a business network, the company organizes international data mining challenges using its own competition platform KnowledgePit (https://knowledgepit.ai).

MyLED and QED Software are closely cooperating on the AraHUB initiative (https://arahub.ai/), aimed at the construction of intelligent monitoring devices. The need to secure these devices and to preserve the anonymity of the people they observe led to the creation of this competition.
