Final Evaluation Question

IEEE BigData 2024 Cup: Predicting Chess Puzzle Difficulty

by mcognetta - Monday, August 19, 2024, 06:20:44

Will the final evaluation be done on the same dataset state as the current evaluation? Or will you use updated ratings?

To clarify: presumably there was some cutoff date for collecting the puzzle ratings, and the ratings as of that date are what the current evaluation uses. However, the puzzle ratings may still be updating (and thus converging further toward their true values). Will these updated ratings be used, or will the evaluation stay with the ratings from that cutoff point?

Thanks!

RE: Final Evaluation Question

by BigDataChess - Wednesday, August 21, 2024, 15:24:13

Whatever ratings were used for the leaderboard will not be used for final evaluation.

We will use updated puzzle ratings, but don't let that mislead you: the final number of attempts at solving the puzzles will be similar in the leaderboard and final test sets. It's simply that we prioritised a first batch of puzzles back then when tagging (in other words, we got enough taggings on those puzzles to bring their rating deviation (RD) below 130, and now we're trying to do the same with the rest).

RE: Final Evaluation Question

by mcognetta - Wednesday, August 21, 2024, 15:55:48

Thanks for the reply. So _all_ of the ratings will be updated (not just the ones that were not used for the preliminary evaluations)?