The RecSys 2024 Challenge will be organized by Johannes Kruse and Kasper Lindskow (Ekstra Bladet), Anshuk Uppal, Michael Riis Andersen, and Jes Frellsen (Technical University of Denmark), Marco Polignano (University of Bari Aldo Moro, Italy), Claudio Pomo (Politecnico di Bari, Italy), and Abhishek Srivastava (IIM Visakhapatnam, India) based on the data provided by Ekstra Bladet. This year’s challenge focuses on online news recommendation, addressing both the technical and normative challenges inherent in the design of effective and responsible recommender systems for news publishing.
The challenge will delve into the unique aspects of news recommendation. These include modeling user preferences based on implicit behavior, accounting for the influence of the news agenda on user interests, and managing the rapid decay of news items. The challenge also embraces the normative complexities of news recommendation: investigating how recommender systems affect the news flow and whether they align with editorial values. By providing participants with a comprehensive dataset and a robust news recommendation evaluation framework, our goal is to tackle these multifaceted challenges head-on. As part of the challenge, Ekstra Bladet will be releasing an anonymized dataset with approximately 2 million random users who engaged with EkstraBladet.dk over a six-week period.
The Ekstra Bladet RecSys Challenge aims to predict which article a user will click from a list of articles seen during a specific impression. Given the user's browsing history, session details (such as time and device used), personal metadata (including gender and age), and a list of candidate news articles from an impression log, the objective is to rank the candidate articles according to the user's personal preferences. This involves developing models that represent both users and articles through the articles' content and the users' interests. The models estimate the likelihood that a user clicks each article by evaluating the compatibility between the article's content and the user's preferences. The articles are ranked by these likelihood scores, and the quality of the rankings is measured against the articles users actually clicked.
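As a rough illustration of the ranking task, the sketch below scores each candidate article by the cosine similarity between its embedding and a user vector built by averaging the embeddings of previously clicked articles. This is only a simple content-based baseline under stated assumptions, not the organizers' method: the function names, the toy two-dimensional embeddings, and the article IDs are all hypothetical.

```python
from math import sqrt


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(history_ids, candidate_ids, embeddings):
    """Rank candidates by similarity to the mean embedding of the click history."""
    user_vector = mean_vector([embeddings[a] for a in history_ids])
    scores = {c: cosine(user_vector, embeddings[c]) for c in candidate_ids}
    ranking = sorted(candidate_ids, key=lambda c: scores[c], reverse=True)
    return ranking, scores


# Toy data (hypothetical): two clicked articles and two candidates.
embeddings = {
    "a1": [1.0, 0.0],
    "a2": [0.9, 0.1],
    "c1": [0.0, 1.0],
    "c2": [1.0, 0.2],
}
ranking, scores = rank_candidates(["a1", "a2"], ["c1", "c2"], embeddings)
print(ranking)
```

Here the candidate closest to the user's click history is ranked first. A competitive model would typically also use session details and learned user representations.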
To evaluate the models, we use several standard metrics from the recommendation field, including the area under the ROC curve (AUC), mean reciprocal rank (MRR), and normalized discounted cumulative gain (nDCG@K) for K shown recommendations. To address the normative complexities inherent in news recommendation, the test set incorporates samples specifically designed to assess models on normative properties. This includes evaluating models on Beyond-Accuracy Objectives such as intra-list diversity, serendipity, novelty, and coverage. The final result is the average of these metrics across all impression logs.
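The accuracy metrics can be sketched per impression and then averaged, as described above. The following is a minimal pure-Python sketch assuming binary click labels (1 = clicked, 0 = not clicked); it is not the official evaluation script, and the function names and toy impressions are hypothetical. The reciprocal rank here uses the highest-ranked clicked article, which is one common definition.

```python
from math import log2


def auc(labels, scores):
    """Probability that a clicked article is scored above a non-clicked one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked article")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def labels_by_rank(labels, scores):
    """Reorder labels by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels, scores):
    """1 / rank of the highest-ranked clicked article (0.0 if none was clicked)."""
    for rank, y in enumerate(labels_by_rank(labels, scores), start=1):
        if y == 1:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k positions (binary gains)."""
    return sum(y / log2(pos + 2) for pos, y in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG@k normalized by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels_by_rank(labels, scores), k) / ideal if ideal else 0.0


def mean_over_impressions(metric, impressions, **kwargs):
    """Average a per-impression metric over all impressions."""
    values = [metric(labels, scores, **kwargs) for labels, scores in impressions]
    return sum(values) / len(values)


# Toy impressions (hypothetical): (click labels, model scores) per impression.
impressions = [
    ([0, 1, 0, 0], [0.2, 0.7, 0.9, 0.1]),  # clicked article ranked 2nd
    ([1, 0, 0], [0.8, 0.3, 0.5]),          # clicked article ranked 1st
]
print(mean_over_impressions(auc, impressions))
print(mean_over_impressions(reciprocal_rank, impressions))
print(mean_over_impressions(ndcg_at_k, impressions, k=5))
```

Averaging the per-impression reciprocal ranks yields MRR. The beyond-accuracy objectives need article-level features and are not covered by this sketch.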
The Ekstra Bladet News Recommendation Dataset (EB-NeRD) is a large-scale Danish dataset created by Ekstra Bladet to support advancements and benchmarking in news recommendation research. EB-NeRD comprises over 2.7 million users and more than 600 million impression logs from Ekstra Bladet. Alongside these, we offer a collection of more than 120 thousand news articles, enriched with textual content features such as titles, abstracts, and bodies. This makes text features in a low-resource language available as context for recommender systems.
To support advancements in news recommendation research, we have constructed the Ekstra Bladet News Recommendation Dataset (EB-NeRD). It was collected from the behavior logs of active users at Ekstra Bladet during the 6 weeks from April 27 to June 8, 2023. This timeframe was selected to avoid major events, e.g., holidays or elections, that could trigger atypical behavior at Ekstra Bladet.
Active users were defined as users who had at least 5 and at most 1,000 news click records in the three-week period from May 18 to June 8, 2023. To protect user privacy, every user was de-linked from the production system by securely hashing the user ID into an anonymized ID using a one-time salt mapping. Alongside the behavior logs, we provide Danish news articles published by Ekstra Bladet. Each article is enriched with textual context features such as title, abstract, body, and categories. Furthermore, we provide features that have been generated by proprietary models, including topics, named entity recognition (NER), and article embeddings.
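The active-user filter and the anonymization step could be sketched as follows. The click bounds (5 and 1,000) come from the description above; everything else is an assumption made for illustration, including the use of HMAC-SHA-256 as the keyed hash, the 32-byte salt, and the function names. The actual hashing scheme used for EB-NeRD is not specified here.

```python
import hashlib
import hmac
import secrets
from collections import Counter

MIN_CLICKS, MAX_CLICKS = 5, 1000  # bounds stated in the dataset description


def active_users(click_user_ids):
    """Return users whose click count lies within [MIN_CLICKS, MAX_CLICKS]."""
    counts = Counter(click_user_ids)
    return {user for user, n in counts.items() if MIN_CLICKS <= n <= MAX_CLICKS}


def anonymize(user_ids, salt):
    """Map each user ID to a keyed hash (assumed here: HMAC-SHA-256)."""
    return {
        user: hmac.new(salt, user.encode("utf-8"), hashlib.sha256).hexdigest()
        for user in user_ids
    }


# Toy click log (hypothetical): one entry per click, identified by user ID.
clicks = ["u1"] * 5 + ["u2"] * 4 + ["u3"] * 1001

# A fresh random salt used once; discarding it afterwards means the mapping
# cannot be recomputed from the production IDs.
salt = secrets.token_bytes(32)
mapping = anonymize(active_users(clicks), salt)
print(sorted(mapping))
```

In this toy log only `u1` qualifies: `u2` has too few clicks and `u3` too many.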
Each dataset bundle (demo, small, and large) consists of a training set and a validation set, together with the articles (articles.parquet) present in the bundle. The official test set is to be downloaded separately from these. Each data split has two files: 1) the behavior logs for the 7-day data split period (behaviors.parquet) and 2) the users' click histories (history.parquet), i.e., 21 days of clicked news articles prior to the data split's behavior logs. The click histories are fixed to the period prior to the behavior logs; i.e., they are not updated within the data split period.
No. | File Name | Description |
---|---|---|
1 | behaviors.parquet | The impression logs |
2 | history.parquet | The click histories of users |
3 | articles.parquet | The information of news articles |
4 | artifacts.parquet | The embeddings of the articles' textual information |
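
The relationship between the click histories and the behavior logs of one data split can be illustrated with a small sketch: clicks before the start of the split belong to the fixed history, and clicks within the following 7 days belong to the behavior logs. The split start date, the tuple layout, and the function name below are hypothetical, and the lower bound of the history window is omitted for brevity.

```python
from datetime import datetime, timedelta

BEHAVIOR_DAYS = 7  # length of a data split period, as stated above


def partition_clicks(clicks, split_start):
    """Split (user_id, article_id, timestamp) clicks into history and behaviors.

    History holds clicks strictly before the split start and is never extended
    with clicks from inside the split period. Behaviors hold clicks within the
    7-day split period. Later clicks belong to neither.
    """
    split_end = split_start + timedelta(days=BEHAVIOR_DAYS)
    history, behaviors = [], []
    for click in clicks:
        timestamp = click[2]
        if timestamp < split_start:
            history.append(click)
        elif timestamp < split_end:
            behaviors.append(click)
    return history, behaviors


# Toy clicks (hypothetical) around an assumed split start of May 18, 2023.
clicks = [
    ("u1", "a1", datetime(2023, 5, 10, 8, 0)),   # before the split: history
    ("u1", "a2", datetime(2023, 5, 19, 12, 0)),  # inside the split: behaviors
    ("u1", "a3", datetime(2023, 5, 26, 9, 0)),   # after the split: dropped
]
history, behaviors = partition_clicks(clicks, datetime(2023, 5, 18))
print([c[1] for c in history], [c[1] for c in behaviors])
```

Because the history is fixed, a click made during the split period never appears in that split's history.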
For further details, please refer to the dedicated website of Ekstra Bladet.
When? | What? |
---|---|
8 March, 2024 | Start RecSys Challenge; Release dataset |
25 March, 2024 | Submission System Open |
4 April, 2024 | Leaderboard live |
21 June, 2024 | End RecSys Challenge |
24 June, 2024 | Final Leaderboard & Winners; EasyChair open for submissions |
1 July, 2024 | Code Upload: upload code of the final predictions |
18 July, 2024 | Paper Submission Due |
3 August, 2024 | Paper Acceptance Notifications |
29 August, 2024 | Camera-Ready Papers |
October 2024 | RecSys Challenge Workshop |
Submission website: EasyChair
Time | Program |
---|---|
09:00-10:30 | Session 1 |
09:00-09:15 | Opening |
09:15-10:00 | Keynote Speech: Balancing Accuracy and Editorial Values in News Recommendations --- Kasper Lindskow, Ph.D., Head of AI at JP/Politikens Media Group |
10:00-10:15 | Leveraging User History with Transformers for News Clicking: The DArgk Approach --- Juan Manuel Rodriguez and Antonela Tommasel |
10:15-10:30 | Recommendations for the Recommenders: Reflections on Prioritizing Diversity in the RecSys --- Lucien Heitz, Sanne Vrijenhoek and Oana Inel |
10:30-11:15 | Coffee Break |
11:15-12:45 | Session 2 |
11:25-11:45 | Enhancing News Recommendation with Transformers and Ensemble Learning (🥇) --- Kazuki Fujikawa, Naoki Murakami, and Yuki Sugawara |
11:45-12:05 | Large Scale Hierarchical User Interest Modeling for Click-through Rate Prediction (🥈) --- Taofeng Xue, Zhimin Lin, Zhijian Zhang, Linsen Guo, Haoru Chen, Mengjiao Bao, and Peng Yan |
12:05-12:25 | Harnessing Temporal Dynamics and Content: An Ensemble of Gradient Boosting Machines for News Recommendation (🥉) --- Tomomu Iwai, Akihiro Tomita, Tomoyuki Arai, Hiroki Ogawa, and Takuma Saito |
12:25-12:45 | Exploiting Contextual Normalizations and Article Endorsement for News Recommendation (🥇) --- Andrea Alari, Lorenzo Campana, Federico Giuseppe Ciliberto, Saverio Maggese, Carlo Sgaravatti, Francesco Zanella, Andrea Pisani, and Maurizio Ferrari Dacrema |
12:45-14:30 | Lunch Break |
14:30-16:00 | Session 3 |
14:50-15:10 | Enhancing News Recommendation with Real-Time Feedback and Generative Sequence Modeling --- Qi Zhang, Jieming Zhu, Jiansheng Sun, Guohao Cai, Ruining Yu, Bangzheng He, and Liangbi Li |
15:10-15:30 | DIVAN: Deep-Interest Virality-Aware Network to Exploit Temporal Dynamics in News Recommendation --- Antonio Ferrara, Marco Valentini, Paolo Masciullo, Antonio De Candia, Davide Abbattista, Riccardo Fusco, Claudio Pomo, Vito Walter Anelli, Giovanni Maria Biancofiore, Ludovico Boratto, and Fedelucio Narducci |
15:30-15:50 | Leveraging LightGBM Ranker for Efficient Large-Scale News Recommendation Systems --- Tetsuro Sugiura, Yosuke Yamagishi, and Yodai Kishimoto |
15:50-16:00 | Winners' Ceremony (🥇 🥈 🥉) & Closing Remarks |