The ACM RecSys Challenge 2017 focuses on the problem of job recommendations on XING in a cold-start scenario. The challenge consists of two phases:
Both phases aim at the following task:
Task: given a new job posting p, the goal is to identify those users who (a) may be interested in receiving the job posting as a push recommendation and (b) are also appropriate candidates for the given job.
For both offline and online evaluation, the same evaluation metrics and the same types of data sets will be used. The offline evaluation is essentially used as an entry gate to the online evaluation:
The online evaluation focuses on a push recommendation scenario in which new items (job postings) are given and users need to be identified...
In the online challenge, teams submit their solutions directly to the system. For each target item, teams are allowed to submit one or more target users; however, each user can only be submitted once. We decided on this restriction because push recommendations are presented to users in a more prominent way. Given a submission of the form

(p1, u42), (p1, u23), ...

where pi is the i-th target posting and uj is the j-th target user, the recommendations are delivered to the users through the following channels:
Some challenges that the participating teams will need to solve:
Given a list of target items targetItems, for each of which the recommender selects the users to whom the item is pushed as a recommendation, we compute the leaderboard score as follows:
score(targetItems) = targetItems.map(item => score(item, recommendations(item))).sum
Here, recommendations(item) specifies the list of users who will receive the given item as a push recommendation. The function score(item, users) is defined as follows:
score(item, users) =
  users.map(u => userSuccess(item, u)).sum + itemSuccess(item, users)

userSuccess(item, user) =
  (   (if (clicked) 1 else 0)
    + (if (bookmarked || replied) 5 else 0)
    + (if (recruiter interest) 20 else 0)
    - (if (delete only) 10 else 0)
  ) * premiumBoost(user)

premiumBoost(user) = if (user.isPremium) 2 else 1

itemSuccess(item, users) =
  if (users.filter(u => userSuccess(item, u) > 0).size >= 1) {
    if (item.isPaid) 50 else 25
  } else 0
Meaning:

- score(item, users) sums up the success scores of the users and the item-based success score.
- userSuccess(item, user) scores a user-item pair based on the interactions between the user and the item.
- premiumBoost(user): userSuccess scores count double for premium users.
- itemSuccess(item, users): if at least one successful push recommendation was created for a given item, this counts 50 points for paid items and 25 points for other items.

Purpose of evaluation metrics:
The above evaluation metrics are applied for both the offline and the online evaluation (in the offline evaluation, the target items do not change during the challenge, while in the online evaluation, new target items are released on a daily basis).
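For clarity, the scoring rules can be sketched in Python. This is an illustrative sketch, not official challenge code; the interaction flags such as `clicked` or `delete_only` are hypothetical booleans standing in for the behaviour observed in the live system.

```python
# Sketch of the leaderboard scoring rules (illustrative only).

def premium_boost(user):
    # userSuccess scores count double for premium users
    return 2 if user.get("is_premium") else 1

def user_success(interaction, user):
    pts = 0
    if interaction.get("clicked"):
        pts += 1
    if interaction.get("bookmarked") or interaction.get("replied"):
        pts += 5
    if interaction.get("recruiter_interest"):
        pts += 20
    if interaction.get("delete_only"):
        pts -= 10
    return pts * premium_boost(user)

def item_success(item, user_scores):
    # 50 points for paid items, 25 for others, if at least one
    # pushed user interacted positively
    if any(s > 0 for s in user_scores):
        return 50 if item.get("is_paid") else 25
    return 0

def score_item(item, pushed):
    # `pushed` is a list of (user, interaction) pairs for this item
    user_scores = [user_success(ia, u) for u, ia in pushed]
    return sum(user_scores) + item_success(item, user_scores)
```

For example, a premium user who clicks and bookmarks a paid posting contributes (1 + 5) * 2 = 12 points, and the item adds 50 more, for a total of 62.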
The training dataset is supposed to be used for experimenting and for training your models. You can split the interaction data into training and test data; for example, leave out the last complete week of interactions and then try to predict, for a given job posting, which users will positively interact with it.
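Such a temporal split can be sketched as follows (a minimal sketch; field names follow interactions.csv, and the one-week holdout is just the example window suggested above):

```python
# Minimal temporal split of the interaction data: hold out the most
# recent week as a test set. created_at is a Unix timestamp.

WEEK_SECONDS = 7 * 24 * 3600

def temporal_split(interactions, holdout=WEEK_SECONDS):
    """Return (train, test), where test covers the final `holdout` seconds."""
    latest = max(row["created_at"] for row in interactions)
    cutoff = latest - holdout
    train = [r for r in interactions if r["created_at"] <= cutoff]
    test = [r for r in interactions if r["created_at"] > cutoff]
    return train, test
```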
The training dataset is a semi-synthetic sample of XING's dataset, i.e. it is incomplete and enriched with noise in order to anonymize the data. For example:
Attempting to identify users or to reveal any private information about the users or information about the business from which the data is coming from is strictly forbidden (cf. Rules).
interactions.csv: Interactions are all transactions between a user and an item, including recruiter interests as well as impressions. Fields:

- user_id: ID of the user who performed the interaction (points to users.id)
- item_id: ID of the item on which the interaction was performed (points to items.id)
- created_at: a Unix timestamp representing the time when the interaction was created
- interaction_type: the type of interaction that was performed on the item

users.csv: Details about those users who appear in the above datasets. Fields:
- id: anonymized ID of the user (referenced as user_id in the other datasets above)
- jobroles: comma-separated list of job role terms (numeric IDs) that were extracted from the user's current job titles
- career_level: career level ID (e.g. beginner, experienced, manager)
- discipline_id: anonymized IDs representing disciplines such as "Consulting", "HR", etc.
- industry_id: anonymized IDs representing industries such as "Internet", "Automotive", "Finance", etc.
- country: the country in which the user is currently working
- region: specified for some users whose country is de; for the meaning of the regions, see below
- experience_n_entries_class: identifies the number of CV entries that the user has listed as work experience
- experience_years_experience: the estimated number of years of work experience that the user has
- experience_years_in_current: the estimated number of years that the user has already been working in her current job (meaning of the numbers: same as experience_years_experience)
- edu_degree: estimated university degree of the user
- edu_fieldofstudies: comma-separated list; 0 means "unknown", and entries > 0 refer to broad fields of study such as Engineering, Economics and Legal, ...
- wtcj: an estimation regarding the user's willingness to change jobs
- premium: whether the user subscribed to XING's paid premium membership

items.csv: Details about the job postings that were and should be recommended to the users. Fields:
- id: anonymized ID of the item (referenced as item_id in the other datasets above)
- industry_id: anonymized IDs representing industries such as "Internet", "Automotive", "Finance", etc.
- discipline_id: anonymized IDs representing disciplines such as "Consulting", "HR", etc.
- is_paid (or is_payed): indicates that the posting is paid for by a company
- career_level: career level ID (e.g. beginner, experienced, manager)
- country: code of the country in which the job is offered
- latitude: latitude information (rounded to ca. 10 km)
- longitude: longitude information (rounded to ca. 10 km)
- region: specified for some items whose country is `de`; for the meaning of the regions, see below
- employment: the type of employment
- created_at: a Unix timestamp representing the time when the job posting was created
- title: concepts that have been extracted from the job title of the posting (numeric IDs)
- tags: concepts that have been extracted from the tags, skills or company name

The dataset contains two additional files that contain the target item IDs and target user IDs:
Note: submitted solutions are only allowed to contain items and users from the above files.
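As a small sketch of working with the dataset files described above, the snippet below parses inline samples and joins each interaction with its user and item records. The tab delimiter and the tiny sample rows are assumptions for illustration; adjust them to the actual files.

```python
import csv
import io

# Toy sketch: load the three files and resolve an interaction's
# user and item records via the ID fields described above.

def load(text):
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def index_by_id(rows):
    return {row["id"]: row for row in rows}

users = index_by_id(load("id\tcareer_level\n7\t3\n"))
items = index_by_id(load("id\tis_paid\n12\t1\n"))
interactions = load("user_id\titem_id\tinteraction_type\tcreated_at\n"
                    "7\t12\t1\t1490000000\n")

# join each interaction with its user and item record
joined = [(users[r["user_id"]], items[r["item_id"]], r) for r in interactions]
```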
The baseline uses XGBoost and is solely content-based. Details about the baseline and Python code are available at: github.com/recsyschallenge/2017/baseline/
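In the same content-based spirit (this is not the baseline implementation, just an illustrative toy signal), one could score a user-posting pair by the overlap between the user's jobroles IDs and the concept IDs extracted from the posting's title and tags:

```python
# Toy content-based signal: Jaccard overlap between a user's jobroles
# IDs and an item's title/tags concept IDs.

def concept_overlap(user_jobroles, item_concepts):
    a, b = set(user_jobroles), set(item_concepts)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, concept_overlap([1, 2, 3], [2, 3, 4]) yields 0.5 (two shared concepts out of four distinct ones).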
For participating in the challenge, you will need to...
The public leaderboard is based on a 30% random sample of the entire ground truth. See: recsys.xing.com/leaders
Datasets that are released as part of the RecSys challenge are semi-synthetic, non-complete samples, i.e. XING data is sampled and enriched with noise. Regarding the released datasets, participants have to stick to the following rules:
Licence of the data: All rights are reserved by XING AG.
Each team should submit a paper describing the algorithms that they developed for the task (see paper submissions & workshop). Teams without a paper submission to the RecSys Challenge workshop will be removed from the final leaderboard.
It is not allowed to crawl additional information from XING (e.g. via XING's APIs or by scraping details from XING pages).
Please stick to the rules above, sign up for only one team, and stick to the submission limits: you can upload at most 20 solutions per day (for the offline challenge). We may suspend a team from the challenge if we get the impression that the team is not playing fair.
If you are unsure of whether something is allowed or not, contact us (e.g. create an issue on github) and we will be happy to help you. Above all remember it's all for science, so be creative, not evil!
Questions and remarks about the procedure and other aspects concerning the challenge can be submitted as github issues.
Prizes are given out to the teams that achieved the highest scores at the end of the online evaluation:
In order to get the prize money, teams have to describe their algorithms in an accompanying paper and present it during the RecSys Challenge workshop in Como, Italy.
When? | What? |
---|---|
Beginning of March | RecSys challenge starts |
April 16th (23:59 Hawaiian time) | Offline evaluation ends |
May 1st | Online challenge starts |
June 4th (23:59 Hawaiian time) | Online evaluation ends |
June 12th | |
June 18th | Paper submission deadline for RecSys Challenge workshop |
July 3rd | Notifications about paper acceptance |
July 17th | Deadline for camera-ready papers |
August 27th-31st | Workshop takes place as part of the RecSys conference in Como, Italy. |
Time | Session |
---|---|
09:00 - 10:30 | Welcome |
10:30 - 11:00 | Coffee Break |
11:00 - 12:30 | Paper presentations (20 minutes) |
12:30 - 14:00 | Lunch Break |
14:00 - 15:30 | Winner (20 minutes) |
15:30 - 16:00 | Coffee Break |
16:00 - 17:30 | details will follow |
Each team - not only the top teams - should submit a paper that describes the algorithms that they used for solving the challenge. Those papers will be reviewed by the program committee (non-blind double review). At least one of the authors is expected to register for the RecSys Challenge workshop which will take place as part of the RecSys conference in Como, Italy.
Papers should be 4-6 pages long. They have to be uploaded as PDF and prepared according to the standard ACM SIG proceedings format (in particular: sigconf): templates.
Papers (and later also the camera-ready versions) have to be uploaded via EasyChair: submit paper via EasyChair
We aim to publish the accepted papers in a special volume of the ACM SIG Proceedings dedicated to the challenge (cf. proceedings of the previous year: ACM, DBLP).