News Recommendation at Scale
Proudly written by Joris Baan (@jsbaan) — ML Engineer @ DPG Media
With roughly a thousand news articles published by journalists at DPG Media every day, it is a challenge to bring the right articles to the right readers at the right time. An important role of newspapers is to optimally inform readers, balancing between providing news that interests readers, news that provides alternate views or opinions, and news that editors believe readers have to know. Welcome to the world of news personalization.
News personalization is an exciting space, both in terms of engineering and machine learning. It is also very socially relevant due to concerns about filter bubbles and privacy. In this blog post, I will talk about how we use machine learning at DPG Media to get the right content to the right users at the right time. There are many products in which personalization can materialize (e-mail, “for you” sections on websites, etc.). In this post, I will discuss news recommendation in the form of personalized push notifications. This illustrates the core concept and currently runs in production for some of our major news titles.
Why is it hard?
News recommendation is an interesting problem for two reasons.
- ML modeling: designing a recommender that understands both users and news articles and can match them while considering goals such as relevancy, diversity, and recency.
- Scale: DPG Media is the largest Dutch/Belgian publisher with thousands of journalists producing high-quality content every day. Our users leave traces of their interactions with our products through an event stream with peak volumes around 100K events per second. A recommender must be fast enough to handle these loads in our production environment.
So, News Recommendation
Let’s dive right into it! There are several ways to go about news recommendation. Since it’s a recommendation task, the first thing that comes to mind is collaborative filtering. Collaborative filtering is based on how users interact with items. Given that user X has read a set of news articles, it makes sense to recommend an article that users with a similar reading history also interacted with. However, there is one big problem with collaborative filtering in the news domain.
An important aspect of news is, perhaps unsurprisingly, the fact that it’s new. Therefore, we typically don’t have much information about interactions when we would like to recommend an article. This is commonly referred to as the item cold-start problem. In particular, when sending personalized push notifications at the moment of publication, items have had no interactions at all.
The core idea
Because of this cold-start problem, we went for a pure content-based approach. The core idea is straightforward. A vector representation of a news article is concatenated to a vector representation of a user profile and fed to a classifier that predicts how well the two match. This approach opens up a huge playing field in which we can take advantage of exciting recent developments in Natural Language Processing (NLP) & representation learning, such as Google’s BERT or OpenAI’s GPT-2, to compute more expressive user and article embeddings. However, for the first iteration, we left all the heavy machinery on the shelf and started with a simple, scalable solution.
Along with a few details
Article embeddings are defined as a TF-IDF weighted average over word2vec embeddings of all words in an article. User embeddings are simply the sum of the article embeddings in a user’s reading history. Whenever a news article is published, it enters our push pipeline, and we want to find interested users. As we have millions of users on our platforms, considering all users for every new article is infeasible. Instead, we filter based on recent reading activity and employ a simple trick to sample candidate users based on how often they recently opened personalized push-notifications.
This provides a preference to users that recently interacted a lot with what we recommended. Besides huge computational gains, this also ensures that users who never open push messages slowly stop receiving them. Finally, we have a model (‘the scheduler’) that predicts whether an article should be recommended to a user immediately or with a certain delay. This is based on the current time and the times at which a user usually consumes news.
Cool, but does it work?
It looks like it! When comparing personalized push notifications to pushes sent by our editors, we see a factor 10 increase of users that open a push, based on millions of push notifications. (1)
Filter bubbles and diversity
A common conception about news recommendation is that it might create filter bubbles, in which users only receive recommendations that confirm their view. We take this very seriously and are currently experimenting with several diversity metrics. These metrics measure diversity along dimensions such as content (article embedding), author, brand, sentiment, recency, etc.
Inspired by Maximum Marginal Relevance (MMR) and information gain, we model reading behavior by sequentially traversing a list of recommended articles and computing diversity between current and previous articles. The idea is that this mimics the way a user perceives the diversity of a recommendation based on previous reads or recommendations. Our goal is to first evaluate our recommender in terms of diversity and then use it as an objective function during training. Or perhaps as part of a post-processing step that re-ranks based on diversity.
In our current production systems, we don’t think that filter bubbles are prominent yet. The users exposed to our recommender only get a small fraction of their news diet through our systems. On average, our users receive only about one personalized push notification every three days. The rest of their news consumption on our platforms is not served through personalization algorithms. Also, we currently don’t personalize content cross-brand, which makes the primary ‘filter bubble’ for a given user his/her choice of a specific news platform or brand.
What we are working on right now
As I mentioned before, I am most excited about experimenting with more recent state-of-the-art language models. The rise of transformers and large language models such as BERT has moved the field of NLP forward at an unprecedented rate. As our approach revolves around language-based user and item representations, I expect to make big strides using such architectures. I’m particularly excited about framing news recommendation as a masked language modeling (MLM) objective. In MLM, random words in a sentence are masked or replaced by other words. The task is then to reconstruct the original sentence. This results in a model that is capable of predicting likely next words given a context.
MLM is a form of self-supervised learning that leverages unlabeled text to obtain rich, contextualized word representations. Interestingly, we could apply the same principle to reading behavior. Like sentences, reading is a sequential process. Unlike sentences, the basic units are articles instead of words. The model would learn to assign probabilities to articles given (part of) a user’s reading behavior as context. This idea was recently applied to several sequential recommendation tasks, such as movies, games, or products. We are currently experimenting with this in the news domain with one of our interns.
Using BERT-based models with an MLM objective in production opens up a new can of worms. However: they are very computationally heavy. Luckily, this is something that the research community is actively working on, for example, with DistilBERT.
We’ll keep on improving the quality of our recommendations while taking into account relevancy, diversity, and recency. Based on results for personalized push notifications, it looks like we are heading in the right direction. I’m looking forward to experimenting with more recent representation learning model architectures and deploying models in other personalization products.
If you have questions, don’t hesitate to get in touch! Also, a big shout out to the rest of the news personalization team for their awesome work, making everything I discussed possible!
— Joris Baan (@jsbaan), ML Engineer @ DPG Media
(1) Though we know the number of devices that a push was sent to, we are not certain that a user was actually presented with this notification. For example, because of OS-level device settings that silence notifications. This could be especially disadvantageous for editorial push, as candidate users are not retrieved based on recent reading activity and push-open interactions, but on news categories they subscribed to (e.g., sports, politics, etc.). We are actively working on improving this comparison.