If you like ordering Mediterranean cuisine, chances are high that you would like to try continental dishes too. Getting that right for one person is a cakewalk, but matching the preferences of millions of customers with a multitude of food combinations, every day, is not easy. This is where data science comes into play.
Classical recommendation algorithms consider a customer’s past behaviour and create interaction features accordingly. At times, this limits us to only the items (restaurants and dishes, for us) that customers have interacted with, which leads to sparsity because most of our customers interact with only a select few restaurants and/or dishes.
The challenge thus lies in providing richer data in our recommendations to overcome:
a) sparsity
b) redundancy (we get limited to suggesting the same set of restaurants and dishes to customers based on their past interactions)
We use knowledge graphs to predict which restaurants, dishes, and cuisines will go well with your taste. Unlike in tabular data, individual data points in a graph are linked with each other. So, a customer can be linked to a restaurant they have never ordered from before, because that restaurant is also connected to other customers, allowing us to refine our recommendations consistently.
Capturing this relation seemed important, so we ran a few experiments with graph algorithms to assess their viability for automating recommendations. As you can see, a lot goes on behind the scenes to perfect our restaurant recommendations for you. Let’s learn more about this in this article.
Lately, there have been great advancements in graph learning. Many algorithms have been developed to solve link prediction, node classification, and node representation problems.
We narrowed our focus to node representation learning, as it can be used in many downstream applications, recommendation being one of them.
The idea behind node embedding is to represent each node in the form of a latent vector (z) such that nodes that are similar in the graph also have similar vectors in the embedding space.
Now the definition of node similarity can vary and mean either of the following:
Image: The above graph displays customer-restaurant interaction data where:
Please note that graphs can be either homogeneous or heterogeneous. Let’s elaborate:
In our case, the graph is heterogeneous since customers and restaurants are two different entities. However, we converted it into a homogeneous graph so that we could apply available graph learning algorithms, since the majority of available research is focused on homogeneous graphs.
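To make the conversion concrete, here is a minimal sketch of one common way to do it. The exact method we used is not spelled out here, so treat this as an assumption: customers and restaurants are placed in a single shared node id space, and every customer-restaurant interaction becomes an untyped, undirected edge. The function name `to_homogeneous` and the sample ids are hypothetical.

```python
from collections import defaultdict


def to_homogeneous(orders):
    """Collapse (customer_id, restaurant_id) pairs into one untyped graph.

    Assumption for illustration: node type is kept only in the lookup key,
    while the graph itself sees plain integer node ids and one edge type.
    """
    node_index = {}                # (kind, raw_id) -> integer node id
    adjacency = defaultdict(set)   # node id -> set of neighbouring node ids

    def node_id(kind, raw_id):
        key = (kind, raw_id)
        if key not in node_index:
            node_index[key] = len(node_index)
        return node_index[key]

    for customer, restaurant in orders:
        c = node_id("customer", customer)
        r = node_id("restaurant", restaurant)
        # Undirected edge: both endpoints list each other as neighbours.
        adjacency[c].add(r)
        adjacency[r].add(c)

    return node_index, dict(adjacency)


# Hypothetical interaction data: c1 ordered from r1 and r2, c2 from r2.
orders = [("c1", "r1"), ("c1", "r2"), ("c2", "r2")]
node_index, adjacency = to_homogeneous(orders)
```

Keeping the `(kind, raw_id)` key in `node_index` means the node type can still be recovered later, for example to compare only customer-restaurant pairs.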
GraphSAGE is a framework for inductive representation learning on large graphs. We preferred GraphSAGE because:
Image: The similarity aspect is calculated via aggregating the k-hop neighbourhood.
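The k-hop aggregation can be sketched in a few lines. This is an illustrative, untrained version of a mean aggregator in the spirit of GraphSAGE, not our production code: the weights are random, the toy graph and feature values are made up, and all function names are hypothetical. Each layer combines a node's own vector with the mean of its neighbours' vectors, so stacking K layers lets information flow in from the K-hop neighbourhood.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalise(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def sage_layer(h, adjacency, W):
    """One hop: concat(own vector, mean of neighbours) -> linear -> ReLU -> normalise."""
    out = {}
    for node, own in h.items():
        neighbours = adjacency.get(node, ())
        if neighbours:
            agg = mean_vector([h[n] for n in neighbours])
        else:
            agg = [0.0] * len(own)  # isolated node: nothing to aggregate
        z = matvec(W, own + agg)
        z = [max(0.0, v) for v in z]
        out[node] = l2_normalise(z)
    return out


def embed(features, adjacency, weights):
    """Apply one layer per weight matrix; len(weights) is the number of hops K."""
    h = features
    for W in weights:
        h = sage_layer(h, adjacency, W)
    return h


def make_weights(rng, out_dim, in_dim):
    # Input is concat(own, aggregated), hence 2 * in_dim columns.
    return [[rng.uniform(-1, 1) for _ in range(2 * in_dim)] for _ in range(out_dim)]


# Toy homogeneous graph and made-up 2-dimensional input features.
adjacency = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.5, 0.5]}

rng = random.Random(0)
weights = [make_weights(rng, 3, 2), make_weights(rng, 3, 3)]  # K = 2 hops
embeddings = embed(features, adjacency, weights)
```

In a real setup the weight matrices are learned rather than sampled, and neighbours are usually sampled to a fixed size per hop instead of being aggregated in full.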
For any supervised learning task, labelled training data is needed. In the case of homogeneous graphs, the node labels can be used to train a node classification model. For heterogeneous graphs, supervised learning can be used for link prediction, where the link could be the number of orders between a customer and a restaurant.
Since our main objective was to learn generic node representations that keep graph-based similarity intact, unsupervised learning was more appropriate. For that to happen,
The model is trained to classify these positive and negative pairs. The main hyperparameters we tested:
From the node representations thus generated (Iteration 0), we used cosine similarity between customer-restaurant pairs as a feature in our downstream personalisation model. It came out as the second most important feature as per SHAP values and we also observed a slight improvement in AUC.
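As a rough illustration of how such a feature can be computed, the sketch below takes node embeddings and returns the cosine similarity for each customer-restaurant pair. The embedding values, node names, and the `cosine_similarity` helper are all hypothetical; in practice the vectors would come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 2-dimensional embeddings, purely for illustration.
embeddings = {
    "customer_1": [1.0, 0.0],
    "restaurant_1": [1.0, 1.0],
    "restaurant_2": [0.0, 2.0],
}

pairs = [("customer_1", "restaurant_1"), ("customer_1", "restaurant_2")]

# One similarity score per pair, ready to be joined onto the
# downstream model's feature table.
similarity_feature = {
    (customer, restaurant): cosine_similarity(
        embeddings[customer], embeddings[restaurant]
    )
    for customer, restaurant in pairs
}
```

Cosine similarity ignores vector length, so a restaurant with a large-magnitude embedding does not score higher on that basis alone.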
In a bid to generate more direct recommendations, we narrowed our focus to improving two measures:
Let’s see how our iterations fared.
Alas, while GraphSAGE proved to be a useful technique, we realised that scaling it up to as many as 500 cities where we operate was not practical. Our search for a robust graph-based embedding solution was still on.
This is a two-part series on improving the recommendation search for our customers. Read Part Two here, where we fulfill our quest to find a robust, graph-based embedding solution.
If you are interested in working with us on such innovative problem solving, connect with Manav Gupta on LinkedIn. We are always looking for enterprising data scientists at Zomato.
This blog was written by Saurabh Gupta, Sonal Garg and Manav Gupta.