Data Science Team | April 8, 2022 | 5 min read

Connecting the Dots: strengthening recommendations for our customers (Part One)

If you like ordering Mediterranean cuisine, there are high chances that you might like to try continental dishes too. Now getting that right for one person is a cakewalk but matching preferences of millions of customers with a multitude of food combinations, every day, is simply not easy. Well, this is where data science comes into play.

Classical recommendation algorithms consider a customer’s past behaviour and create interaction features accordingly. At times this can get limited to only the items (restaurants and dishes for us) that the customers interact with, leading to sparsity for mostly our customers interacting with only select restaurants and/or dishes.

The challenge thus lies in providing richer data in our recommendations to overcome:

a) sparsity

b) and redundancy (we get limited to suggesting the same set of restaurants and dishes to customers basis their interaction)

We use knowledge graphs to predict which restaurants, dishes, and cuisines will go well with your taste. Unlike tabular data, individual data points are actually linked with each other. So, a customer can be linked to a ‘never ordered before’ from a restaurant that is also connected to other customers – allowing us to refine our recommendations consistently.

Capturing this relation seemed important. So we ran a few experiments with graph algorithms and observed their viability for automating recommendations. So you see, there’s a lot that goes behind the scenes to perfect our restaurant recommendations for you. Let’s learn more about this in this article.

Graph Learning

Lately, there have been great advancements in graph learning. Many algorithms have been developed to solve link prediction, node classification, and node representation problems.

Our focus narrows on node representation learning as it can be used in many downstream applications; recommendation being one of them.

The idea behind node embedding is to represent each node in form of a latent vector(z) such that

Now the definition of node similarity can vary and mean either of the following:

Adjacency-based similarity: directly connected nodes
Multi-hop similarity: nodes appearing in 1-hop, 2-hop neighbour (i.e. 1 hop = 1 jump. Direct connection = 1 hop, whereas 2nd degree connection = 2 hop)
Random walk-based similarity: nodes appearing on a random walk

Talk Data to me

Image: The above graph displays customer-restaurant interaction data where:

nodes represent customers (U1, U2, U3) and restaurants (R1, R2, R3)
edges represent if interaction happens between customers and restaurants

Please note that graphs can be either homogeneous or heterogeneous. Let’s elaborate:

homogeneous graphs: All nodes represent the same entity and edges represent the same relation between nodes
heterogeneous graphs: The nodes and the edges can be of different types

In our case, the graph is heterogeneous since customers and restaurants are two different entities. However, we used a conversion, i.e. homogeneous one, to apply available graph learning algorithms, since the majority of available research is focused on homogeneous graphs.

Now let’s get more graph-ic!

Section 1: Introduction to GraphSAGE

GraphSAGE is a framework for inductive representation learning on large graphs. We preferred GraphSAGE because:

It is inductive and fits our requirement to create dynamic graphs.
It does not require re-training (in case there’s a new restaurant or customer onboarded)
It is flexible and facilitates transfer learning so that a model trained in one city can be replicated in other cities

Image: The similarity aspect is calculated via aggregating the k-hop neighbourhood.

For any supervised learning tasks, training data with labels is needed. In the case of homogenous graphs, the node labels can be used to train and do node classification. For heterogeneous graphs, supervised learning can be used for the link prediction where the link could be # of orders between a customer and a restaurant.

Since our main objective was to learn generic node representations using graphs and so that graph-based similarity remains intact, unsupervised learning was more appropriate. For that to happen,

One can randomly choose a node and do a random walk of fixed length. All nodes appearing on this walk would be labelled as +ve samples.
Pick an equal number of -ve samples from far away nodes randomly

The model is trained to classify these +ve and -ve pairs. The main hyperparameters we tested:

K: used values are 1 or 2
Length of random walk (in nodes): 3, 4, 5, 6
No. of random walk per node: 2, 3
Node embedding shape: 256, 512, 1024

From the node representations thus generated (Iteration 0), we used cosine similarity between customer-restaurant pairs as a feature in our downstream personalisation model. It came out as the second most important feature as per SHAP values and we also observed a slight improvement in AUC.

Section 2: Further experimenting with GraphSAGE

In a bid to generate more direct recommendations, we narrowed our focus to improving two measures:

Mean reciprocal rank (MRR), Precision@k and Recall@k
Training time

Let’s see how our iterations fared

What did we learn from our experimentations?

Although treated as a homogenous graph, customers and restaurants have to be treated as different entities owing to differences in their respective degrees of connection. The majority of customer nodes connect to a few restaurants, but the restaurant will connect to a higher number of customers. Opting for a heterogeneous graph is thus preferred
For unsupervised learning, we took random nodes as negative samples; nodes not present in the random walk. However, a mere absence of direct interaction between nodes does not particularly mean that it’s a negative sample.
Improvements in accuracy weren’t significant enough to offset the cost of high training time. Training time for just NCR data was ~20 hours. Also, models trained in one city were not able to perform well in other cities. Managing multiple city-level models would have been a nightmare.

GraphSAGE – not a rage!

Alas, while GraphSAGE proved to be a useful technique, we realised scaling it up to up to 500 cities with operations was not practical. Our search for a robust graph-based embedding solution was still on.

This is a two-part series on improving the recommendation search for our customers. Read Part Two here, where we fulfill our quest to find a robust, graph-based embedding solution.

If you are interested to work with us and work on such innovative problem solving, then connect with Manav Gupta on Linkedin. We are always looking for enterprising data scientists at Zomato.

This blog was written by Saurabh Gupta, Sonal Garg and Manav Gupta.