We ended Part One of our quest to find a robust graph-based embedding solution. Let's continue the search here:
In our quest, we stumbled upon an AI framework called Cleora[1], which claimed to be ~197x faster than DeepWalk or Node2Vec. It checked all the boxes and qualified for our next iteration. Let's see why:
In Cleora, a node's embedding is calculated by taking the weighted average of its neighbours' embeddings and then normalising the result. The process is iterative; the number of iterations is a hyperparameter, along with the embedding size. Notably, Cleora does not require node attributes, so the embeddings are based purely on the graph structure and each node's local neighbourhood.
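For intuition, here is a minimal Python sketch of that iterative averaging-and-normalising scheme. It is illustrative only and omits Cleora's actual optimisations (the real tool is a heavily optimised implementation, which is where the speed comes from):

```python
import numpy as np

def cleora_like_embeddings(adj, dim=128, n_iters=3, seed=42):
    """Toy sketch of Cleora-style iterative neighbour averaging.

    adj: (n_nodes, n_nodes) adjacency matrix, e.g. customers and
    restaurants stacked into one index space for a bipartite graph.
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(adj.shape[0], dim))             # random init
    # Row-normalise adjacency so one matmul averages over neighbours.
    walk = adj / np.clip(adj.sum(axis=1, keepdims=True), 1, None)
    for _ in range(n_iters):                               # n_iters is a hyperparameter
        emb = walk @ emb                                   # average neighbour embeddings
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb /= np.clip(norms, 1e-12, None)                 # L2-normalise each row
    return emb
```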
We used Cleora on customer-restaurant graph data for the National Capital Region (NCR). To our delight, embedding generation was superfast (under 5 minutes). For context, GraphSAGE took ~20 hours on the same NCR data.
Since we had a bipartite graph with customers and restaurants as nodes, similarity in the embedding space means that customers interacting with similar restaurants end up close together, and likewise restaurants ordered from by similar customers end up close together.
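For illustration, given L2-normalised restaurant embeddings, a top-k similarity lookup is just a dot product followed by a sort. This is a hypothetical helper, not our production code:

```python
import numpy as np

def top_k_similar(restaurant_emb, query_idx, k=10):
    """Return the k restaurants most similar to a query restaurant.

    restaurant_emb: (n_restaurants, dim) Cleora-style embeddings,
    assumed L2-normalised so a dot product equals cosine similarity.
    """
    scores = restaurant_emb @ restaurant_emb[query_idx]
    ranked = np.argsort(-scores)                 # highest similarity first
    return [i for i in ranked if i != query_idx][:k]
```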
Following are examples of restaurant-restaurant similarity being captured (drawn from the top 10 most similar restaurants in Gurugram).
Here, the theme is restaurants offering healthy food, for which Caterspoint is a vetted example based on customer behaviour.
Here we notice that customers ordering from a healthy restaurant are likely to be closely connected to other healthy restaurants. No cuisine, price or other metadata is passed into the embedding creation process; only customer nodes, restaurant nodes and the edges between them are considered.
Here, the theme is around restaurants offering gourmet food. In this case, customers ordering from high-end restaurants are closely connected to other high-end restaurants.
Here, the theme is restaurants offering regional cuisine. Again, customers ordering from these restaurants are more closely connected to other similarly placed regional restaurants.
Hence, there is strong evidence for using these embeddings downstream, as both customer similarity and restaurant similarity capture graph-related properties robustly.
What we saw with Cleora was that it preserved entity-type information, which is important for heterogeneous graphs. However, since customer and restaurant embeddings live in separate spaces (as per the figure above), restaurant recommendations cannot be made by directly taking the dot product between the two entity types and sorting by similarity. To do that, we first need to represent customers in terms of the restaurants they ordered from.
More details can be found in this[2] thread.
To solve this problem, we decided to explore EMDE[4] (Efficient Manifold Density Estimator), which exploits locality-sensitive hashing to create 'sketches': histogram-like structures that represent density on multidimensional manifolds. The idea is to treat items as lying on a continuous manifold rather than as discrete points in the embedding space. Customer preference is then a probability density function over this manifold, and the items a customer likes act as samples drawn from it; EMDE estimates this probability density function.
EMDE recommender architecture (Source: An efficient manifold density estimator for all recommendation systems)
Essentially, this process compresses multiple vectors into a single one. It takes item embeddings (possibly from multiple sources) and a list of items per customer as input, and returns a sparse customer embedding as output. Once the customer embedding is calculated, the recommendation task is straightforward. The algorithm has three parts.
The first part generates the item and customer sketches. There are two hyperparameters: the number of buckets K into which the embedding space is partitioned (sketch width) and the number of independent repetitions N (sketch depth).
Item sketch: the item embedding space is divided into K partitions using LSH to create an encoding vector. This process is repeated N times, and the generated vectors are concatenated to form the item sketch (see the code sketch after this list). LSH[5] helps ensure that only semantically similar items share buckets in the sketch, thus preserving the geometric prior from the upstream embedding.
Properties of item sketch –
Customer sketch: here, a sketch of each item the customer interacted with is created independently, and all of these sketches are summed element-wise.
Properties of customer sketch –
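Putting the two together, here is a minimal Python illustration of the encoding step. The random-hyperplane LSH, power-of-two bucket count K and repetitions N are assumptions about one reasonable instantiation, not EMDE's exact implementation:

```python
import numpy as np

def make_hashers(dim, K, N, seed=0):
    """N independent random-hyperplane LSH coders; each assigns an
    embedding to one of K buckets (assumes K is a power of two)."""
    rng = np.random.default_rng(seed)
    n_planes = int(np.log2(K))
    return [rng.normal(size=(dim, n_planes)) for _ in range(N)]

def item_sketch(item_vec, hashers, K):
    """Concatenation of N one-hot bucket indicators: shape (N*K,)."""
    parts = []
    for planes in hashers:
        bits = (item_vec @ planes > 0).astype(int)         # side of each hyperplane
        bucket = int(bits @ (2 ** np.arange(bits.size)))   # bit pattern -> bucket id
        one_hot = np.zeros(K)
        one_hot[bucket] = 1.0
        parts.append(one_hot)
    return np.concatenate(parts)

def customer_sketch(item_vecs, hashers, K):
    """Element-wise sum of the sketches of all items the customer
    interacted with: a histogram over the N*K buckets."""
    return np.sum([item_sketch(v, hashers, K) for v in item_vecs], axis=0)
```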
Since we needed a model to learn the mapping from customer space to item space, we used a feedforward neural network as a conditional density estimator.
During training, the input is the sketch of a customer's previously ordered items, and the task is to predict which item(s) will be clicked or ordered next. This is valid for any customer-item interaction data for which we have item embeddings.
At inference time, we compute the customer embedding from the model: the input is the customer sketch and the output is the customer embedding.
To rank items, we take the dot product of this customer embedding with each item's sketch and sort the items by score in decreasing order.
Input embedding dimension = 7680 + 4 = 7684 and output embedding dimension = 7680.
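To make this concrete, below is a hedged PyTorch-style sketch of such a sketch-to-sketch mapper plus the ranking step, using the dimensions above. The hidden width, depth and activation are our assumptions, not the production configuration:

```python
import torch
import torch.nn as nn

class SketchToSketch(nn.Module):
    """Hypothetical feedforward mapper: input sketch of previously
    ordered items (7680 sketch dims + 4 extra features = 7684) to an
    output sketch of dimension 7680 over the item buckets."""
    def __init__(self, in_dim=7684, out_dim=7680, hidden=3000):
        super().__init__()
        # Hidden width, depth and activation are assumptions.
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def rank_items(model, customer_sketch, item_sketches):
    """Score every item by the dot product of its sketch with the
    predicted customer embedding; return indices in decreasing order."""
    with torch.no_grad():
        cust_emb = model(customer_sketch)        # shape (7680,)
        scores = item_sketches @ cust_emb        # shape (n_items,)
    return torch.argsort(scores, descending=True)
```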
EMDE offers the flexibility to combine embeddings captured from different modalities. For example, we can derive item embeddings from interactions (click/order/add-to-favourites), item names or item images.
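Building on the hypothetical `customer_sketch` helper shown earlier, combining modalities can be as simple as concatenating per-modality sketches (all variable names below are illustrative):

```python
import numpy as np

# Assumes customer_sketch(...) and per-modality hashers as defined in
# the earlier sketch; interaction_vecs, name_vecs and image_vecs are
# hypothetical per-item embeddings from each modality.
combined_sketch = np.concatenate([
    customer_sketch(interaction_vecs, hashers_interaction, K),  # click/order/favourite
    customer_sketch(name_vecs, hashers_name, K),                # item-name text embeddings
    customer_sketch(image_vecs, hashers_image, K),              # item-image embeddings
])
```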
We are currently using EMDE for
The aim remains to make recommendations more accurate and efficient. You can find more on our tech and product innovations and learnings on our blog.
This is a two-part series on improving recommendations for our customers. If you are interested in working with us on such innovative problem-solving, connect with Manav Gupta on LinkedIn. We are always looking for enterprising data scientists at Zomato.
This blog was written by Saurabh Gupta, Sonal Garg and Manav Gupta.
-x-
Sources: