Have you ever wondered why we, as humans, have always gravitated towards things which can be done instantly? Why do instant noodles, instant coffee or ready-to-eat packed food find a place in our kitchens?
We as a species simply hate to wait. We believe the same goes for food. A couple of questions that always crop up in our minds are ‘Where is my food?’ or ‘When will my food arrive?’
For Zomato, from the moment a customer opens the app until their food arrives at their doorstep, it is important for us to provide accurate information on when the food will be delivered. Overestimating the delivery time can deter customers from ordering, while underestimating it leaves customers waiting longer than promised, which can increase inflow to our customer support.
Hence, accurate time estimation not only results in a better customer experience but can also reduce the burden on our customer support teams.
As showcased above, in the food delivery ecosystem, multiple handshakes happen once a customer places an order.
Each of these steps has a time component associated with it, i.e., the time it will take for the restaurant to prepare the food (Food Preparation Time, FPT), the time it will take for our Delivery Partner (DP) to reach the restaurant (DP pick up time), and the time it will take for our DP to reach the customer’s address (DP drop time).
All of these, plus a few other time components (predictable and unpredictable), combine to give the time from order placement to final delivery that is then shown to our customers.
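As a rough illustration of how these components might combine, here is a minimal sketch. The function name, the parameters and the combination rule are assumptions for illustration only: it assumes the kitchen and the DP work in parallel, so the order leaves the restaurant once both the food is ready and the DP has arrived. The actual computation may differ.

```python
def estimate_delivery_minutes(fpt, dp_pickup, dp_drop, other=0.0):
    """Toy combination of the time components (all values in minutes).

    Assumption (not from the post): food preparation and the DP's trip to
    the restaurant overlap, so the slower of the two decides when the
    order can leave the restaurant.
    """
    ready_to_leave = max(fpt, dp_pickup)
    # 'other' stands in for any remaining predictable/unpredictable components.
    return ready_to_leave + dp_drop + other


eta = estimate_delivery_minutes(fpt=18.0, dp_pickup=12.0, dp_drop=15.0, other=2.0)
print(eta)
```

Under this assumption, an error in FPT feeds straight into the final estimate whenever the kitchen is the slower of the two parallel steps.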
Zomato’s online food delivery platform has restaurants across 500+ cities with 400+ cuisines in India, which clearly tells us that one can find diversity in every nook and corner of this nation.
The scale of the business demands better FPT prediction, which in turn leads to better delivery time estimates, better allocation of DPs during order assignment and more efficient delivery of orders. It also helps us engage better with our Restaurant Partners when monitoring FPT breaches and compliance.
There are multiple factors that affect the FPT for a particular dish. Say a customer orders Chicken Biryani (D1) from two restaurants (R1 and R2) –
All other scenarios being the same, one would expect FPT of Chicken Biryani from R1 to be less than that of R2 since –
But there might be certain additional factors at play here –
That’s a lot of components to keep in mind!
Given the nature of the problem, we divided it into two major components –
We notice that item-level information is usually in text format. To use text in machine learning models, the most common methods are Bag-of-Words, Tf-Idf and Word2Vec embeddings. The first two fail at our scale because, like One Hot Encoding (where every distinct value gets its own column), they create one column per distinct vocabulary entry. Given that the number of distinct dishes on our platform is ~3.5m, this would have added millions of columns to our data.
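To make the scale problem concrete, here is a small sketch of One Hot Encoding over dish names. The dish list and the `one_hot` helper are made up for illustration.

```python
def one_hot(dish, vocabulary):
    """Return a vector of zeros with a single 1 at the dish's position."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    vector[index[dish]] = 1
    return vector


# Toy vocabulary; the real platform has ~3.5m distinct dishes.
vocabulary = ["chicken biryani", "veg biryani", "mango milkshake", "butter chicken"]

chicken = one_hot("chicken biryani", vocabulary)
veg = one_hot("veg biryani", vocabulary)
shake = one_hot("mango milkshake", vocabulary)

# The vector length equals the vocabulary size, so it grows with every new dish.
print(len(chicken))

# Any two different dishes have a dot product of 0, so the encoding carries
# no notion of two biryanis being more alike than a biryani and a milkshake.
print(sum(a * b for a, b in zip(chicken, veg)))
print(sum(a * b for a, b in zip(chicken, shake)))
```

With millions of dishes, each row would be millions of entries wide and almost entirely zeros, and it would still say nothing about which dishes are similar.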
We discarded those two approaches because –
For us, Word2Vec embedding became the preferred choice because –
The above image is a visualisation and subsequent clustering of menu item vectors trained using Word2Vec. One can see how different clusters are being formed. For example, all types of Biryanis are together, but are far off from Milkshakes, which is expected as they are fundamentally different dishes.
An order seldom contains only one item. In such a scenario, we take the quantity- and cost-weighted average of the item vectors to get the final menu representation. Shown below –
Let’s take an Order for example containing N items.
The final order representation is the average of the N item vectors, weighted by the cost of each item and the quantity ordered.
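A minimal sketch of this weighted average follows. It assumes each item's weight is quantity × cost, which is only one plausible way to combine the two. The item vectors, field names and prices are made up for illustration.

```python
def order_vector(items):
    """Weighted average of item vectors for one order.

    Assumption: weight = quantity * cost. The post only states that the
    average is weighted by quantity and cost, not the exact formula.
    """
    weights = [item["quantity"] * item["cost"] for item in items]
    total = sum(weights)
    dim = len(items[0]["vector"])
    return [
        sum(w * item["vector"][d] for w, item in zip(weights, items)) / total
        for d in range(dim)
    ]


# Hypothetical 2-dimensional item vectors (real Word2Vec vectors are larger).
order = [
    {"name": "chicken biryani", "quantity": 2, "cost": 250.0, "vector": [1.0, 0.0]},
    {"name": "mango milkshake", "quantity": 1, "cost": 100.0, "vector": [0.0, 1.0]},
]

vec = order_vector(order)
print(vec)
```

In this toy order the biryani dominates the representation, since it accounts for 500 of the 600 total weight.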
Given that food is ordered from 150k+ restaurants in a month, understanding how a restaurant can be represented numerically for a machine learning model becomes the most essential part of this puzzle.
In our case, a restaurant is represented by categorical data. Categorical data is very common in business datasets. For example, users are typically described by country, gender, age groups, etc. Products are often described by product type, manufacturer, seller etc.
The most used category representations are One Hot Encoding, Encoding Categories with Dataset Statistics, or Encoding Categories as Cluster labels.
Categorical data is extremely convenient for comprehension but very hard for most machine learning algorithms, due to these reasons –
The basic premise is, we let a neural network calculate the best representation of a restaurant by itself. Entity embedding is a vector (a list of real numbers) representation of an entity, which is a restaurant in this case.
The above image is a t-SNE plot (commonly used to visualise high-dimensional data) of the restaurants most ordered from in Bangalore, where restaurants serving similar cuisines and dishes are clustered together.
X = {Current Order Level Information, Order Vector, Restaurant Vector}
Y = Food Preparation Time
We initialise an embedding matrix representing each restaurant with ‘m’ dimensions. Each column of the embedding matrix represents one restaurant. The X-Vector, built from various features related to an order, is then passed through a neural network. Through backpropagation, the restaurant representations get updated with each iteration, along with the network weights.
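The idea can be sketched in a few lines of plain Python. This is a deliberately tiny stand-in, not the production model: a single linear layer replaces the neural network, the data is invented, and the embeddings are stored one row per restaurant for convenience. What it does show is the key mechanism: the restaurant vectors are ordinary trainable parameters that receive gradient updates together with the other weights.

```python
import random

random.seed(0)

NUM_RESTAURANTS, EMB_DIM = 2, 2  # 'm' = 2 dimensions in this toy setup

# One trainable vector per restaurant, initialised with small random values.
embeddings = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
              for _ in range(NUM_RESTAURANTS)]
w_emb = [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
w_items, bias = 0.0, 0.0

# Invented data: (restaurant_id, number_of_items, observed FPT in minutes).
data = [(0, 1, 12.0), (0, 3, 16.0), (1, 1, 22.0), (1, 2, 24.0)]


def predict(rid, n_items):
    emb_term = sum(w * e for w, e in zip(w_emb, embeddings[rid]))
    return emb_term + w_items * n_items + bias


def mse():
    return sum((predict(r, n) - y) ** 2 for r, n, y in data) / len(data)


initial_loss = mse()
lr = 0.01
for _ in range(2000):
    for rid, n_items, y in data:
        err = predict(rid, n_items) - y  # d(0.5 * err**2) / d(prediction)
        for d in range(EMB_DIM):
            grad_e = err * w_emb[d]            # gradient for the embedding entry
            grad_w = err * embeddings[rid][d]  # gradient for the layer weight
            embeddings[rid][d] -= lr * grad_e  # only this restaurant's vector moves
            w_emb[d] -= lr * grad_w
        w_items -= lr * err * n_items
        bias -= lr * err
final_loss = mse()

print(initial_loss, final_loss)
print(embeddings)
```

After training, each row of `embeddings` is the learned representation of one restaurant and can be reused as a feature in another model.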
XGBoost
Through the embedding matrix, we get the final restaurant representation and then we pass the same X-Vector, as in the entity embedding architecture, to an XGBoost Regressor Model.
Deep Learning Architecture
Our previous model architecture couldn’t take into account the sequence of earlier orders that came to the restaurant, both the completed orders and the currently running orders.
One expects that if the ‘previously completed orders’ include an order of Butter Chicken, then the predicted FPT of a subsequent Butter Chicken order should be close to that past value. Passing this information sequentially helps the model better understand the kitchen’s capacity and behaviour at time t. The FPT of a restaurant can also be seen as a time series, where the amplitude at each point (the FPT of an order) depends on the item being cooked. Hence, we settled on a sequential architecture to better represent a restaurant’s kitchen.
Both running orders (orders running at time t, at most 5) and completed orders (the last 5 completed orders) are passed through a stacked LSTM layer. The resulting column vector is concatenated with the present order features and the Restaurant Embedding Vector.
The resulting column vector is passed through a 2 layer dense network and regressed on FPT.
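Before order sequences can be fed to an LSTM layer, they need a fixed length. Here is a small sketch of one way to prepare them. The helper name, the zero-padding and the 2-dimensional vectors are assumptions; the post does not say how restaurants with fewer than 5 running or completed orders are handled.

```python
MAX_ORDERS = 5  # at most 5 running orders and the last 5 completed orders


def to_fixed_length(order_vectors, dim, max_len=MAX_ORDERS):
    """Keep the most recent `max_len` order vectors and left-pad with zeros.

    Zero-padding is an assumption made for this sketch.
    """
    recent = order_vectors[-max_len:]
    padding = [[0.0] * dim for _ in range(max_len - len(recent))]
    return padding + recent


# Hypothetical 2-dimensional order vectors, oldest first.
running = [[1.0, 2.0], [3.0, 4.0]]
completed = [[float(i), float(i)] for i in range(7)]

running_seq = to_fixed_length(running, dim=2)
completed_seq = to_fixed_length(completed, dim=2)

print(running_seq)
print(completed_seq)
```

Both sequences now have shape 5 × 2, so they can be batched across restaurants regardless of how busy each kitchen is.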
Through this, we were able to reduce our mean absolute error from 4.64 mins to 4.13 mins and our mean squared error from 32 to 28 (in squared minutes).
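For reference, the two metrics can be computed as follows. The sample values are invented and are not related to the figures reported above.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the miss, in minutes."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared miss, in squared minutes; large misses dominate."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Invented FPT values in minutes.
actual = [10.0, 20.0, 15.0, 30.0]
predicted = [12.0, 17.0, 15.0, 36.0]

mae = mean_absolute_error(actual, predicted)  # (2 + 3 + 0 + 6) / 4
mse = mean_squared_error(actual, predicted)   # (4 + 9 + 0 + 36) / 4
print(mae, mse)
```

Tracking both is useful because the squared error reacts much more strongly to the occasional badly misjudged order than the absolute error does.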
In addition to encoding data across restaurants and dishes, we were able to further enhance the model with a restaurant-level input on preparation time.
Previously, we used to calculate FPT as the difference between the restaurant accepted order timestamp and DP order pick up timestamp. This didn’t result in true FPT as the behaviour of a particular DP during order pick up became a part of the equation. This ideally shouldn’t be the case as FPT is a restaurant phenomenon. In order to correct this, we introduced a Food Order Ready (FOR) button in the Restaurant Partner app.
Restaurant Partners can now mark an order as ready whenever the food items are prepared and ready for pick up. In our initial results, we saw a 9 percent improvement in the within-5-minutes accuracy of our predictions. As FOR compliance increases, our predictions become even more accurate.
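The difference between the two labels, and one reading of "within 5 minutes accuracy", can be sketched as follows. The timestamps and function names are hypothetical, and the accuracy definition (share of predictions within 5 minutes of the actual FPT) is an assumption about how the metric is computed.

```python
from datetime import datetime


def fpt_minutes(accepted_at, ready_at):
    """Minutes between the restaurant accepting the order and the end event."""
    return (ready_at - accepted_at).total_seconds() / 60.0


# Hypothetical timestamps for a single order.
accepted = datetime(2020, 1, 1, 13, 0)
food_ready = datetime(2020, 1, 1, 13, 18)  # FOR button pressed
picked_up = datetime(2020, 1, 1, 13, 25)   # DP picks up 7 minutes later

old_label = fpt_minutes(accepted, picked_up)   # includes DP behaviour
new_label = fpt_minutes(accepted, food_ready)  # restaurant-only FPT


def within_minutes_accuracy(actual, predicted, tolerance=5.0):
    """Share of predictions within `tolerance` minutes of the actual FPT."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


accuracy = within_minutes_accuracy([10.0, 20.0, 30.0, 40.0], [12.0, 27.0, 30.0, 34.0])
print(old_label, new_label, accuracy)
```

In this toy order, the old label overstates the kitchen's preparation time by the 7 minutes the food spent waiting for pick up.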
We are also moving towards the newest and most exciting paradigm in the world of data science – Reinforcement Learning, i.e., a self-learning system, which updates weights as per real-time errors observed at a restaurant level.
Given that food preparation time represents real-time behaviour, making such a system will be a more elegant solution for this problem statement, ensuring a smoother order tracking experience for our customers.