Ever wondered what all goes into calculating the ETA (Estimated Time of Arrival) of your food order? From the time it takes a restaurant partner to prepare your food order to the time it takes a delivery partner to reach your location from the restaurant – the whole picture is much bigger.
Here is an example –
Suppose you order a sundae from an ice cream parlour 5 kilometres away from home. You note that it reaches you within 25 minutes.
On the other hand, when you order biryani from a restaurant merely 0.5 kilometres away, you note it takes more than 45 minutes. Confusing, right? Now imagine predicting the optimal Food Preparation Time (FPT) for the huge number of dishes that exist on Zomato and the lakhs of restaurants delivering them, each with its own preparation time.
Feels like quite a challenge, right?
Well, this is one of the significant problems our Data Science team solves – to predict accurate Food Preparation Time for both customers and restaurant partners while communicating to delivery partners the right time to reach a restaurant.
In our earlier blog – The Deep Tech Behind Estimating Food Preparation Time – we described what food preparation time means at Zomato and the factors it depends upon. We spoke at length about how the data science model makes predictions.
For people who haven’t read the last blog, we highly recommend you give it a read to understand this continued conversation better. Here, we will be sharing the improvements and learnings made on top of the previous model.
Restaurants are operationally-driven systems with multiple dynamics at play. This means that orders with exactly the same items and quantities can have different food preparation times, depending on the on-ground situation in the kitchens.
The above curve closely resembles a right-skewed distribution. One can see that the variation for chole bhature is considerable – different outlets take different times, and even the same outlet can take more or less time depending on rush hours, staff availability, etc. We aim to build a model which can accurately predict the FPT for such a distribution.
Training our model with loss functions like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) has its limitations: both penalise underestimation and overestimation symmetrically, giving the model no reason to avoid one more than the other.
For our use case, we need an accurate model with fewer underestimations, ensuring precise and transparent communication with customers.
Hence, we work with alternative loss functions. Here is a brief overview of the Quantile Loss Function –
For q < 0.5, the loss function penalises the negative error (predicted minus actual) more than the positive error (actual minus predicted), making the model predict lower values of Y, and vice-versa for q > 0.5. In the special case where q = 0.5, the loss function reduces to the MAE loss (up to a constant factor), penalising positive and negative errors equally.
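For reference, here is a minimal sketch of the quantile (pinball) loss as a custom Keras loss function; the function name and the example quantile are illustrative, not our production code –

```python
import tensorflow as tf

def quantile_loss(q):
    """Quantile (pinball) loss for a quantile q in (0, 1)."""
    def loss(y_true, y_pred):
        error = y_true - y_pred  # positive error = underestimation
        # Underestimation is weighted by q, overestimation by (1 - q).
        return tf.reduce_mean(tf.maximum(q * error, (q - 1.0) * error))
    return loss

# Example: q = 0.7 penalises underestimation more than overestimation,
# nudging the model towards higher (safer) FPT predictions.
# model.compile(optimizer="adam", loss=quantile_loss(0.7))
```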
Further, we can modify this quantile loss function in the following manner, which yields the benefits of MAE while adding appropriate penalties to the positive and negative errors.
Here, the variables m and n act as penalty factors for positive and negative errors, respectively. They can be tweaked so that the resulting model is optimised for the corresponding business metrics.
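Since the exact modified equation appears as an image in the original post, the sketch below is one plausible reading of it – an MAE-style loss with separate penalty factors m and n – offered as an assumption rather than the production formula –

```python
import tensorflow as tf

def asymmetric_mae(m, n):
    """MAE-style loss with separate penalties for positive (m) and
    negative (n) errors. A plausible reading of the modified quantile
    loss described above, not the exact production formula."""
    def loss(y_true, y_pred):
        error = y_true - y_pred
        under = tf.maximum(error, 0.0)   # actual > predicted
        over = tf.maximum(-error, 0.0)   # predicted > actual
        return tf.reduce_mean(m * under + n * over)
    return loss

# m > n yields fewer underestimations; m = n = 1 recovers plain MAE.
```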
The previous model predicted just one quantity – the FPT of an order. Over time, we started integrating FPT with multiple downstream processes in our system, which in turn increased the importance of FPT predictions multifold.
While predicting FPT, we also need to predict other values, such as –
Predicting targets like these requires much of the same information that the core FPT model already uses. Hence, such targets can become a part of the FPT model itself. All of this together helps us improve our downstream algorithms so that customers get their food within the expected time range.
Let’s elaborate on how we implemented the above use cases in our neural network(s).
Initially, we had a separate model for each use case, each requiring independent training. At a high level, our deep neural network model looked as below –
A single TensorFlow model of the given architecture consists of ~6 million parameters and takes up to 15 hours to train, not to mention the resources required to make real-time predictions with this architecture.
Training multiple models with this architecture – each having ~6 million parameters – made it impractical for us to re-train and maintain them all. We needed some optimisations to make the setup practical.
Since all these models are fed the same type of information, we introduced a multiple-output layer to the model. With its help, we can make multiple predictions with a single model.
The multiple-output layer is a compilation of fully connected layers, each dedicated to making predictions against its own loss function.
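To make this concrete, here is a minimal sketch of a multi-output Keras model; the input shape, layer sizes, head names, and losses are illustrative assumptions, not the actual production architecture –

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Shared trunk over the order features (dimensions assumed).
features = layers.Input(shape=(128,), name="order_features")
shared = layers.Dense(256, activation="relu")(features)
shared = layers.Dense(128, activation="relu")(shared)

# Multiple-output layer: one fully connected head per prediction target.
fpt_head = layers.Dense(1, name="fpt")(shared)
aux_head = layers.Dense(1, name="aux_target")(shared)

model = Model(inputs=features, outputs=[fpt_head, aux_head])
model.compile(
    optimizer="adam",
    # Custom losses (e.g. the quantile loss above) can be plugged in per head.
    loss={"fpt": "mae", "aux_target": "mae"},
    loss_weights={"fpt": 1.0, "aux_target": 0.5},
)
```

With this setup, the shared trunk is trained once and all targets are predicted in a single forward pass, instead of training and serving several ~6-million-parameter models.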
In the earlier version, to create an embedding of the items present in an order, we used to take a weighted combination of the item vectors (created using word2vec on item names), weighted by their quantities and cost.
Now, we have updated the order item encoding to use a bi-directional LSTM-based sub-network, which takes the word2vec features along with cost, quantity, and historic FPT statistics to create an embedding for an order.
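A minimal sketch of such a sub-network is shown below; the sequence length and feature/embedding dimensions are assumptions for illustration –

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed dimensions: an order is a padded sequence of up to MAX_ITEMS items;
# each item row concatenates its word2vec vector with cost, quantity,
# and historic FPT statistics.
MAX_ITEMS, ITEM_FEATURES, UNITS = 20, 36, 32

item_seq = layers.Input(shape=(MAX_ITEMS, ITEM_FEATURES), name="order_items")
masked = layers.Masking(mask_value=0.0)(item_seq)  # skip padded item rows
order_embedding = layers.Bidirectional(layers.LSTM(UNITS))(masked)

order_encoder = Model(item_seq, order_embedding, name="order_encoder")
# The resulting 2 * UNITS vector is the order embedding consumed by
# the rest of the FPT network.
```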
Shown below are the embeddings of all orders of restaurant X, which serves multiple cuisines (North Indian, Chaat, Chinese, desserts, etc.), in red, and restaurant Y, which serves only ice creams and desserts, in blue, plotted on a 2D plane using Principal Component Analysis (PCA).
We can see that with the previous methodology, the bulk of the orders from either restaurant is concentrated at a single point. With the new Bi-LSTM architecture, there is a clear separation between the orders of the two restaurants (embeddings spread over a larger space, carrying clearer information).
However, an observant reader might notice some overlap between the orders of X and Y, which is because of the similar nature of the items ordered, such as brownies, sundaes, etc.
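For the curious, a 2D projection like the one described above can be produced with scikit-learn's PCA; the embeddings below are random placeholders standing in for real encoder outputs –

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Placeholder embeddings; in practice these come from the order encoder.
emb_x = np.random.randn(500, 64)        # orders of restaurant X
emb_y = np.random.randn(300, 64) + 2.0  # orders of restaurant Y

# Project both sets of embeddings onto the same two principal components.
points = PCA(n_components=2).fit_transform(np.vstack([emb_x, emb_y]))

plt.scatter(*points[: len(emb_x)].T, c="red", s=4, label="Restaurant X")
plt.scatter(*points[len(emb_x):].T, c="blue", s=4, label="Restaurant Y")
plt.legend()
plt.show()
```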
While working on the FPT model, we added many complexities to it – new features, deeper models, more embeddings. Continuously increasing the complexity inflated the model size, resulting in slower predictions, longer training times, higher resource utilisation, etc.
To solve these challenges, we made the below changes which controlled these issues to a large extent –
You can train your model on any loss function, be it MAE, RMSE, or MSE… the list goes on. All these functions are viable for training because of their convex nature, each having its own pros and cons. However, all this maths and computation would be of no use if the function cannot train a model that serves a business purpose.
Hence, it is essential to devise business metrics which can measure the relevance of different iterations of the FPT model. We shall discuss two of these metrics in particular here – 3-min accuracy and 3-min breaches.
To ensure that the customer gets the food within the promised time, we want to maximise accuracy and minimise breaches. At the same time, we don't want to overestimate the time, as it might push the customer away from ordering food on our platform.
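Since the exact definitions are not reproduced here, the sketch below assumes plausible ones – accuracy as the share of predictions within ±3 minutes of the actual FPT, and a breach as the actual FPT exceeding the prediction by more than 3 minutes; treat both as assumptions –

```python
import numpy as np

def three_min_metrics(actual, predicted, window=3.0):
    """Assumed definitions: 'accuracy' = share of orders predicted within
    +/- window minutes of the actual FPT; 'breach' = actual FPT exceeds
    the prediction by more than the window (food not ready when promised)."""
    error = np.asarray(actual) - np.asarray(predicted)
    accuracy = float(np.mean(np.abs(error) <= window))
    breaches = float(np.mean(error > window))
    return accuracy, breaches

# acc, brc = three_min_metrics(actual_fpt_minutes, predicted_fpt_minutes)
```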
Now that we are familiar with these terms – or, as we say, now that we are on the same page – let's talk numbers. Using the above-mentioned strategies, we saw a 4% improvement in 3-min accuracy and a 9% reduction in 3-min breaches over our baseline model.
FPT estimation is so crucial to Zomato that we always strive to improve the predictions. While we continue to experiment with architectural changes which make the model larger and more complex, we also seek to develop a better understanding of restaurant processes on the ground. Some of the opportunity areas we are actively working on are –
This is a follow-up article on how we use machine learning to calculate Food Preparation Time. If you are interested in working with us on such innovative problem solving, then connect with Manav Gupta / Akshay Jain on LinkedIn. We’re always looking for Data Scientists (aka Chief Statistics Officers) at Zomato.
This blog was written by Parth Javiya, Abhilash Awasthi, and Akshay Jain.
____
All content provided in this blog is for informational and educational purposes only. It is not professional advice and should not be treated as such. The writer of this blog makes no representations as to the accuracy or completeness of any content or information contained here or found by following any link on this blog.
All images are designed in-house.
-x-