Since Zomato’s inception, our users have played an important role in helping people make informed decisions on what and where to eat.
We now house reviews and photos for about a million listings on our platform, and as a team, we are constantly working to make this content easily creatable and consumable by all.
To know more about how we made reviewing seamless for everyone, give our post on Reviews 2.0 (Part 1) a read. Overall, the response has been overwhelming and we are witnessing a 35% – 40% increase in reviews being posted month on month.
Isn’t that amazing?
Below are some essential details of the product we wanted to build –
1. For the reviewer – We wanted tags to be showcased to a reviewer at the time of review creation
2. For the reader – We wanted to showcase what each restaurant is known for or as we call this section – “Read what people are talking about”
But before we jump into the ‘what’ and ‘how’ of our implementation, let’s have a look at the overall architecture of Reviews 2.0.
Our first step was to aggregate what you can call a “knowledge bank” – a corpus of all tags one could use in the dining and restaurant industry. From “comfortable seating” to “prompt service”, our corpus incorporates a wide variety of tags. This is shown to reviewers when they wish to point out what they liked / didn’t like about a particular restaurant. We named it Z-Tag Corpus.
At the same time, we also wanted to design a system (an engine to process our reviews, to be specific), which provides insights into each restaurant by analyzing their reviews, taking into consideration the same corpus.
The implementation aimed to summarize millions of reviews into broader tags, reflecting the mood and sentiment of our users.
We used good ol’ Python for all our NLP needs and data aggregation. The following libraries were at our disposal and proved quite useful in tag creation –
Tags like “awesome food”, “incredible service”, and “pathetic staff” were distinctly classified as having a positive or negative sentiment. Hence, NLTK’s inbuilt libraries served as secondary, or fallback, classifiers.
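As a sketch of this fallback classification step, the snippet below approximates it with a hand-rolled polarity lexicon. The lexicon entries and function name are illustrative only; in production, NLTK’s built-in classifiers play this role.

```python
# Illustrative polarity lexicon -- a stand-in for NLTK's built-in
# classifiers, which serve as the actual fallback in production.
POSITIVE = {"awesome", "incredible", "courteous", "mouthwatering"}
NEGATIVE = {"pathetic", "rude", "stale"}

def fallback_sentiment(tag):
    """Classify a tag as positive/negative/neutral from its words."""
    words = set(tag.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neutral"

print(fallback_sentiment("awesome food"))    # positive
print(fallback_sentiment("pathetic staff"))  # negative
```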
Remarks like “courteous staff” or “mouth watering food” are positive sentiments, while “pathetic service” is negative.
However, certain mentions like “long waiting time..” or “..the food portions were less..” have a neutral sentiment in a general sense, but in the restaurant and dining domain they carry a contextually positive or negative sentiment.
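One way to sketch that domain adjustment is a restaurant-domain lexicon that overrides the generic neutral call. The phrases and polarities below are illustrative, not our production lexicon:

```python
# A generic classifier sees these phrases as neutral; the restaurant-domain
# lexicon (illustrative entries) overrides with the contextual polarity.
DOMAIN_POLARITY = {
    "long waiting time": "negative",
    "food portions were less": "negative",
    "quick service": "positive",
}

def domain_sentiment(phrase, generic="neutral"):
    """Prefer the domain lexicon's polarity; fall back to the generic call."""
    for key, polarity in DOMAIN_POLARITY.items():
        if key in phrase.lower():
            return polarity
    return generic

print(domain_sentiment("long waiting time at the counter"))  # negative
```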
In a restaurant review, users share their experience across several aspects of the visit — food, ambience, service, etc. Since there is no limit on the number of reviews and each one mentions a different viewpoint, grasping the overall sense of hundreds of reviews is cumbersome and time-consuming. We wanted to devise a way to make this decision-making process faster.
To answer this, we designed the section — “Read what people are talking about”
The approach we followed to get this right –
We mine opinions from reviews using an approach inspired by ABSA (Aspect Based Sentiment Analysis), which predicts the sentiment corresponding to each aspect extracted from a text document.
Consider this sentence for example — “The Chicken Whopper at Burger King was amazing but the service was slow”. In this sentence, even though the overall sentiment is mixed, it clearly mentions two different aspects “Chicken Whopper” and “service” in positive and negative connotations respectively.
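For that sentence, the target output of ABSA looks along these lines. The structure and field names are ours, purely for illustration:

```python
# What ABSA should produce for the example sentence -- each aspect paired
# with its own sentiment, even though the overall sentiment is mixed.
review = "The Chicken Whopper at Burger King was amazing but the service was slow"
aspect_sentiments = [
    {"aspect": "Chicken Whopper", "opinion": "amazing", "sentiment": "positive"},
    {"aspect": "service", "opinion": "slow", "sentiment": "negative"},
]
print(aspect_sentiments)
```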
In a syntactic grammar-based approach, a set of grammar rules is applied to the dataset to extract aspects. A syntactic grammar is defined with a clause and a corresponding chunking rule; for example, the VBG_DESCRIBING_NN_VV clause defines the following syntactic pattern:
This clause chunks the sentence when a verb describes the opinion on a target. For example, in the sentence “The place was awesome,” the verb phrase “was awesome” describes the opinion on the target “place”.
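As a minimal sketch of one such rule: a pure-Python stand-in that matches a noun followed by a verb and an adjective over hand-supplied POS tags. In practice a chunker like NLTK’s RegexpParser applies a full grammar of such clauses; the function name and three-token window here are ours.

```python
def chunk_aspect_opinion(tagged):
    """Match the pattern noun -> verb -> adjective, returning (target, opinion).

    A pure-Python stand-in for a single chunking rule; the production
    grammar has many such clauses (e.g. VBG_DESCRIBING_NN_VV).
    """
    pairs = []
    for i in range(len(tagged) - 2):
        (w1, t1), (_w2, t2), (w3, t3) = tagged[i:i + 3]
        if t1.startswith("NN") and t2.startswith("VB") and t3.startswith("JJ"):
            pairs.append((w1, w3))
    return pairs

# "The place was awesome" -- POS tags written by hand for the sketch.
tagged = [("The", "DT"), ("place", "NN"), ("was", "VBD"), ("awesome", "JJ")]
print(chunk_aspect_opinion(tagged))  # [('place', 'awesome')]
```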
How ABSA works –
When a syntactic rule produces expected chunks: The snippet above shows chunking by the syntactic grammar clause VBG_DESCRIBING_NN_VV. A relation extractor then processes the chunked list of trees for relationships between entities in the sentence. Though the syntactic approach is effective in parsing, it often suffers from noisy extraction.
As more rules are added to increase coverage, they eventually overlap, which leads to noisy extraction. It is clear from the snippet below that the same rule also interferes with a different sentence. Since the chunked part in the diagram below contains no aspect opinion, it results in a noisy extraction.
When a syntactic rule produces unexpected chunks: This particular syntactic rule was not expected to parse the sentence at all. It not only results in an incorrect extraction but also blocks another syntactic rule that would have produced a precise one.
To address this problem, we proposed a hybrid of rule-based and machine learning models. The rule-based model extracts the aspects and their opinion words, while the machine learning model learns the effectiveness of these rules across different sentence structures for a given corpus.
For training the model, a dataset is prepared with the sentence and the aspect polarity extracted by each rule. A multi-label classifier is trained for syntactic rule prediction, followed by relation extraction. This classifier lets us select the most suitable syntactic rule for parsing as the first step and reduces noisy extraction from the other, ineffective rules.
The selection process is fully automated owing to our multi-label classifier.
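A toy sketch of the idea: learn, from labelled extractions, how well each rule does on sentences with a given POS signature, then gate rules by that learned precision. The training records, the threshold, and every rule name except VBG_DESCRIBING_NN_VV are made up for illustration; production uses a real multi-label classifier over richer features.

```python
from collections import defaultdict

# Hypothetical training records: (POS signature, rule name, extraction correct?).
TRAINING = [
    ("DT NN VBD JJ", "VBG_DESCRIBING_NN_VV", True),
    ("DT NN VBD JJ", "VBG_DESCRIBING_NN_VV", True),
    ("DT NN VBD JJ", "ADJ_BEFORE_NOUN", False),    # hypothetical rule name
    ("PRP VBD DT NN", "VBG_DESCRIBING_NN_VV", False),
]

def fit(records):
    """Per-(signature, rule) hit counts -- a toy stand-in for the classifier."""
    stats = defaultdict(lambda: [0, 0])  # (signature, rule) -> [hits, total]
    for signature, rule, correct in records:
        stats[(signature, rule)][0] += int(correct)
        stats[(signature, rule)][1] += 1
    return stats

def select_rules(stats, signature, threshold=0.5):
    """Keep only rules whose learned precision on this signature clears the bar."""
    return sorted(rule for (sig, rule), (hits, total) in stats.items()
                  if sig == signature and hits / total >= threshold)

stats = fit(TRAINING)
print(select_rules(stats, "DT NN VBD JJ"))  # ['VBG_DESCRIBING_NN_VV']
```

Gating ineffective rules up front is what keeps an overlapping rule from producing the noisy chunks described above.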
For an extracted entity to make it into the final product, it must adhere to certain guidelines and business requirements so that it serves a varied set of audiences –
• Quality entity extraction
• Contextual diversity
• Personalization for maximum utility
Bringing such a model to production comes with key challenges –
We went ahead with an approach that computes the frequency distribution of words in our own corpus and uses this frequency ranking to select the parent topic from the secondary ranked candidates.
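A sketch of that selection step, assuming near-duplicate tag candidates and a toy corpus (both illustrative):

```python
from collections import Counter

# Toy corpus of tag mentions; in production this is the full review corpus.
MENTIONS = ["ambience", "ambience", "ambiance", "decor", "ambience", "decor"]
FREQ = Counter(MENTIONS)

def parent_topic(candidates):
    """Pick the candidate that occurs most often in our own corpus."""
    return max(candidates, key=lambda tag: FREQ[tag])

print(parent_topic(["ambiance", "ambience"]))  # ambience
```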
Our work on Reviews 2.0 opens doors for various relation extraction methods to be incorporated in order to minimize the noise. With an implicit aspect, the opinion indicates the aspect only indirectly. In the sentence “The place is quite expensive”, there is no explicit mention of the aspect price, but “expensive” indirectly points to it. Currently, our system works only with explicit aspects; an immediate extension would be to incorporate implicit aspects as well.
Additional efficient syntactic rules and relation extractors can also be included to enhance the process further. Post launch, as a final step, we deployed the web apps on a new subdomain behind an ELB and automated the entire production process using Ansible.
Have a look at our end product, it will surely blow your mind!
“The chicken biryani served here is awesome. Really loved the vibrant decor. Their cheese nachos are worth dying for. I will definitely visit this place again.”
Happy reviewing!