Imagine scrolling through your favorite food ordering and delivery app, Zomato. Suddenly, a stunning picture of a dish appears. It’s not just a snapshot; it’s a picture-perfect masterpiece that tells you exactly what you will be getting and makes your taste buds tingle!
At Zomato, we understand the power of good food pictures. For food and beverage pictures on a restaurant partner’s menu to appeal to customers, they need to be shot with the right angles, zoom levels, and backgrounds. Every day, we receive an incredible 10,000+ food photos from our restaurant partners, many taken directly on their mobile phones. However, not all of these photos meet our quality standards, and we believe that good-quality photos are crucial in helping customers decide what to order.
This led us to a question: how can we empower our restaurant partners to capture professional-quality photos without requiring Zedi-level photography skills or expensive equipment?
Enter PicNic AI (Picture Nicely AI), our culinary maestro of image enhancement. We asked ourselves, “Can we help our restaurant partners benefit from the power of AI by elevating their basic food pictures into something that looks like it was shot by a photographer in a studio with a professional camera?” The answer was a resounding YES! We wanted every dish, whether from your cozy neighborhood joint or that five-star restaurant up the hill, to look so irresistible that you would want to devour it on the spot. And that’s where Zomato’s innovative ML team embarked on a journey to build PicNic AI.
Developing PicNic AI
To bring our vision to life, our team turned to an AI model called Stable Diffusion. Stable Diffusion is a generative AI model that excels at transforming input signals into stunning images. It is a deep-learning diffusion model that produces visually appealing and realistic results from a text prompt describing how the generated image should look. However, to truly showcase the original dish uploaded by the restaurant, we needed the InPainting Variant of Stable Diffusion.
The InPainting Variant of Stable Diffusion allows us to seamlessly integrate the original dish into a professionally enhanced background. It also enables us to further control and guide the image generation process by providing an image mask that instructs the model to alter and generate only certain portions of the image while leaving the other portions untouched.
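To make the role of the mask concrete, here is a minimal sketch in plain Python. It is not our production code: the function name `apply_mask`, the tiny grayscale "images", and the convention that 1 means "regenerate" and 0 means "keep" are all assumptions chosen for illustration. It only shows the idea that the mask decides, pixel by pixel, whether the output keeps the original photo or takes newly generated content.

```python
# Illustrative sketch only (hypothetical helper, not the PicNic AI code).
# Assumed convention: mask value 1 = regenerate this pixel, 0 = keep original.

def apply_mask(original, generated, mask):
    """Combine two same-sized grayscale images (lists of rows) using a 0/1 mask."""
    return [
        [gen if m == 1 else orig for orig, gen, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

# A 3x3 "photo": one bright dish pixel (200) on a dull background (10).
original = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
# Stand-in for the newly generated background.
generated = [[90] * 3 for _ in range(3)]
# Regenerate everything except the dish pixel in the centre.
mask = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

result = apply_mask(original, generated, mask)
print(result)  # the dish pixel (200) survives, the background becomes 90
```

In a real inpainting model the generated pixels come from the diffusion process itself, but the contract is the same: masked regions change, unmasked regions are preserved.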
The model goes beyond merely generating the background: it also completes any cropped portions, fixes zooming inconsistencies, adjusts color balance, and even upscales the image to higher resolutions.
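One common way to let an inpainting model complete cropped portions or correct a too-tight zoom is to place the photo on a larger canvas and mark the new border as "to be generated". Whether PicNic AI does exactly this is not described here, so treat the following as a sketch under that assumption; `extend_canvas`, the `pad` parameter, and the 1 = generate convention are hypothetical names and choices.

```python
# Illustrative sketch only (hypothetical helper, not the PicNic AI code).
# Assumed convention: mask value 1 = generate this pixel, 0 = keep original.

def extend_canvas(image, mask, pad, fill=0):
    """Pad a grayscale image and its mask by `pad` pixels on every side.

    The padded border of the mask is set to 1 so that an inpainting model
    would be asked to fill it in.
    """
    new_width = len(image[0]) + 2 * pad
    padded_image = [[fill] * new_width for _ in range(pad)]
    padded_mask = [[1] * new_width for _ in range(pad)]
    for img_row, mask_row in zip(image, mask):
        padded_image.append([fill] * pad + list(img_row) + [fill] * pad)
        padded_mask.append([1] * pad + list(mask_row) + [1] * pad)
    padded_image += [[fill] * new_width for _ in range(pad)]
    padded_mask += [[1] * new_width for _ in range(pad)]
    return padded_image, padded_mask

# A tightly cropped 2x2 dish that should be kept as-is (mask all zeros).
dish = [[200, 200], [200, 200]]
keep_all = [[0, 0], [0, 0]]

padded_image, padded_mask = extend_canvas(dish, keep_all, pad=1)
for row in padded_mask:
    print(row)
```

The resulting 4x4 mask has a border of ones around the untouched dish, which gives the model room to "zoom out" and paint the missing surroundings.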
Guiding the Magic: Image Mask and Text Prompts
The process begins with the original image uploaded by the restaurant partner. We then pass it through our proprietary Zomato Segmentation AI Model, which skillfully separates the food and dish from the background. No need to label various objects or classes here; this model specializes in understanding the generic foreground using a custom Salient Object Detection model capable of adapting to the vast universe of cuisines, regions, and dishes that grace Zomato’s platform.
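As a rough illustration of how a foreground map becomes an image mask, the sketch below thresholds a per-pixel saliency score and inverts it, so that the dish is kept and the background is handed to the generator. The segmentation model itself is proprietary and not shown; the function name `saliency_to_inpaint_mask`, the 0.5 threshold, the sample scores, and the 1 = regenerate convention are all assumptions made for this example.

```python
# Illustrative sketch only (hypothetical helper, not the Zomato segmentation model).
# Input: saliency scores in [0, 1], where higher means "more likely food/dish".
# Output: inpainting mask where 1 = background to regenerate, 0 = dish to keep.

def saliency_to_inpaint_mask(saliency, threshold=0.5):
    """Threshold a saliency map and invert it into a background mask."""
    return [
        [0 if score >= threshold else 1 for score in row]
        for row in saliency
    ]

# Made-up scores for a 3x3 image with the dish in the lower right.
saliency = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.8],
    [0.1, 0.7, 0.2],
]

inpaint_mask = saliency_to_inpaint_mask(saliency)
for row in inpaint_mask:
    print(row)
```

Because the mask is derived from a generic "what stands out" score rather than a list of food classes, the same step works regardless of cuisine or dish type.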
But what about the artistic flair? That’s where the Text Prompts come into play. These prompts are carefully crafted instructions that describe the desired style, aesthetics, and technical aspects of the background of the dish. By providing this guidance to Stable Diffusion, we ensure that the resulting image is a visual masterpiece that captures the original dish and sets the stage for a delightful food ordering and delivery experience.
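To give a feel for what such a prompt might look like, here is a small sketch that assembles one from its style, aesthetic, and technical parts. The actual prompts used by PicNic AI are not published here; the function `build_background_prompt` and every phrase passed to it are hypothetical examples.

```python
# Illustrative sketch only: the prompt wording below is invented for this
# example and is not the prompt used in production.

def build_background_prompt(style, surface, lighting, technical):
    """Join the non-empty prompt fragments into one comma-separated prompt."""
    parts = [style, surface, lighting, *technical]
    return ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_background_prompt(
    style="professional food photography",
    surface="dark wooden table",
    lighting="soft natural light",
    technical=["shallow depth of field", "high resolution"],
)
print(prompt)
```

Keeping the fragments separate makes it easy to vary one aspect, such as the surface or the lighting, while holding the rest of the look constant.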
Serving Up Success
Thanks to the artistry of PicNic AI, over 1 lakh images uploaded by our restaurant partners are enhanced every month. This technology makes professional-quality food pictures accessible to our restaurant partners, even in the remotest areas of the country, at no additional cost, and gives customers more visually appealing images to browse. A win-win for our 3 lakh+ restaurant partners and millions of customers!
The team that made it all possible
We would like to express our heartfelt thanks to the folks from our AI and ML team, Catalog tech team, and Content Operations team who collaborated beautifully to make it all possible: Jayesh Gupta, Poonam Thapar, Abhilash Awasthi, Ram Singla, Suraj Rajput, Neel Agarwal, Deepak Deora, and Jahnvi Goyal. Special mention to Jayesh Gupta for driving this project with passion and enthusiasm.
So, embark on this delectable journey with us and let your eyes savor the flavor of enhanced food imagery on Zomato. It’s a visual treat that will leave you craving more. Bon appétit!