Sakib Malik | July 25, 2024 | 6 min read
Go Beyond: Building Performant and Reliable Golang Applications

Imagine this: you’re in a war room. Tension hangs thick in the air as you and your application team scramble to debug a sudden spike in out-of-memory (OOM) issues plaguing your application’s containers. No memory leaks seem to be the culprit, but further investigation reveals a shocking truth: the garbage collector (GC) wasn’t running when it should have been! This crucial cleanup process failed to free up memory before usage breached the container’s hard limit, triggering those dreaded OOM issues.

Now, let’s look at a different scenario. You’re analyzing your application’s performance profile, and a particular pattern jumps out. A significant chunk of your application’s CPU usage is being devoured by the GC. While your service doesn’t necessarily require a ton of live heap memory, the constant GC activity is eating away at valuable resources. You have a hunch: what if you could streamline the GC process, making it more efficient?

These two seemingly unrelated situations hold the key to unlocking a powerful tool in Go: GOMEMLIMIT.

The memory maze: Why did we need it?

Go, like many languages, uses a garbage collector (GC) for automated memory management, but versions before 1.19 offered limited control over it. This left room for OOM issues and inefficient GC behavior in high-memory applications. The crux of the issue lay in:

GOGC and Twice the Trouble: Previously, Go relied only on the GOGC environment variable (defaulting to 100) to trigger GC cycles. With the default value, garbage collection starts when the heap grows to roughly twice the live heap size identified in the previous GC cycle. (The live heap is the part of memory that your application is actively using and that the garbage collector cannot reclaim.) This trigger did not take your application’s hard memory limit into account.

The OOM Trap: The problem arose when this GC target of twice the live heap exceeded the application’s hard memory limit. Memory usage could then hit the limit before the GC ever ran, inevitably resulting in OOM issues.

GC CPU wastage: For applications with a large memory allocation but a small live heap, the GC target of twice the live heap sat far below the hard memory limit. The GC therefore ran frequently even though plenty of memory was still available, leading to high GC CPU usage.
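The arithmetic behind both failure modes can be sketched in a few lines of Go. This is a simplified illustration, not the runtime's exact pacing logic: the helper `gcTargetBytes` and the 1 GiB hard limit are hypothetical, and the real GC goal also accounts for other factors such as GC roots.

```go
package main

import "fmt"

// gcTargetBytes approximates the heap size at which the next GC cycle starts
// when only GOGC is in effect: liveHeap * (1 + GOGC/100).
// Hypothetical helper for illustration; the runtime's real pacer is more involved.
func gcTargetBytes(liveHeap uint64, gogc int) uint64 {
	return liveHeap + liveHeap*uint64(gogc)/100
}

func main() {
	const MiB = 1 << 20
	hardLimit := uint64(1024 * MiB) // assumed container hard limit

	// A large live heap (OOM trap) and a small live heap (GC CPU wastage).
	for _, live := range []uint64{600 * MiB, 100 * MiB} {
		target := gcTargetBytes(live, 100)
		fmt.Printf("live=%d MiB target=%d MiB exceedsHardLimit=%t\n",
			live/MiB, target/MiB, target > hardLimit)
	}
}
```

With a 600 MiB live heap the target lands above the assumed limit, so the container can be killed before the GC starts. With a 100 MiB live heap the GC starts at about 200 MiB, long before memory is actually scarce.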

The memory guardian: GOMEMLIMIT

GOMEMLIMIT is a godsend for memory-hungry Go applications like ours at Zomato. It acts as a soft limit on the memory managed by the Go runtime (the heap plus the runtime’s other memory), gently nudging the garbage collector to work more frequently when memory usage gets close to the defined threshold. We set this threshold below the application’s hard memory limit to prevent OOM issues, and because the GC no longer needs to run while memory is plentiful, it also saves CPU cycles. This proactive approach prevents OOM crashes caused by inefficient garbage collection and keeps our applications running smoothly and consistently.

It’s important to remember that GOMEMLIMIT cannot solve memory leaks. A memory leak occurs when your application holds onto some memory (live heap) that the garbage collector cannot reclaim. If the live heap itself exceeds the application’s hard memory limit, even GOMEMLIMIT won’t be able to prevent an OOM issue.
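As a minimal sketch of how such a soft limit can be applied from code, the example below uses `runtime/debug.SetMemoryLimit` (available since Go 1.19). The helper names, the 1 GiB hard limit, and the 90% ratio are assumptions chosen for illustration, not values taken from our library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applySoftLimit sets the Go runtime's soft memory limit to a fraction of the
// container's hard limit and returns the value that was applied.
// The ratio is an assumed safety margin, not a recommended constant.
func applySoftLimit(hardLimitBytes int64, ratio float64) int64 {
	limit := int64(float64(hardLimitBytes) * ratio)
	debug.SetMemoryLimit(limit)
	return limit
}

// currentSoftLimit reads the limit back: a negative argument leaves the limit
// unchanged and returns the value currently in effect.
func currentSoftLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const hardLimit = 1 << 30 // assumed 1 GiB container hard limit
	applied := applySoftLimit(hardLimit, 0.9)
	fmt.Println("applied:", applied, "in effect:", currentSoftLimit())
}
```

The same limit can be supplied without code changes through the environment variable, for example `GOMEMLIMIT=900MiB`. Note that the limit only makes the GC more aggressive near the threshold. Running the GC less often while memory is plentiful usually also involves raising GOGC or turning it off, for example with `debug.SetGCPercent(-1)`.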

Introducing: Zomato/Go/Runtime Library

To implement GOMEMLIMIT for our applications, we developed our in-house Zomato/go/runtime library, which simplifies the process for application developers. This library offers the following functionality:

Dynamic GOMEMLIMIT Calculation:

This feature eliminates the guesswork by dynamically calculating the optimal GOMEMLIMIT value during runtime. It considers the following factors:

  • ECS Task and Container Constraints: The library analyzes the memory limits set for both the ECS task and its individual containers to arrive at a dynamic value for GOMEMLIMIT at runtime.
  • Real-time Memory Usage: It retrieves up-to-date information on the current memory usage of the task at fixed intervals. Our homegrown algorithm then uses this data to calculate the optimal GOMEMLIMIT value. The library reads real-time container memory usage and memory limits from the ECS task metadata endpoint and applies the result with runtime/debug.SetMemoryLimit.
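A rough sketch of the limit-derivation step might look like the following. It is not our homegrown algorithm: it only reads the task memory limit and applies a fixed ratio, and it ignores real-time usage entirely. The function name, the ratio, and the sample payload are hypothetical, and the sketch assumes the task metadata response carries a `Limits.Memory` field expressed in MiB.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// taskMetadata mirrors only the part of the task metadata response that this
// sketch needs. The field layout is an assumption about the response shape.
type taskMetadata struct {
	Limits struct {
		Memory float64 `json:"Memory"` // assumed to be the task memory limit in MiB
	} `json:"Limits"`
}

// softLimitFromTaskMetadata derives a soft memory limit in bytes from a task
// metadata JSON body by applying a fixed ratio to the task's hard limit.
func softLimitFromTaskMetadata(body []byte, ratio float64) (int64, error) {
	var md taskMetadata
	if err := json.Unmarshal(body, &md); err != nil {
		return 0, err
	}
	if md.Limits.Memory <= 0 {
		return 0, errors.New("task metadata has no memory limit")
	}
	hardLimitBytes := md.Limits.Memory * (1 << 20) // MiB to bytes
	return int64(hardLimitBytes * ratio), nil
}

func main() {
	// Hand-written sample payload; in a real task the body would be fetched
	// from the ECS task metadata endpoint.
	sample := []byte(`{"Limits": {"CPU": 0.5, "Memory": 1024}}`)
	limit, err := softLimitFromTaskMetadata(sample, 0.9)
	if err != nil {
		panic(err)
	}
	fmt.Println("soft limit in bytes:", limit)
}
```

In a periodic version, the returned value would be passed to `runtime/debug.SetMemoryLimit` on every tick, so the limit follows changes in the task's memory situation.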

GOMAXPROCS

Setting the GOMAXPROCS environment variable to an optimal value is required to avoid thrashing and to reduce unnecessary CPU throttling. Our library achieves this by setting it based on the ECS task-level CPU hard limit.
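One way to derive GOMAXPROCS from a CPU limit is sketched below. The rounding rule (round down, with a floor of 1) and the helper name are assumptions made for illustration; the exact policy our library applies may differ.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// maxProcsForCPULimit maps a CPU limit expressed in vCPUs to a GOMAXPROCS
// value. Rounding down with a minimum of 1 is an assumed policy.
func maxProcsForCPULimit(vcpus float64) int {
	n := int(math.Floor(vcpus))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// Assumed task-level CPU limit of 2.5 vCPUs.
	previous := runtime.GOMAXPROCS(maxProcsForCPULimit(2.5))
	// Passing 0 reports the current setting without changing it.
	fmt.Println("previous:", previous, "now:", runtime.GOMAXPROCS(0))
}
```

Without such an adjustment, the runtime may size GOMAXPROCS from the host's CPU count rather than the task's share, so the process can schedule more parallel work than its CPU limit allows and get throttled.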

Enhanced Monitoring with Runtime Metrics Integration

This library goes beyond dynamic GOMEMLIMIT calculation. It leverages the Go runtime library to export crucial metrics related to:

  • Garbage Collection: Track GC activity to identify potential bottlenecks or inefficiencies
  • Heap: Monitor memory allocation and utilization patterns to understand memory pressure within the application
  • Memory: Gain insights into overall memory usage and potential areas for optimization
  • CPU: Analyze CPU usage patterns and identify potential correlations with GC activity
  • ECS Task Limits: Monitor current memory usage compared to imposed ECS task limits, ensuring adherence to resource constraints
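To give a feel for the kind of data involved, here is a small sketch that samples a few GC and heap figures with `runtime.ReadMemStats`. The `gcSnapshot` type and the choice of fields are hypothetical; this is not the metric set or the export path our library uses.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcSnapshot holds a handful of runtime figures relevant to GC and heap
// monitoring. The selection of fields is illustrative only.
type gcSnapshot struct {
	HeapAllocBytes uint64  // bytes of allocated heap objects
	NextGCBytes    uint64  // heap size targeted for the next GC cycle
	NumGC          uint32  // completed GC cycles
	GCCPUFraction  float64 // fraction of available CPU time used by the GC since start
}

// readGCSnapshot samples the runtime's memory statistics once.
func readGCSnapshot() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{
		HeapAllocBytes: m.HeapAlloc,
		NextGCBytes:    m.NextGC,
		NumGC:          m.NumGC,
		GCCPUFraction:  m.GCCPUFraction,
	}
}

func main() {
	runtime.GC() // force one cycle so the counters are populated
	fmt.Printf("%+v\n", readGCSnapshot())
}
```

A snapshot like this, taken at a fixed interval and sent to a metrics backend, is enough to plot GC CPU share against heap growth. `runtime.ReadMemStats` briefly stops the world, so a production exporter would likely sample sparingly or use the `runtime/metrics` package instead.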

Unveiling the Impact: A deep dive into our memory management success 🚀

We integrated the Zomato/go/runtime library into over 250 of our Go microservices, which yielded impressive outcomes:

Reduced GC CPU Usage by more than 95%: GOMEMLIMIT intelligently triggers garbage collection only when necessary, saving valuable CPU cycles for other critical tasks. GC CPU usage dropped significantly, from a considerable 25%-60% (~25% on average) down to under 2% for most of our applications (a decrease of around 90% on average). This resulted in a remarkable reduction in the overall CPU usage of our applications by up to 25%-50% (which in turn leads to an equivalent reduction in our EC2 compute costs 📈). This substantial improvement lets our applications operate more efficiently and handle higher loads with greater ease.

Enhanced Stability: By establishing a soft memory limit on when to run the GC, GOMEMLIMIT demonstrably reduced the risk of unexpected OOM ❌ issues and application crashes. This translates to a more reliable and stable user experience for our valued Zomato customers.

Reduced CPU throttling: By adjusting GOMAXPROCS to the optimal value using our library, we have reduced CPU throttling for our applications by up to 50%.

Enhanced connection management: With lower CPU usage, each application needs fewer containers to serve the same number of requests. Each application that connects to it therefore has to maintain fewer connections, which reduces the load on our service mesh.

Easy Debugging: The runtime library exports important metrics related to the garbage collector, heap, memory, CPU, and ECS task limits, which help in debugging issues during incidents and in setting up alerting on critical metrics.

Cost Optimizations: By optimizing CPU usage and reducing the number of required AWS ECS tasks / EC2 instances, GOMEMLIMIT contributed to huge cost savings of around 30,000 USD per month.

The takeaway: Empowering developers

At Zomato, we’re constantly pushing the boundaries to enhance our platform’s stability and efficiency. Our adoption of GOMEMLIMIT through the Zomato/go/runtime library is a testament to our commitment to delivering seamless experiences through optimized performance.

GOMEMLIMIT empowers our developers with unparalleled control over memory and CPU usage. By significantly reducing out-of-memory (OOM) errors and optimizing garbage collection, this powerful feature not only improves application performance but also delivers substantial cost savings. It has proven invaluable in enhancing the scalability and reliability of our applications, ensuring that Zomato remains at the forefront of technological innovation.

As part of our ongoing commitment to innovation, we are excited to announce our plans to adopt Profile-Guided Optimization (PGO) in Go. PGO was introduced as a preview in Go 1.20 and became generally available in Go 1.21. It uses real-world profiling data to guide the compiler in making more informed optimization decisions, promising to further enhance runtime performance by 2-14%. This initiative underscores our dedication to leveraging cutting-edge technologies to drive continuous improvement across our platform.

Join us in shaping the future of technology

At Zomato, we believe in creating an environment where innovation thrives and where every team member contributes to our success. Our commitment to excellence is reflected in initiatives like GOMEMLIMIT and our upcoming implementation of PGO. If you are passionate about technology and seek to work with a team that values innovation and impact, consider joining us on our journey to redefine the future of technology in the food industry and beyond. Reach out to us at techrecruitment@zomato.com to explore exciting career opportunities.

All content provided in this blog is for informational and educational purposes only. It is not professional advice, and should not be treated as such.

This blog was authored by Sakib Malik in collaboration with Saurabh Sabharwal and Aniket Suri under the guidance of Himanshu Rathore.
