WooCommerce supporting 60k product updates per hour

A common e-commerce setup has two sales channels, a physical shop and an online site, which usually share the same inventory. This means every time there’s a sale in either channel, the inventory needs to be updated.

For a particular merchant that I helped, this meant around 60,000 product updates every hour to sync the inventory between the site and the physical shop.

By modern standards this volume isn’t much, but consider the constraints:

  • They were running on just one server. It was a powerful server, yes, but long-running update requests tie up resources and starve customer traffic.
  • Product update requests would generally invalidate caches all across the site, including shop pages, product pages, category counts. These are the most visited pages, and invalidating them this often isn’t great.
  • Updates also trigger a lot of downstream work, such as recalculating recommended products, packages and so on.

I was called in when they wanted to do batches every 2-3 hours, and each batch would take 40 minutes to complete. This was clearly not sustainable.
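As a rough back-of-the-envelope check on these numbers, here is a small Python sketch. It assumes updates accumulate at a constant 60,000 per hour between batches and that every batch takes the full 40 minutes; `batch_load` is a name made up for illustration.

```python
# Back-of-the-envelope load figures. Assumption: updates pile up at a
# constant rate between batches, and every batch takes the full 40 minutes.
UPDATES_PER_HOUR = 60_000   # figure quoted above
BATCH_DURATION_MIN = 40     # figure quoted above


def batch_load(interval_hours: float) -> dict:
    """Estimate batch size, throughput and busy share for one batch interval."""
    batch_size = UPDATES_PER_HOUR * interval_hours
    duration_s = BATCH_DURATION_MIN * 60
    return {
        "batch_size": int(batch_size),
        "updates_per_second": batch_size / duration_s,
        "share_of_time_busy": BATCH_DURATION_MIN / (interval_hours * 60),
    }


for hours in (2, 3):
    print(hours, batch_load(hours))
```

Under those assumptions the server spends somewhere between roughly a fifth and a third of its time in batch mode, which is the window in which customer traffic competes with the sync.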

To debug any performance issue, the USE method (Utilization, Saturation, Errors) is my go-to. You list out the resources, like CPU, RAM, disk I/O, network I/O, PHP workers and so on. Then, for each of them, you check how heavily it is utilized, whether it is saturated, and whether it is reporting errors.
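The per-resource checklist can be sketched in a few lines of Python. This is only an illustration: the `ResourceReading` type, the 80% threshold and the sample values are assumptions, not measurements from the merchant's server.

```python
from dataclasses import dataclass


@dataclass
class ResourceReading:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    queued: int         # work waiting for the resource (a saturation signal)
    errors: int         # errors seen in the sampling window


def use_findings(readings, utilization_limit=0.8):
    """Return (resource, issue) pairs for every check that fails."""
    findings = []
    for r in readings:
        if r.utilization >= utilization_limit:
            findings.append((r.name, "high utilization"))
        if r.queued > 0:
            findings.append((r.name, "saturated"))
        if r.errors > 0:
            findings.append((r.name, "errors"))
    return findings


# Hypothetical sample values, chosen only to show the shape of the output.
sample = [
    ResourceReading("cpu", 0.55, 0, 0),
    ResourceReading("php_workers", 1.0, 12, 0),
    ResourceReading("memory", 0.93, 0, 0),
    ResourceReading("disk_io", 0.30, 0, 0),
]

for resource, issue in use_findings(sample):
    print(f"{resource}: {issue}")
```

In practice the readings would come from whatever monitoring is already in place; the value of the method is that no resource gets skipped.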

While this usually works great, it wasn’t enough on its own this time. I could see that the exhaustion was at the PHP worker and memory level, but throwing money at the server was not an option, and it would not have been the right fix anyway. There’s a limit to how many performance issues you can fix with hardware before the cost-to-benefit ratio starts skyrocketing.

It was time to go inside the application. Thankfully, it was WooCommerce, which I had been working on for a few years at that point as a core dev. It also meant that there would be escape hatches and hooks in many places that I could use.

In cases like these, there’s an urge to look through the code, follow the data flow and measure which functions or methods take the most time. And generally, making individual methods and functions more performant is a great long-term fix. For instance, you would run the request with performance monitoring enabled to find slow spots, then optimize them, add caching and so on.

And we absolutely should be optimizing functions for sustainable long-term improvements. But when you are onboarding a client having issues, and management wants fixes fast, it’s also worthwhile to look at the system as the sum of its components.

In this case, when a product gets updated, a whole bunch of different things would happen:

  1. Category counts would be recalculated to exclude products that are no longer in stock or no longer belong to the category. This was a very expensive operation for a shop with as many SKUs as they had.
  2. All caches related to the product would get invalidated.
  3. Webhooks would get scheduled for the ERP and other systems.

Any APM worth its salt can produce traces that show the relative time spent per component.
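Without an APM at hand, the same per-component view can be approximated by hand. The sketch below is in Python rather than the shop's PHP; the component names mirror the list above, and the `time.sleep` calls are arbitrary stand-ins for real work, not measured durations.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per component.
timings = defaultdict(float)


@contextmanager
def timed(component):
    """Add the wall-clock time of the enclosed block to the component's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component] += time.perf_counter() - start


def relative_share(totals):
    """Convert absolute totals into each component's fraction of the whole."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: spent / grand_total for name, spent in totals.items()}


# Stand-ins for the real work; the sleep durations are made up.
with timed("category_counts"):
    time.sleep(0.03)
with timed("cache_invalidation"):
    time.sleep(0.01)
with timed("webhook_scheduling"):
    time.sleep(0.005)

for name, share in sorted(relative_share(timings).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%}")
```

The absolute numbers matter less than the ranking: the component with the largest share is the first candidate for deferral or removal from the hot path.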

The Breakthrough: Once we started looking at the system as a set of components, it became clear that a lot of operations could be deferred until after the batch had finished processing. Page caches can stay valid unless the product goes out of stock; product caches, however, must be invalidated along with the update itself.

As an immediate fix, we moved the category count recalculation out of the per-product update path and made cache invalidation selective. This cut the time to process a batch from 40 minutes to 7-8 minutes. While still slow, it brought the system to a sustainable state.
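The shape of that fix can be illustrated with a toy model. This is a Python sketch rather than the actual WooCommerce change: `Store`, `update_stock` and `batch` are hypothetical names, and the counters stand in for the expensive operations.

```python
from contextlib import contextmanager


class Store:
    """Toy stand-in for the shop; none of these names are WooCommerce APIs."""

    def __init__(self, stock):
        self.stock = dict(stock)        # sku -> quantity
        self.category_recounts = 0      # how often the expensive recount ran
        self.product_cache_purges = []  # skus whose product cache was purged
        self.page_cache_purges = []     # skus whose listing pages were purged
        self._deferring = False
        self._recount_pending = False

    def _recount_categories(self):
        if self._deferring:
            # Inside a batch: only remember that a recount is needed.
            self._recount_pending = True
            return
        self.category_recounts += 1

    def update_stock(self, sku, quantity):
        was_in_stock = self.stock.get(sku, 0) > 0
        self.stock[sku] = quantity
        # The product cache is always purged together with the update.
        self.product_cache_purges.append(sku)
        # Page caches stay valid unless the product just went out of stock.
        if was_in_stock and quantity <= 0:
            self.page_cache_purges.append(sku)
        self._recount_categories()

    @contextmanager
    def batch(self):
        """Defer category recounts until the enclosed updates are done."""
        self._deferring = True
        try:
            yield self
        finally:
            self._deferring = False
            if self._recount_pending:
                self._recount_pending = False
                self._recount_categories()


store = Store({"A": 5, "B": 1, "C": 9})
with store.batch():
    store.update_stock("A", 4)
    store.update_stock("B", 0)
    store.update_stock("C", 8)

print(store.category_recounts)     # 1 recount instead of 3
print(store.page_cache_purges)     # ['B'], the only product that went out of stock
print(store.product_cache_purges)  # ['A', 'B', 'C']
```

The expensive recount runs once per batch instead of once per product, and listing pages are only purged for the one product whose availability actually changed.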

This brought some immediate relief. What had been a crisis-level bottleneck was now manageable. The merchant could finally keep up with their inventory sync requirements without falling behind. More importantly, customer-facing pages were no longer being starved of resources: shoppers were getting faster response times, and the business could operate without the constant fear of system collapse.

At StarShop.Dev, we are now tackling this and similar issues right from the get-go. The infra changes that couldn’t be suggested at the time are now built into the platform. If you are facing similar issues or are unsure what sort of specs you need to handle the load, book a slot with us!