29th May 2020

RFO for recent performance issues

Impact: We have been investigating an issue where sites intermittently returned 400 errors when loading assets such as images, CSS or JavaScript. We traced this to a faulty update to LiteSpeed ADC, the third-party software powering our caching layer. Separately, a small number of sites on our database cluster were running an unusually high number of long-running database queries, which in turn degraded database performance more widely across Onyx.

Remedial Steps: For the LiteSpeed issue, we rolled back to an earlier version, which resolved the errors, whilst we work with LiteSpeed to fix the problem in the latest version of the software. The database issues were traced to a faulty plugin on the sites concerned, and we have worked with the clients to remove it and find alternatives whilst the plugin vendor patches the issue. Additionally, we have taken the opportunity to boost database capacity by bringing additional clusters online, and have moved the sites with the busiest queries onto their own cluster to minimise the risk of recurrence. Our team are also developing additional layers of caching to enhance object and query performance.