Choosing the right size server package—and choosing a package that can be scaled easily—are important decisions in any hosting purchase. Simply buying a server with enough CPU, RAM, I/O and disk space may not be enough for customers anticipating future growth or spikes in traffic. And upping the size and cost of a server package during (or even before) a traffic spike may not always be the smartest use of a company’s money and time.
A website on a fast server on a fast network will be fast until the server runs out of something: CPU, RAM, I/O, or a software-level resource like inodes. The places in your hosting infrastructure where resources are depleted first are your “bottlenecks.”
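To make the idea concrete, here is a rough headroom check for a few of those resources using only the Python standard library. The mount point (`/`) and the use of “percent free” as the yardstick are illustrative assumptions, not universal thresholds, and the load-average call is Unix-only:

```python
import os
import shutil

# Disk space headroom on the root filesystem (illustrative mount point).
total, used, free = shutil.disk_usage("/")
print(f"disk space free: {free / total:.0%}")

# 1-minute load average per CPU as a rough proxy for CPU headroom (Unix only).
load1, _, _ = os.getloadavg()
cpus = os.cpu_count() or 1
print(f"1-min load per CPU: {load1 / cpus:.2f}")

# Inodes: the software-level limit mentioned above.
stats = os.statvfs("/")
if stats.f_files:  # some filesystems report 0 total inodes
    print(f"inodes free: {stats.f_favail / stats.f_files:.0%}")
```

A script like this only tells you where you stand right now; the hard part, as the rest of this post argues, is predicting which number hits zero first as traffic grows.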
But server specifications don’t cause the bottleneck. They are simply the place where a theoretical limit collides with a real-world application. The following are the five areas of interaction that can lead to a slowdown in service:
**Network capacity**

Network capacity should be investigated before signing up for a hosting plan. Speed tests, network infrastructure specifications and consumer recommendations are great ways to learn about the network capacity of a web host.
Again, researching the plans that exceed your current needs—but could accommodate possible future needs—is an important step in choosing a host. A company that offers a variety of hosting plans as well as simple scaling between packages can help combat a bottleneck when (or before!) it arrives.
**Traffic to the site**
No one wants to intentionally limit traffic to a commercial site, so there’s nothing to do about this variable but prepare for its rise.
**Third-party software installed on the site**

Hosting customers are often unaware of the instability of some third-party software. What works well when hosted on a few sites with minimal traffic may bog a server down when scaled up. And customers should not rely on their web designers to choose wisely. Research the programs you are looking to install on your server.
**Page design and custom coding of the site**
Here is where the proverbial rubber meets the road. Webpage coding often has unintended consequences. What worked well on—and was designed for—a site with a few hundred hits per week, might perform very poorly (read: become a resource hog) as traffic rises.
This is not to say that site design and coding are always the cause of trouble. It’s the combination of user traffic and site design (coding and installed programs) that cause a bottleneck in a specific area of a server.
In theory, knowing what resource your site will run out of first as it grows in popularity, as well as the underlying cause of that resource usage, would be incredibly useful. The problem is, it’s very tough to do in reality.
Here’s an example: a while back, a customer called us in advance of a major event he was organizing. The company was releasing tickets to a series of shows, and they knew the online sale would draw a major crowd. The customer asked to be upgraded to a larger server, and we did more than comply: we ran a needs assessment, measuring the current server load and scaling it up to meet the expected draw of the event. The company typically had around 100 people on their site at any given time; they expected that to jump to 30,000 or so. So we did the math and moved them onto a server that should have more than compensated for a 300-fold traffic increase.
When the event happened, the server fell over: it ran out of CPU. For the 2-4 minutes it took our team to identify and fix the problem, the site was so slow as to be practically unreachable. Our estimates had been wrong.
Here’s what we quickly discovered:
- The site was using a centralized image repository full of large images, and on each page they had a couple dozen pictures that were using a script to autoscale them down to the size they wanted them to be on the page.
- At around 100 simultaneous viewers, this script wasn’t putting enough of a load on the server to be identified as a problem, so we didn’t catch it in our initial assessment.
- 30,000 simultaneous viewers were indeed hitting the site, but their usage patterns were changing. Instead of casually going from page to page, all the site’s users were frantically hitting refresh over and over again in hopes of getting tickets.
- As a result, this image-resizing script, run an average of 15 times per page load, was being invoked on the server roughly 450,000 times per second: 15 calls per page for 30,000 users, each reloading the page about once per second.
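The back-of-the-envelope arithmetic behind that last bullet is worth making explicit. The once-per-second refresh rate is the assumption that turns per-page counts into a per-second load:

```python
images_per_page = 15        # resize-script calls per page load
simultaneous_users = 30_000
refreshes_per_second = 1    # assumption: each user hammering refresh ~once/sec

calls_per_second = images_per_page * simultaneous_users * refreshes_per_second
print(calls_per_second)  # 450000
```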
As it turned out, a server 4 times as powerful (if such a server existed) would have risked failing under that load too.
We moved as quickly as we could: we identified the problem, temporarily disabled the script (which left some images on the site missing for a short while), and had the site running again in time for the entire event run to sell out within 20 minutes. But that script wasn’t caught before it became a problem, and it was the direct cause of the site’s CPU bottleneck.
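The usual fix for a script like this (we’re describing the general pattern here, not the customer’s eventual solution) is to resize each image once and cache the result, so repeat page loads reuse it instead of redoing the work. A minimal sketch, with a hypothetical `resized_url` function standing in for the real resizer:

```python
import functools

resize_count = 0  # instrumentation to show how often the real work runs

@functools.lru_cache(maxsize=1024)
def resized_url(path: str, width: int) -> str:
    """Hypothetical stand-in for an expensive image resize; the cache
    ensures each (path, width) pair is processed only once."""
    global resize_count
    resize_count += 1
    return f"/cache/{width}/{path}"  # placeholder for the resized file's URL

# 30,000 refreshes of a page with three images on it:
for _ in range(30_000):
    for img in ("hero.jpg", "band.jpg", "venue.jpg"):
        resized_url(img, 300)

print(resize_count)  # the resize ran 3 times instead of 90,000
```

With a cache in front of it, the script’s cost no longer scales with traffic at all, only with the number of distinct images.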
There are ways to run artificial load tests on servers, using tools like Apache Bench, and they’re helpful in some cases. But the simple truth is that many hidden problems only surface under real-world usage patterns that are hard to replicate. This is tough stuff, but it’s worth keeping in mind when you think about the speed of your site.
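For a sense of what such a tool does, here is a minimal sketch of an Apache-Bench-style concurrent load test in Python. It spins up a throwaway local server so the example is self-contained; in practice you would point the URL at a staging copy of your own site (never at a production server):

```python
import http.server
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Throwaway local server so the sketch runs anywhere.
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

URL = f"http://127.0.0.1:{server.server_address[1]}/"
TOTAL, CONCURRENCY = 200, 20  # roughly ab's -n and -c options

def fetch(_):
    """Time one request and record its HTTP status."""
    start = time.perf_counter()
    with urllib.request.urlopen(URL) as resp:
        resp.read()
        return resp.status, time.perf_counter() - start

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(fetch, range(TOTAL)))
elapsed = time.perf_counter() - t0

ok = sum(1 for status, _ in results if status == 200)
latencies = sorted(lat for _, lat in results)
print(f"{ok}/{TOTAL} ok, {TOTAL / elapsed:.0f} req/s, "
      f"p95 latency {latencies[int(0.95 * TOTAL)] * 1000:.1f} ms")
server.shutdown()
```

The real tool’s rough equivalent would be `ab -n 200 -c 20 http://staging.example.com/` (the URL here is a placeholder). Note what this kind of test can’t do: it replays one URL at a fixed concurrency, which is exactly why the frantic-refresh pattern in the ticket story above never showed up in advance.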
There often comes a point in troubleshooting a slow server when the support technician will recommend a hardware upgrade. The question is: do you trust the advice your web host has been giving you? At ServInt, we troubleshoot slow sites every day. Our customers have come to expect and rely on our expertise when it comes to diagnosing server problems.
You have to have complete confidence that your web host has your best interests at heart—and not its own bottom line—when they suggest an upgrade. And if you do not, it’s time to look elsewhere for hosting.
At ServInt, a lot of our support tickets come from helping people weather their successes; as a customer’s traffic grows, their sites can slow down. Sometimes this happens because a piece of code is sucking up resources. Sometimes things are pretty efficient and it’s simply time for an upgrade. If you can’t load test your site to know what to expect, make sure you’re with a web host with a reputation for quickly identifying and fixing these problems when the unexpected hits.
Photo by polarjez.