Editor’s Note: This week, we continue our guest series with Larry Ullman’s second installment in his discussion about scaling a website. If you’re just joining us, be sure to check out Part 1: Infrastructure that Scales.
As a web developer, writer, and public speaker, I often interact with people of various skill levels, talents, and interests. This is one of the joys of my career: I’m fortunate enough to bear witness to the thoughts and experiences of other programmers, “idea” people, and just plain dreamers.
One of the common topics that comes up, or that I am directly asked about, is how one “goes big” with a website. In this three-part series, I explain everything I believe and know (or think I know) when it comes to this subject. In Part 1, I covered the myths of going big, what infrastructure you’ll eventually need, and how you should start developing a new project. In this, Part 2, I discuss how to write code that can scale well should your site “go big”. Finally, in Part 3, I’ll turn to designing databases that can handle the traffic of a “big” site.
If I were to summarize how to program a site for a potentially “big” scale, the summary would come down to two words: optimal and flexible.
Writing Optimal Code
The term “optimal” means that the site should perform as well as possible. This should seem obvious, of course, but it bears repeating. Entire books could be written on coding optimally, but a few rules of thumb include:
- Don’t create variables or functions you don’t need
- Minimize interactions with the file system
- Minimize interactions with the database
- Minimize networking
- Limit the amount of information retrieved from the database
- Watch usage of loops and recursion
- Minimize and compress the amount of data transferred to the browser
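To make the database-related rules of thumb concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `articles` table, its data, and both function names are assumptions invented for this illustration. The two functions return the same five titles, but the first pulls every column of every row across the database boundary, while the second asks only for what the page needs.

```python
import sqlite3

# Hypothetical schema and sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [(f"Article {n}", "long body text " * 100) for n in range(1, 51)],
)


def latest_titles_wasteful(conn):
    """Fetch every column of every row, then discard most of it in Python."""
    rows = conn.execute("SELECT * FROM articles").fetchall()
    rows.sort(key=lambda row: row[0], reverse=True)  # sort by id, newest first
    return [row[1] for row in rows[:5]]


def latest_titles_lean(conn):
    """Ask the database for only the column and the rows the page needs."""
    rows = conn.execute(
        "SELECT title FROM articles ORDER BY id DESC LIMIT 5"
    ).fetchall()
    return [row[0] for row in rows]
```

On fifty rows the difference is negligible. The habit matters once a table holds millions of rows and each one carries a large `body` value.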
A catch is that you need to avoid premature optimization. Don’t overthink what you’re supposed to be doing while you’re performing the initial programming! The first goal is really to write code using these best practices. I’ll repeat that, because it’s important:
The first goal is to write code using best practices.
Then, when the site is done, you can profile and benchmark what you’ve created to see how it performs. After doing all of that, you can finally attempt to fix bottlenecks.
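As a rough sketch of what benchmarking can look like, the following Python example times two ways of doing the same job with the standard `timeit` module. The two functions, the `benchmark` helper, and the sample input are all assumptions made up for this illustration. The point is the order of operations: write both versions cleanly, confirm they agree, and only then measure.

```python
import timeit


def join_with_loop(words):
    """Build a string by repeated concatenation."""
    result = ""
    for word in words:
        result += word + " "
    return result.strip()


def join_with_builtin(words):
    """Build the same string with str.join."""
    return " ".join(words)


def benchmark(func, words, repeats=200):
    """Return the total seconds taken to call func(words) `repeats` times."""
    return timeit.timeit(lambda: func(words), number=repeats)


words = ["scale"] * 1000  # hypothetical sample input

# Confirm both versions produce the same result before comparing speed.
assert join_with_loop(words) == join_with_builtin(words)

for func in (join_with_loop, join_with_builtin):
    print(f"{func.__name__}: {benchmark(func, words):.4f}s")
```

The timings will vary from machine to machine, which is exactly why you measure rather than guess. For finding where a whole request spends its time, a profiler (such as Python’s `cProfile` module) is the better tool.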
How Code Can Be Flexible
Next, there’s the issue of writing flexible code. To understand the importance of flexible code as it pertains to “going big”, one has to understand the two ways in which a site may need to grow:
- Scale to handle more traffic
- Add and change features and functionality
A site can often be scaled to handle more traffic simply by throwing more resources at it: better hosting, a better server, more memory, more servers, a Content Delivery Network (CDN), etc. You generally don’t have to do anything unusual or special with your code to make it scalable. The above best practices will truly be sufficient up to the point of having one of the Top 100 active sites in the world. That being said, inefficient code can make it necessary to spend more on resources sooner than you otherwise would. But assuming you’ve stuck to best practices, it’s the second issue, the ability to add and change features and functionality, that’s more relevant to your programming and to the ability of a site to “go big”.
Some years ago I saw a presentation by an engineer at YouTube who said that it was more important that they could fix problems quickly than that the site itself ran quickly. I’ve heard similar thoughts expressed by many other programmers: a developer’s time is precious, but a site can be made to run faster merely by throwing an extra $100 worth of RAM into a server. Taken a step further, Paul Graham expertly stated the arguments against designing for scale. This isn’t to say that being indifferent to the performance of your site is acceptable; it’s just that you don’t want to adopt an approach of optimum performance at all costs. That route often leads to a site that would scale well, but whose ability to scale is never tested because the project fails well before that point.
The point of flexible code is that the site has to be written in such a way that it’s quick and easy for you to fix problems, add features, and make other significant changes. If the code is hard to update, then updates either won’t happen or won’t happen quickly enough (and therefore, users will leave).
As an example, I recently worked on a project for a customer whose site had some terrible legacy code. The site worked well enough, but the customer wanted to add features and improve performance. The performance issues were largely remedied by fixing several problems in the database. However, each added feature took twice as long to implement because the code was a mess. One of the site’s main administrative scripts was over 3,000 lines long! And because the site is being actively used (and is currently profitable), I could not make the large, sweeping changes that I would have liked to. Hence, on that project, baby steps were required, changes were made very slowly, and the client paid a lot extra because it was so tedious to work with the code.
So how do you write flexible code? I’ll answer that specific question next, but first, there’s a conflict to be understood: flexible code is at odds with optimal code. Generally speaking, if you make your code more flexible, it won’t perform as well.
As a specific example, many developers use database-agnostic interfaces, such as PDO in PHP. PDO gives you a consistent interface regardless of the database in use. PDO also allows you to switch database applications on the fly, should that need arise. But the added interface layer drags down performance. Moreover, by being database agnostic, you can’t write your code to take advantage of features and capabilities of any specific database application. You’ve increased flexibility greatly, but hurt code optimization.
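The trade-off can be sketched in a few lines. The example below is Python rather than PHP, and it is not PDO: `UserStore`, both implementations, and `register_all` are hypothetical names invented to show the general shape of a storage-agnostic layer. The calling code depends only on the shared interface, so the backend can be swapped, but every call goes through an extra layer and the interface exposes only what all backends have in common.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable, List


class UserStore(ABC):
    """Hypothetical storage interface; calling code depends only on this."""

    @abstractmethod
    def add(self, name: str) -> None:
        """Store one user name."""

    @abstractmethod
    def count(self) -> int:
        """Return how many users are stored."""


class SqliteUserStore(UserStore):
    """Backend that keeps users in an in-memory SQLite database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT)")

    def add(self, name: str) -> None:
        self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class InMemoryUserStore(UserStore):
    """Backend that keeps users in a plain Python list."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def add(self, name: str) -> None:
        self.names.append(name)

    def count(self) -> int:
        return len(self.names)


def register_all(store: UserStore, names: Iterable[str]) -> int:
    """Works with any backend because it only uses the shared interface."""
    for name in names:
        store.add(name)
    return store.count()
```

Note what `register_all` cannot do: it has no way to use a feature that only one backend offers, because the interface does not expose it. That is the flexibility-versus-optimization conflict in miniature.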
You have to find a balance between writing optimal code and writing flexible code. Fortunately, most of the best practices you’ll see online and read about in books hit this mark pretty well.
Writing Flexible Code
What, exactly, is flexible code and how does one write it? There are two ways you can write flexible code:
- In the code itself
- Using a good workflow
One of the first hallmarks of flexible code is modularization: dividing code up into discrete, independent blocks. At the more beginner level, using multiple files is a modular approach. One file might connect to the database, another is a configuration file, another has useful functions. The primary files then include these as necessary.
Defining your own functions for certain routines is also a modular approach. With both of these examples, the benefit is that you can edit one piece of code in one location and have site-wide impacts. That’s flexible code.
Second, create variables or constants for values repeatedly used throughout a site. This could be the site’s name or URL, an email address, and so on. You might not think you’ll ever change a site’s URL, but if you do, having assigned that value to a variable that’s used throughout the site will make that change a snap.
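Here is a minimal sketch of both ideas together, in Python. The constant values and the two function names are placeholders invented for this illustration. In a real project the constants would live in their own configuration file and the functions in a shared helper file, with each page including what it needs.

```python
# Configuration: hypothetical site-wide values, defined exactly once.
SITE_NAME = "Example Site"
SITE_URL = "https://www.example.com"
CONTACT_EMAIL = "admin@example.com"


def absolute_url(path):
    """Build a full URL from a site-relative path."""
    return f"{SITE_URL}/{path.lstrip('/')}"


def page_footer():
    """Render the footer text shown on every page."""
    return f"{SITE_NAME} | Contact: {CONTACT_EMAIL} | {absolute_url('/about')}"
```

If the site moves to a new domain, only `SITE_URL` changes. Every link built through `absolute_url()` and every page that prints `page_footer()` picks up the new value without being edited.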
Third, and on the more advanced level, Object-Oriented Programming creates much more flexible code. It’s modular by definition. Sweeping changes can be made in single locations, including adding features. I’m not an “OOP is always better” kind of developer, but I don’t think there’s a question that OOP makes for more maintainable code when it comes to large sites. Well, good OOP, that is.
The final component of your workflow should be version control. There are lots of arguments for version control, and being able to quickly institute code changes is just one. Being able to immediately revoke bad code changes is another! For large sites, the more advanced solution is continuous integration.
Again, much of what I’m talking about here is just a case of best practices: the kinds of things you should do whether or not your site would ever go big. The question is really how far you take any single approach.
Those are my thoughts on programming a site in such a way that it can scale. “Scale”, to me, means both being able to handle more traffic, and being adaptable to new features and needs without too much effort. In the third and final part of this series, I’ll talk more specifically about designing a database that can be used on a site that “goes big”.
Photo by Thomas Leuthard