Failing From Your Own Success

While I’m on the subject of load testing, this is the horror story I hear over and over again from developers: The site is ready, it looks great, the client loves it. But you’ve focused all your energy on features and on getting the thing out the door, and almost none on how it’s actually going to run under load. So almost invariably, the site doesn’t run under load until ship date, when your investor or whoever’s paying for the site sends the link to everybody they know and says, “Look guys, I’ve succeeded!” And it’s a disaster. One developer I know built and tested a site completely internally, and it wasn’t until the day it went live that he realized the average page size was 1.5MB. Or it turns out that no one actually tested the transactional integrity of the web pages against the database, and the first time two people try to enter data at the same time, bad things happen.
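To make the “two people enter data at the same time” problem concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption for illustration: the `stock` table, the `widget` row, and the function names are hypothetical, and a single connection stands in for two overlapping requests so that the interleaving is deterministic.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one item in stock."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    db.execute("INSERT INTO stock VALUES ('widget', 10)")
    db.commit()
    return db


def read_qty(db):
    return db.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]


def unsafe_interleaving():
    """Two simulated requests each read, then each write back their own value."""
    db = make_db()
    # Both requests read before either one writes.
    seen_by_a = read_qty(db)
    seen_by_b = read_qty(db)
    db.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (seen_by_a - 1,))
    db.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (seen_by_b - 1,))
    db.commit()
    return read_qty(db)  # returns 9: two sales, but only one was recorded


def atomic_updates():
    """The same two sales, with the arithmetic done inside the UPDATE itself."""
    db = make_db()
    for _ in range(2):
        db.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'widget'")
    db.commit()
    return read_qty(db)  # returns 8: both sales were recorded
```

With one user at a time, both versions behave identically, which is exactly why this kind of bug survives internal testing and shows up on launch day.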


So this is an important question to ponder: What does it look like when your site fails under load? It’s not like there’s an explosion, or a sign that pops up out of the server that says “Help!” Failure doesn’t look like anything in particular. It’s an inelegant thing, and it’s an inconsistent thing. You see all kinds of bizarre messaging, usually for stuff that’s unrepeatable. You get errors in code that doesn’t actually have errors (no wonder you can’t reproduce them!). It’s just that the environment you’re living in under load is so different. People can waste a lot of time pursuing those errors. You can make yourself completely crazy chasing down these phantom “bugs,” not recognizing that they’re just symptoms of an overloaded web server.


Of course, even a great load tester can’t really show you what failure looks like. You get a report showing how long it took a page to render, but what does that actually look like? What pieces came up and what didn’t? What did the error message look like? Those kinds of things are very tough for a load tester to capture.


That’s why Sean Wilson, our QA manager at Strangeloop, recommends that even when you’re running a load test, you actually go in and use the site yourself during the test. You may be the only “real” person on the site, but you’re experiencing it as if there are another 10,000 people using it. It’s worthwhile to capture that human viewpoint. The impact of a 120-second response time is nowhere near as significant when your Spirent Avalanche reports it as it is when you’re actually sitting there watching the thing freeze for two minutes and the only thing on the page is a partially drawn banner. Or worse yet, a partially rendered form that a user may think they can start working with, only to find a minute or so later that it goes nuts.
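If you want a record to go alongside what you saw on screen, a small probe run during the load test can save what actually came back. The following is only a sketch under assumptions, and it is no substitute for sitting in front of the browser: the names `fetch` and `snapshot` are hypothetical, and checking for a closing `</html>` tag is just one crude way to guess whether a page arrived whole.

```python
import time
import urllib.error
import urllib.request


def fetch(url, timeout=120.0):
    """Return (status, body). Error pages are returned rather than raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered with an error page; keep it so you can read it.
        return err.code, err.read()
    except OSError:
        # Timeout or connection failure: nothing usable came back.
        return None, b""


def snapshot(url, expected_marker, fetcher=fetch, clock=time.monotonic):
    """Time one request and note whether the page looks complete."""
    started = clock()
    status, body = fetcher(url)
    elapsed = clock() - started
    return {
        "url": url,
        "status": status,
        "seconds": round(elapsed, 2),
        "bytes": len(body),
        "complete": expected_marker in body,
        "body": body,  # keep the raw page so a person can look at it later
    }
```

Calling something like `snapshot(page_url, b"</html>")` every few seconds while the load tester is running gives you the response time plus the actual error page or half-rendered markup a visitor would have received.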


Which brings me to the next point to ponder: Fail gracefully. As developers, we tend to think that the correct answer is always, “No bugs, we’ll just fix everything.” But the reality is that you’ll never get there. Failure will never, ever completely go away. So put the cycles into dealing with failure well.


The basic definition of a graceful failure is a failure where everybody knows what happened. Or at least, you’ve controlled the message explaining what happened. Take the default IIS “Server Too Busy” message. It’s not a pretty message, but it gets the job done. It conveys the information. It’s not some weird ASP.NET error that just makes your customer angry at your incompetence, and sends you off trying to debug something that can’t be debugged. The customer may still be annoyed that you weren’t adequately prepared for your bandwidth demands, but at least they know what happened. And you can go from there.


Obviously, you want to provide some more content than the default messages generated from IIS or from ASP.NET. At least a better apology. Give the customer a sense that this shouldn’t have happened, we’re sorry about it, we have a record of it happening, and we’ll deal with it. Ideally, you don’t want to need any message because everything’s handled. But that’s an ideal, and should be treated as such. You have to have something in between.
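As a rough illustration of that “something in between,” here is a minimal sketch of a catch-all wrapper that apologizes, records the incident, and gives the customer a reference to quote. The sites discussed here run on ASP.NET, but the idea is not tied to one stack, so the sketch uses Python’s WSGI interface purely as a stand-in. The class name `GracefulFailure`, the page wording, and the 30-second retry hint are all assumptions, not anything IIS or ASP.NET provides.

```python
import logging
import sys
import uuid

logger = logging.getLogger("graceful_failure")

APOLOGY_PAGE = """<html><body>
<h1>Sorry, we're having trouble right now</h1>
<p>This shouldn't have happened. We've recorded the problem and will look into it.</p>
<p>Reference: {incident_id}</p>
</body></html>"""


class GracefulFailure:
    """WSGI middleware: turn unhandled errors into a controlled 503 page."""

    def __init__(self, app, retry_after_seconds=30):
        self.app = app
        self.retry_after_seconds = retry_after_seconds

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                # Materialize the body so errors raised while rendering are caught too.
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            # Record the details for ourselves; show the customer only a reference.
            incident_id = uuid.uuid4().hex[:8]
            logger.exception("incident %s on %s", incident_id, environ.get("PATH_INFO"))
            body = APOLOGY_PAGE.format(incident_id=incident_id).encode("utf-8")
            headers = [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(self.retry_after_seconds)),
            ]
            start_response("503 Service Unavailable", headers, sys.exc_info())
            return [body]
```

The point of the design is the split: the stack trace goes to the log under an incident reference, and the customer sees a message you wrote on purpose instead of whatever the framework happened to emit.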


Of course, nobody likes thinking about failure, but this isn’t about failure. It’s about success. These failures under load are the kinds of things that happen to successful sites. With the Web resurgent, with ASP.NET becoming more and more popular, and with sites getting bigger and bigger, the load is only going up from here. You should be planning on being successful, and that means thinking about these things. If you haven’t thought about them, they’re going to think about you.



Friday, May 11, 2007 9:25:52 AM (Pacific Standard Time, UTC-08:00)


All content © 2023, Richard Campbell