
Calculating Throughput and Response Time

20 June 2005

In software, response time measures, from the client’s perspective, the total time that a system takes to process a request (including latency). The response time of a single request is not always representative of a system’s typical response time. In order to get a good measure of response time, one will usually calculate the average response time of many requests. Response time is usually measured in units of “seconds / request” or “seconds / transaction”. (Note: Don’t confuse response time and latency.)

Throughput is the measure of the number of messages that a system can process in a given amount of time. In software, throughput is usually measured in “requests / second” or “transactions / second”.
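
Here is a minimal sketch of how you might compute both metrics from a hypothetical request log of (start, end) timestamps; the data and variable names are made up for illustration:

```python
# Sketch: average response time and throughput from a hypothetical request log.
requests = [(0.0, 1.2), (0.5, 1.9), (1.0, 2.4), (1.5, 3.1)]  # (start, end) in seconds, made-up data

# Response time: average of (end - start) over all requests, in seconds / request.
avg_response_time = sum(end - start for start, end in requests) / len(requests)

# Throughput: completed requests divided by the measurement window, in requests / second.
window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window

print(f"response time = {avg_response_time:.2f} s/request")
print(f"throughput    = {throughput:.2f} requests/s")
```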

When I first started doing performance analysis, I naively assumed that throughput and response time were linearly related and were thus reciprocals of one another. Though there are conditions that might allow these two system measurements to be inversely proportional, it is definitely not a given.

Let’s look at a real-life example: consider a checkout lane in a grocery store. Let’s assume that the cashier always takes 2 minutes to check out a customer. Let’s also assume that there is no line and that a new customer walks up to the cashier at the exact moment that the previous customer finishes checking out, with absolutely no delay between checkouts. If we have 10 such customers, we would calculate response time and throughput as follows.

To calculate response time, we sum up the total checkout time for all customers and divide by the number of customers:

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout

To calculate latency, we calculate the average wait time in line:

Latency = 0 minutes / 10 checkouts = 0 minutes / checkout

We can also measure the rate at which things occur:

People that got in line / minute

  • This is the queue input rate
  • The first person got in line at time 0, the last person got in line at time 18 min
  • 10 people / 18 minutes = .56 people got in line / minute

People that got to the register / minute

  • This is the queue output rate and it is also the system input rate
  • The first person got to the register at time 0, the last person got to the register at time 18 min
  • 10 people / 18 minutes = .56 people started checking out / minute

People that finished checking out / minute

  • This is the system output rate
  • The first person finished checking out at time 2 min, the last person finished at time 20 min
  • 10 people / 18 minutes = .56 completed checkouts / minute

People that the cashier checked out / minute

  • This is the processing rate
  • The first person started checking out at time 0 min, the last person finished checking out at 20 min
  • 10 people / 20 minutes = .5 checkouts / minute
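
These rates are easy to reproduce in a few lines of code. Below is a minimal sketch of the back-to-back scenario (2-minute checkouts, each customer arriving exactly as the previous one finishes); the variable names are mine, chosen for illustration:

```python
# Sketch of the back-to-back checkout example: 10 customers, 2-minute checkouts,
# each customer arrives exactly when the previous customer finishes.
service_time = 2                                # minutes per checkout
arrivals = [2 * i for i in range(10)]           # got in line at 0, 2, ..., 18
starts   = arrivals[:]                          # no waiting, so start == arrival
finishes = [s + service_time for s in starts]   # done at 2, 4, ..., 20

queue_input_rate   = len(arrivals) / (arrivals[-1] - arrivals[0])   # 10 / 18 = .56
queue_output_rate  = len(starts)   / (starts[-1]   - starts[0])     # 10 / 18 = .56
system_output_rate = len(finishes) / (finishes[-1] - finishes[0])   # 10 / 18 = .56
processing_rate    = len(finishes) / (finishes[-1] - arrivals[0])   # 10 / 20 = .5

print(queue_input_rate, queue_output_rate, system_output_rate, processing_rate)
```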

As you can see, there are many different rates that we can measure. People use the word throughput to refer to all of these different rates, but generally when we talk about throughput in software we are referring to the processing rate (people that the cashier checked out / minute). Depending on how we are measuring, either the system input rate or the queue input rate is also known as the system load. Accordingly, the term load testing is used to describe a test where we send many requests into a system and observe its non-functional behavior.

Based on this example, it looks like throughput and response time are inversely proportional:

Throughput = .5 checkouts / minute
Response Time = 2 minutes / checkout

or…

Throughput = 1 / Response Time [NOT ALWAYS TRUE]

This is because we have no latency and because the arrival pattern was contrived exactly so that there is neither customer wait time nor cashier idle time.

Let’s vary our example a little. What if 10 people used this same checkout lane, but each person arrived in line 1 minute after the last person was done checking out? The cashier is just twiddling his thumbs, waiting for a customer for 1 minute. The cashier is still capable of checking out a customer in 2 minutes, so the average response time is still 2 minutes / customer, but the throughput of people coming out of the checkout lane is not the same.

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout
Latency = 0 minutes / 10 checkouts = 0 minutes / checkout
Throughput = 10 checkouts / 29 minutes = .34 checkouts / minute

(The last customer doesn’t finish until minute 29: after the first 2-minute checkout, each of the remaining 9 customers adds 1 minute of cashier idle time plus 2 minutes of checkout.)
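
The same arithmetic, sketched in code (again, just an illustrative sketch of this scenario):

```python
# Sketch of the "1-minute gap" example: each customer arrives 1 minute after
# the previous customer finishes, and each checkout still takes 2 minutes.
service_time, gap = 2, 1
finish = 0.0
finishes, response_times = [], []
for _ in range(10):
    arrival = 0.0 if not finishes else finish + gap  # first customer arrives at time 0
    finish = arrival + service_time                  # no line, so checkout starts immediately
    finishes.append(finish)
    response_times.append(finish - arrival)

print(sum(response_times) / 10)   # 2 minutes / checkout
print(10 / finishes[-1])          # 10 / 29 = ~.34 checkouts / minute
```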

Let’s also consider the opposite. What if 10 people used the checkout lane at nearly the same time? In other words, what if there was a line of 9 people behind a customer who is being checked out? From a customer perspective, the checkout time (or response time) is the amount of time from when they get in line until they are done checking out, and the latency is how long it takes them to get to the cashier from the time they get in line. The first person to get to the checkout lane wouldn’t wait at all. The first person in line (not the one being checked out currently) would wait 2 minutes to start checking out, the second person would wait 4 minutes, and so on, until the last person, who would wait 18 minutes to start being checked out.

Response time = 110 minutes / 10 checkouts = 11 minutes / checkout
Latency = 90 minutes / 10 checkouts = 9 minutes / checkout
Throughput = 10 checkouts / 20 minutes = .5 checkouts / minute
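
And the saturated case, where all 10 customers get in line at time 0, as a quick sketch:

```python
# Sketch of the saturated example: 10 customers all get in line at time 0,
# one cashier, 2 minutes per checkout, first-come first-served.
service_time = 2
starts   = [service_time * i for i in range(10)]   # start checking out at 0, 2, ..., 18
finishes = [s + service_time for s in starts]      # finish at 2, 4, ..., 20
waits    = starts                                  # everyone arrived at time 0

print(sum(finishes) / 10)   # response time = 110 / 10 = 11 minutes / checkout
print(sum(waits) / 10)      # latency       =  90 / 10 =  9 minutes / checkout
print(10 / finishes[-1])    # throughput    =  10 / 20 = .5 checkouts / minute
```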

From a customer’s perspective, the average checkout time is greater, even though the clerk is still working at the same speed and is able to push 10 people through the line in 20 minutes. The checkout lane is saturated at the point when the queue input rate exceeds the queue output rate. As you can see, the rate at which customers are getting in line makes all the difference. The term degradation is often used to describe a system whose response time increases when the load is increased. In our grocery example, our system starts degrading as soon as customers get in line more often than once every two minutes.

In this grocery example, people can form a line. In a software system, the line (or queue) is either going to be on the sender side or the receiver side, depending on whether the system is synchronous or asynchronous. If a receiver blocks all messages until it is done executing its current request, then the system is synchronous and the queue is on the sender’s side. If the receiver accepts messages as fast as possible and uses a separate execution thread to execute requests, then the receiver must have a queue and the system is said to be asynchronous. You could have a queue on both the sender and receiver, but this is usually superfluous. See: Synchronous vs. Asynchronous Systems.
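
Here’s a minimal sketch of the asynchronous shape: the receiver accepts messages immediately onto a queue and a separate worker thread drains it (a synchronous receiver would simply do the work inline and make the caller wait). The `handle` function is a stand-in for real work, not part of any actual API:

```python
import queue
import threading
import time

# Asynchronous receiver: messages are accepted immediately onto a queue,
# and a separate worker thread drains the queue and does the actual work.
inbox = queue.Queue()

def handle(message):
    time.sleep(0.1)            # stand-in for real processing work
    print("processed", message)

def worker():
    while True:
        message = inbox.get()
        if message is None:    # sentinel: shut down
            break
        handle(message)

threading.Thread(target=worker).start()

# The "sender" side never blocks on the work itself; it only enqueues.
for i in range(5):
    inbox.put(i)
inbox.put(None)                # tell the worker to stop
```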

Software load is usually measured in requests per second. For example, you may describe the load on a system as “10 requests per second”. In a real-world scenario, the load will change as a function of time. In a grocery store, more customers will try to check out at peak shopping hours. In the stock market, the most volume is traded in the first and last 15 minutes that the market is open. A Web page will see different load depending on the day of the week and the time of day. Thus, if you are designing a test of your system, you want to determine its behavior under different types of load.
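
A load test, then, is just a matter of driving requests into the system at a chosen rate and recording each response time. A rough sketch, where `send_request` is a hypothetical stand-in for whatever call your system exposes:

```python
import time

def send_request():
    """Hypothetical stand-in for one call into the system under test."""
    time.sleep(0.05)   # pretend the system takes 50 ms to respond

def run_load_test(requests_per_second, duration_seconds):
    interval = 1.0 / requests_per_second
    response_times = []
    end = time.time() + duration_seconds
    while time.time() < end:
        start = time.time()
        send_request()
        response_times.append(time.time() - start)
        time.sleep(max(0.0, interval - (time.time() - start)))  # pace the load
    return response_times

times = run_load_test(requests_per_second=10, duration_seconds=2)
print(f"average response time: {sum(times) / len(times):.3f} s over {len(times)} requests")
```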

In order to improve the performance of our grocery store, we can make it multi-threaded by adding more lanes. This concurrency helps in two ways:

  • It can minimize the response time that each customer experiences by reducing wait times in line
  • It can increase throughput

Let’s say we have 10 lanes, 10 customers arrive at the checkout area at the exact same time, each customer goes to a different lane, and each checkout takes 2 minutes.

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout
Latency = 0 minutes / 10 checkouts = 0 minutes / checkout
Throughput = 10 checkouts / 2 minutes = 5 checkouts / minute

With a single lane, our response time for this same load was 11 minutes / checkout, but with multiple lanes, our response time is 2 minutes / checkout, the best that our system can provide. Increasing the number of lanes (or threads) increased our throughput and allowed us to maintain optimum response time.
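
In code, adding lanes corresponds to handling requests on a pool of worker threads. A minimal sketch, with seconds standing in for minutes so it runs quickly (the numbers and names are illustrative only):

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHECKOUT_TIME = 0.2   # seconds standing in for the 2-minute checkout

def checkout(customer):
    time.sleep(CHECKOUT_TIME)      # the cashier's work
    return customer

def measure(lanes, customers=10):
    start = time.time()
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        list(pool.map(checkout, range(customers)))
    elapsed = time.time() - start
    return customers / elapsed     # throughput, in checkouts per second

print("1 lane  :", measure(lanes=1))    # roughly 1 / CHECKOUT_TIME
print("10 lanes:", measure(lanes=10))   # roughly 10x higher, resources permitting
```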

But, of course, nothing is free. In this case, we’ve increased the number of active employee resources to optimize our performance, but we must pay for those resources. In software, we have to worry about system resources. We can spawn off multiple threads, but we have to be careful how much CPU and memory each execution thread is utilizing.


5 Comments

  • naasking said:

    Just came across this, and thought I’d add:

    With a single lane, our response time for this same load was 11 minutes / checkout, but with multiple lanes, our response time is 2 minutes / checkout, the best that our system can provide. Increasing the number of lanes (or threads) increased our throughput and allowed us to maintain optimum response time.

    In general, this is definitely NOT true. In this analogy, the number of employees you have is equivalent to the number of CPUs on your machine. Opening more lanes (like 11) can thus only INCREASE response time and decrease throughput since the cashier now has to run between lanes to service them all. Optimal response time is achieved by opening as many lanes as you have employees.

You essentially say this in the last paragraph, but the way you portray it is backwards: you don’t start with x lanes and try to satisfy them by hiring as many employees as you need; instead, you have y CPUs and your architecture should only run y concurrent threads for optimal utilization.

    This is predicated on the assumption that the cashier is always actually doing real work while servicing a customer. If she’s not, then you require a fully non-blocking asynchronous architecture to maintain optimal utilization (best), or you increase the number of threads (worse).

  • Javid Jamae (author) said:

    Good point! I need to update my analogy to cover multiple CPUs and non-blocking IO. Thanks for the feedback. I’ll try to get to this soon.

  • Random Thoughts on Software » Blog Archive » Graphing Throughput said:

    [...] my post on calculating throughput and response time I discussed how to measure the average throughput and response time for a given amount of time. I [...]

  • paul said:

I’m trying to figure out something similar to this. Basically I’m looking to start counting at the URL rewriting and end at the end of the page rendering. I have a component that sends a UDP packet out in the render method at the end of the body loading, but I need to link the load time for that request to the start of the URL rewriting request. The main problem I can see is any redirects that happen, and being able to break down any blocking operations such as DB access within it.

  • Lynn said:

I’ve been looking for this kind of measurement for a while. Thanks for sharing. But I have a question: in your example, the process is finite, with 10 people to deal with. What if the process is infinite? If the infinite process is periodic, we can just calculate the throughput and response time for one period. But how do we calculate the throughput, response time, etc. for a non-periodic, infinite execution of a system? Thank you very much!
