Performance¶
What is performance?
- complete tasks faster
- use less power/energy
- do more tasks
How is performance measured?
How can we design systems to improve performance?
Measuring Performance¶
- capacity: amount of a resource that’s available
- utilization: how much capacity is being used?
- overhead: resource “wasted”
- useful work: amount of resource spent on actual work
- time-based
- latency: time between input and result
- throughput: rate of results per unit time
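A minimal sketch of how these metrics relate, using made-up measurements of one server over a 10-second window (all numbers are assumptions for illustration):

```python
# Hypothetical measurements of one CPU over a 10-second window.
busy_seconds = 7.0        # time spent handling requests (useful work + overhead)
window_seconds = 10.0     # capacity of the window (one CPU-second per second)
requests_completed = 1400

utilization = busy_seconds / window_seconds        # fraction of capacity in use
throughput = requests_completed / window_seconds   # results per unit time
service_time = busy_seconds / requests_completed   # time per request in the serial case

print(f"utilization={utilization:.0%}, throughput={throughput:.0f} req/s, "
      f"service time={service_time * 1000:.1f} ms")
```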
Latency v Throughput¶
- latency: elapsed time for a single action
- throughput: rate of actions per time
- for a single serial stage, throughput = 1/latency (concurrency/pipelining can push throughput higher)
- multistage requests: handle requests in stages
- may increase latency (bad)
- pipelining: stages work concurrently on different requests
- may increase throughput (good)
- latency = time it takes a request to go through all stages
- throughput = limited by throughput of slowest stage
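A small worked example of the two formulas above, with assumed stage times for a three-stage request path:

```python
# Assumed per-stage times for a three-stage pipeline (parse, compute, respond).
stage_seconds = [0.002, 0.005, 0.001]

latency = sum(stage_seconds)           # each request passes through every stage
throughput = 1 / max(stage_seconds)    # limited by the slowest stage

print(f"latency={latency * 1000:.0f} ms, throughput={throughput:.0f} req/s")
```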
Designing¶
- first, actually make the system work
- then, iterate:
- measure the system to see if it needs to be better
- measure again to find the bottleneck
- predict the impact of the proposed improvement by assuming the improvement will remove the bottleneck
- implement and measure
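A minimal sketch of the "predict the impact" step, assuming the proposed improvement removes the current bottleneck entirely (stage names and times are hypothetical):

```python
# Hypothetical per-stage times for a request path.
stage_seconds = {"parse": 0.002, "disk read": 0.020, "respond": 0.001}

bottleneck = max(stage_seconds, key=stage_seconds.get)
throughput_now = 1 / stage_seconds[bottleneck]

# Optimistic prediction: the improvement removes the bottleneck stage's cost,
# so throughput becomes limited by the next-slowest stage.
remaining = {k: v for k, v in stage_seconds.items() if k != bottleneck}
throughput_predicted = 1 / max(remaining.values())

print(f"bottleneck: {bottleneck}; "
      f"predicted speedup ~{throughput_predicted / throughput_now:.1f}x")
```

Then implement the change and measure again; the real gain is usually smaller than this optimistic bound.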
Fast Path¶
When some resources are requested far more often than others, you can create a special fast path for those requests
- reduces latency for some requests
- common/easy requests
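A minimal sketch of the idea: common requests hit a cheap cached path, everything else falls back to the general-purpose slow path (`slow_lookup` is a hypothetical stand-in for the expensive work):

```python
import time

def slow_lookup(key):
    # Stand-in for the expensive general-purpose path (e.g. disk or network).
    time.sleep(0.01)
    return key.upper()

hot_cache = {}   # answers for the most frequently requested keys

def lookup(key):
    if key in hot_cache:          # fast path: common/easy requests skip the work
        return hot_cache[key]
    value = slow_lookup(key)      # slow path handles everything else
    hot_cache[key] = value        # future requests for this key take the fast path
    return value

print(lookup("foo"))  # slow the first time
print(lookup("foo"))  # fast path afterwards
```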
Concurrency¶
- concurrency can reduce latency
- run multiple stages in parallel
- reduces overall latency
- potential issues: Amdahl's law (diminishing returns), synchronization overhead (see the sketch after this list)
- can increase throughput
- each stage works on a different request
- pipelining
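A minimal sketch of the diminishing returns from Amdahl's law, assuming a fraction `p` of the work can run in parallel across `n` workers:

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the work is spread over n workers."""
    return 1 / ((1 - p) + p / n)

# With p = 0.9 (assumed), adding workers helps less and less:
for n in (2, 4, 8, 64):
    print(f"{n:3d} workers: {amdahl_speedup(0.9, n):.2f}x")
```

Even with 64 workers the speedup stays below 1 / (1 - p) = 10x, because the serial 10% dominates.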
Pipelining¶
- pipelines often don’t actually decrease latency
- pipelines increase throughput by overlapping requests across stages, which hides the per-request latency
- keep each stage busy
- only requires that each stage be handled by different resources
- each stage has its own interpreter and memory
- use a bounded buffer between stages to keep stages busy
- if different stages take different amounts of time to execute
- absorbs bursts
- if the incoming request rate > average throughput of the slowest stage, the system is overloaded
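A minimal sketch of a two-stage pipeline with a bounded buffer between the stages; the stage bodies and buffer size are assumptions for illustration:

```python
import queue
import threading

buf = queue.Queue(maxsize=8)      # bounded buffer: keeps stage 1 busy and absorbs bursts;
                                  # put() blocks when full, which signals overload upstream

def stage0_produce(items):
    for item in items:
        buf.put(item)             # hand the request to the next stage
    buf.put(None)                 # sentinel: no more requests

def stage1_consume():
    while True:
        item = buf.get()
        if item is None:
            break
        # ... stage 1 work on `item` would go here ...

t0 = threading.Thread(target=stage0_produce, args=(range(100),))
t1 = threading.Thread(target=stage1_consume)
t0.start(); t1.start()
t0.join(); t1.join()
```

Each stage runs on its own thread, so both stay busy on different requests at the same time.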
Bottlenecks¶
dealing with:
- batching
- handle requests as a group to amortize fixed overhead (see the sketch after this list)
- dallying
- delay a request - maybe it won’t be needed or will be batched
- speculation
- guess what the request is
- increases work, decreases latency
- the request might also be cheaper to perform at an earlier time
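A minimal sketch of how batching amortizes a fixed per-call overhead (the costs below are assumed, not measured):

```python
FIXED_OVERHEAD = 0.010   # e.g. one system call or network round trip
PER_ITEM = 0.001         # marginal cost of handling one request

def cost_unbatched(n):
    # Pay the fixed overhead on every request.
    return n * (FIXED_OVERHEAD + PER_ITEM)

def cost_batched(n, batch_size):
    # Pay the fixed overhead once per batch.
    batches = -(-n // batch_size)   # ceiling division
    return batches * FIXED_OVERHEAD + n * PER_ITEM

print(cost_unbatched(1000))      # 11.0 seconds
print(cost_batched(1000, 50))    # 1.2 seconds
```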