Performance¶
What is performance?
- complete tasks faster
- use less power/energy
- do more tasks
How is performance measured?
How can we design systems to improve performance?
Measuring Performance¶
- capacity: amount of a resource that’s available
- utilization: how much capacity is being used?
- overhead: resource “wasted”
- useful work: amount of resource spent on actual work
- time-based
- latency: time between input and result
- throughput: rate of results per unit time
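A minimal sketch of how these metrics relate, using made-up measurements of one server over a 10-second window (all numbers are assumptions for illustration):

```python
# Hypothetical measurements of one CPU over a 10-second window.
busy_seconds = 7.0        # time spent handling requests (useful work + overhead)
window_seconds = 10.0     # capacity of the window (one CPU-second per second)
requests_completed = 1400

utilization = busy_seconds / window_seconds        # fraction of capacity in use
throughput = requests_completed / window_seconds   # results per unit time
service_time = busy_seconds / requests_completed   # time per request in the serial case

print(f"utilization={utilization:.0%}, throughput={throughput:.0f} req/s, "
      f"service time={service_time * 1000:.1f} ms")
```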
Latency v Throughput¶
- latency: elapsed time for a single action
- throughput: rate of actions per time
- for a single serial stage, throughput = 1/latency (concurrency/pipelining can push throughput higher)
- multistage requests: handle requests in stages
- may increase latency (bad)
- pipelining: stages work concurrently on different requests
- may increase throughput (good)
- latency = time it takes a request to go through all stages
- throughput = limited by throughput of slowest stage
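A small worked example of the two formulas above, with assumed stage times for a three-stage request path:

```python
# Assumed per-stage times for a three-stage pipeline (parse, compute, respond).
stage_seconds = [0.002, 0.005, 0.001]

latency = sum(stage_seconds)           # each request passes through every stage
throughput = 1 / max(stage_seconds)    # limited by the slowest stage

print(f"latency={latency * 1000:.0f} ms, throughput={throughput:.0f} req/s")
```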
Designing¶
- first, actually make the system work
- then, iterate:
- measure the system to see if it needs to be better
- measure again to find the bottleneck
- predict the impact of the proposed improvement by assuming the improvement will remove the bottleneck
- implement and measure
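A minimal sketch of the "predict the impact" step, assuming the proposed improvement removes the current bottleneck entirely (stage names and times are hypothetical):

```python
# Hypothetical per-stage times for a request path.
stage_seconds = {"parse": 0.002, "disk read": 0.020, "respond": 0.001}

bottleneck = max(stage_seconds, key=stage_seconds.get)
throughput_now = 1 / stage_seconds[bottleneck]

# Optimistic prediction: the improvement removes the bottleneck stage's cost,
# so throughput becomes limited by the next-slowest stage.
remaining = {k: v for k, v in stage_seconds.items() if k != bottleneck}
throughput_predicted = 1 / max(remaining.values())

print(f"bottleneck: {bottleneck}; "
      f"predicted speedup ~{throughput_predicted / throughput_now:.1f}x")
```

Then implement the change and measure again; the real gain is usually smaller than this optimistic bound.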
Fast Path¶
When some resources are requested far more often than others, you can create a special fast path for those requests
- reduces latency for some requests
- common/easy requests
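A minimal sketch of the idea: common requests hit a cheap cached path, everything else falls back to the general-purpose slow path (`slow_lookup` is a hypothetical stand-in for the expensive work):

```python
import time

def slow_lookup(key):
    # Stand-in for the expensive general-purpose path (e.g. disk or network).
    time.sleep(0.01)
    return key.upper()

hot_cache = {}   # answers for the most frequently requested keys

def lookup(key):
    if key in hot_cache:          # fast path: common/easy requests skip the work
        return hot_cache[key]
    value = slow_lookup(key)      # slow path handles everything else
    hot_cache[key] = value        # future requests for this key take the fast path
    return value

print(lookup("foo"))  # slow the first time
print(lookup("foo"))  # fast path afterwards
```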
Concurrency¶
- concurrency can reduce latency
- run multiple stages in parallel
- reduces overall latency
- potential issues: Amdahl's law (diminishing returns), synchronization overhead (see the sketch after this list)
- can increase throughput
- each stage works on a different request
- pipelining
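A minimal sketch of the diminishing returns from Amdahl's law, assuming a fraction `p` of the work can run in parallel across `n` workers:

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the work is spread over n workers."""
    return 1 / ((1 - p) + p / n)

# With p = 0.9 (assumed), adding workers helps less and less:
for n in (2, 4, 8, 64):
    print(f"{n:3d} workers: {amdahl_speedup(0.9, n):.2f}x")
```

Even with 64 workers the speedup stays below 1 / (1 - p) = 10x, because the serial 10% dominates.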
Pipelining¶
- pipelines often don’t actually decrease latency
- pipelines increase throughput by overlapping requests across stages, which hides the per-request latency
- keep each stage busy
- only requires that each stage be handled by different resources
- each stage has its own interpreter and memory
- use a bounded buffer between stages to keep stages busy
- if different stages take different amounts of time to execute
- absorbs bursts
- if the incoming request rate > average throughput of the slowest stage, the system is overloaded
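A minimal sketch of a two-stage pipeline with a bounded buffer between the stages; the stage bodies and buffer size are assumptions for illustration:

```python
import queue
import threading

buf = queue.Queue(maxsize=8)      # bounded buffer: keeps stage 1 busy and absorbs bursts;
                                  # put() blocks when full, which signals overload upstream

def stage0_produce(items):
    for item in items:
        buf.put(item)             # hand the request to the next stage
    buf.put(None)                 # sentinel: no more requests

def stage1_consume():
    while True:
        item = buf.get()
        if item is None:
            break
        # ... stage 1 work on `item` would go here ...

t0 = threading.Thread(target=stage0_produce, args=(range(100),))
t1 = threading.Thread(target=stage1_consume)
t0.start(); t1.start()
t0.join(); t1.join()
```

Each stage runs on its own thread, so both stay busy on different requests at the same time.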
Bottlenecks¶
dealing with:
- batching
- handle requests as a group to amortize fixed overhead (see the sketch after this list)
- dallying
- delay a request - maybe it won’t be needed or will be batched
- speculation
- guess what the request is
- increases work, decreases latency
- the request might also be cheaper to perform at an earlier time
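A minimal sketch of how batching amortizes a fixed per-call overhead (the costs below are assumed, not measured):

```python
FIXED_OVERHEAD = 0.010   # e.g. one system call or network round trip
PER_ITEM = 0.001         # marginal cost of handling one request

def cost_unbatched(n):
    # Pay the fixed overhead on every request.
    return n * (FIXED_OVERHEAD + PER_ITEM)

def cost_batched(n, batch_size):
    # Pay the fixed overhead once per batch.
    batches = -(-n // batch_size)   # ceiling division
    return batches * FIXED_OVERHEAD + n * PER_ITEM

print(cost_unbatched(1000))      # 11.0 seconds
print(cost_batched(1000, 50))    # 1.2 seconds
```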