Analyzing the Impact of CPU Pinning and Partial CPU Loads on Performance and Energy Efficiency

This site lists links to supplementary plots for our CCGrid 2015 submission:

Supplementary plots

supplementary.pdf

We provide a file of supplementary figures that extend and complement the figures and tables available in our paper.

Besides variants of the plots taken with a different Showstopper configuration and with TurboBoost and frequency scaling switched off, the supplementary figures also include tables directly comparing application throughput under KVM and LXC, which could not be included in the paper due to space constraints.

The supplementary plots and tables are organized as follows:

  • CPU pinning overview (page 1)
  • example experiment time line for KVM and LXC (page 2)
  • 5×5 foreground and background throughput plots for the 107ms dithering quantum (page 3)
  • 10×10 foreground and background throughput plots for the 107ms dithering quantum (pages 4–9)
  • 5×5 foreground and background throughput plots for the 53ms dithering quantum (page 10)
  • 5×5 foreground and background throughput plots for the 107ms dithering quantum, disabled TurboBoost and disabled frequency scaling (page 11)
  • performance interference metric tables for the 5×5 experiments with the 107ms dithering quantum (page 12)
  • performance interference metric tables for the 10×10 experiments with the 107ms dithering quantum (pages 13–14)
  • performance interference metric tables for the 5×5 experiments with the 53ms dithering quantum (page 15)
  • performance interference metric tables for the 5×5 experiments with the 107ms dithering quantum, disabled TurboBoost and disabled frequency scaling (page 15)
  • absolute system throughput time lines (page 16)
    • 107ms dithering quantum
    • 53ms dithering quantum
    • 107ms dithering quantum, disabled TurboBoost, disabled frequency scaling
  • relative system throughput time lines (relative to the "per-chip" pinning) (page 17)
    • 107ms dithering quantum
    • 53ms dithering quantum
    • 107ms dithering quantum, disabled TurboBoost, disabled frequency scaling
  • system power consumption time lines for symmetric colocations of 5 chosen benchmarks (page 18)
    • 107ms dithering quantum
    • 53ms dithering quantum
  • absolute system power efficiency time lines (page 19)
    • 107ms dithering quantum
    • 53ms dithering quantum
  • relative system power efficiency time lines (relative to the "per-chip" pinning) (page 19)
    • 107ms dithering quantum
    • 53ms dithering quantum
  • 5×5 KVM and LXC throughput comparison tables for the 107ms dithering quantum (page 20)
  • 10×10 KVM and LXC throughput comparison tables for the 107ms dithering quantum (pages 21–22)
  • 5×5 KVM and LXC throughput comparison tables for the 53ms dithering quantum (page 23)
  • 5×5 KVM and LXC throughput comparison tables for the 107ms dithering quantum, disabled TurboBoost and disabled frequency scaling (page 24)
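
The relative time lines listed above express each pinning configuration with respect to the "per-chip" pinning. One plausible reading is a sample-by-sample division by the "per-chip" time line. The sketch below illustrates that reading only; the function name, the dictionary layout and the sample values are hypothetical and not taken from the paper.

```python
def relative_to_baseline(series, baseline="per-chip"):
    """Divide every time line by the baseline time line, sample by sample.

    `series` maps a pinning configuration name to a list of samples.
    All time lines are assumed to be sampled at the same instants.
    """
    base = series[baseline]
    relative = {}
    for name, values in series.items():
        if len(values) != len(base):
            raise ValueError(f"{name!r} and {baseline!r} differ in length")
        relative[name] = [v / b for v, b in zip(values, base)]
    return relative


# Made-up throughput samples, for illustration only.
throughput = {
    "per-chip": [10.0, 20.0, 40.0],
    "per-core": [5.0, 20.0, 50.0],
}
relative = relative_to_baseline(throughput)
```

With this normalization the baseline becomes a constant 1.0, so values above 1.0 mean the configuration outperformed "per-chip" pinning at that point in time.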

Time line plots — performance counters, throughput, power consumption

Cycles per instruction and power consumption

        107ms quantum    53ms quantum
KVM     cpi-power.kvm    cpi-power.kvm
LXC     cpi-power.lxc    cpi-power.lxc
both    cpi-power.all    cpi-power.all
Pages
Pages correspond to workload colocations of avrora, h2, luindex, scalac and specs. There are 25 pages corresponding to the 5×5 colocations.
Rows
There are 2 rows of plots per page. Rows represent observed quantities:
  1. system-wide average number of cycles per instruction
  2. system power consumption
Columns
There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.
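
The 25 pages follow from pairing each of the five benchmarks with each other benchmark and with itself. A minimal sketch of that enumeration, assuming the pairs are ordered (foreground, background); the order in which the pages actually appear in the files is not stated here, so the iteration order below is an assumption.

```python
from itertools import product

BENCHMARKS = ["avrora", "h2", "luindex", "scalac", "specs"]

# Every ordered (foreground, background) pair, including symmetric
# colocations where a benchmark is paired with itself.
colocations = list(product(BENCHMARKS, repeat=2))

# The symmetric colocations are the pairs on the diagonal of the 5x5 grid.
symmetric = [(fg, bg) for fg, bg in colocations if fg == bg]
```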

Cycles per instruction and throughput

        107ms quantum                                         53ms quantum
        blue and green:            blue and green:            blue and green:            blue and green:
        FG and BG VEs              NUMA nodes                 FG and BG VEs              NUMA nodes
KVM     cpi-throughput.kvm.wl      cpi-throughput.kvm.numa    cpi-throughput.kvm.wl      cpi-throughput.kvm.numa
LXC     cpi-throughput.lxc.wl      cpi-throughput.lxc.numa    cpi-throughput.lxc.wl      cpi-throughput.lxc.numa
both    cpi-throughput.all.wl      cpi-throughput.all.numa    cpi-throughput.all.wl      cpi-throughput.all.numa
Pages
Pages correspond to workload colocations of avrora, h2, luindex, scalac and specs. There are 25 pages corresponding to the 5×5 colocations.
Rows
There are 3 rows of plots per page. Rows represent observed quantities:
  1. number of cycles per instruction, two averages per NUMA node or per virtualized environment (VE)
  2. throughput in the foreground VE with an uncontrolled workload
  3. throughput in the background VE with a Showstopper-controlled workload
Columns
There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.

Cycles per instruction, power consumption and throughput (two sets of plots above combined)

        107ms quantum                                     53ms quantum
        blue and green:          blue and green:          blue and green:          blue and green:
        FG and BG VEs            NUMA nodes               FG and BG VEs            NUMA nodes
KVM     cpi-combined.kvm.wl      cpi-combined.kvm.numa    cpi-combined.kvm.wl      cpi-combined.kvm.numa
LXC     cpi-combined.lxc.wl      cpi-combined.lxc.numa    cpi-combined.lxc.wl      cpi-combined.lxc.numa
both    cpi-combined.all.wl      cpi-combined.all.numa    cpi-combined.all.wl      cpi-combined.all.numa
Pages
Pages correspond to workload colocations of avrora, h2, luindex, scalac and specs. There are 25 pages corresponding to the 5×5 colocations.
Rows
There are 5 rows of plots per page. Rows represent observed quantities:
  1. number of cycles per instruction, two averages per NUMA node or per virtualized environment (VE)
  2. throughput in the foreground VE with an uncontrolled workload
  3. throughput in the background VE with a Showstopper-controlled workload
  4. system-wide average number of cycles per instruction
  5. system power consumption
Columns
There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.

All low-level performance counters (cycles per instruction, cache misses, pipeline stalls)

                    107ms quantum                             53ms quantum
                    blue and green:      blue and green:      blue and green:      blue and green:
                    FG and BG VEs        NUMA nodes           FG and BG VEs        NUMA nodes
KVM     vertical    vmetrics.kvm.wl      vmetrics.kvm.numa    vmetrics.kvm.wl      vmetrics.kvm.numa
        horizontal  hmetrics.kvm.wl      hmetrics.kvm.numa    hmetrics.kvm.wl      hmetrics.kvm.numa
LXC     vertical    vmetrics.lxc.wl      vmetrics.lxc.numa    vmetrics.lxc.wl      vmetrics.lxc.numa
        horizontal  hmetrics.lxc.wl      hmetrics.lxc.numa    hmetrics.lxc.wl      hmetrics.lxc.numa
both    vertical    vmetrics.all.wl      vmetrics.all.numa    vmetrics.all.wl      vmetrics.all.numa
        horizontal  hmetrics.all.wl      hmetrics.all.numa    hmetrics.all.wl      hmetrics.all.numa
Pages
Pages correspond to workload colocations of avrora, h2, luindex, scalac and specs. There are 25 pages corresponding to the 5×5 colocations. Horizontal and vertical layouts are available. The list of rows and columns below corresponds to the vertical layout.
Rows
There are 13 rows of plots per page. Rows represent observed quantities:
  1. number of cycles per instruction, two averages per NUMA node or per VE
  2. L1 data cache miss rate, two averages per NUMA node or per VE
  3. L2 cache miss rate, two averages per NUMA node or per VE
  4. L3 cache miss rate, two averages per NUMA node or per VE
  5. NUMA load/store/prefetch miss rate, two averages per NUMA node or per VE
  6. number of L1 data cache misses, two averages per NUMA node or per VE
  7. number of L2 cache misses, two averages per NUMA node or per VE
  8. number of L3 cache misses, two averages per NUMA node or per VE
  9. number of NUMA load/store/prefetch operations, two averages per NUMA node or per VE
  10. number of NUMA load/store/prefetch misses, two averages per NUMA node or per VE
  11. pipeline frontend stalls, two averages per NUMA node or per VE
  12. pipeline backend stalls, two averages per NUMA node or per VE
  13. system power consumption
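
The list above mixes raw counts (rows 6 to 10) with derived ratios (rows 1 to 5). The sketch below shows how such ratios are commonly computed from raw counters. It is an illustration only: the counter names and values are hypothetical, and defining the NUMA miss rate as misses divided by operations is an assumption, since this page does not give the exact formulas used for the plots.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


# Hypothetical counter readings for one sampling interval.
sample = {
    "cycles": 3000,
    "instructions": 2000,
    "numa_operations": 400,
    "numa_misses": 100,
}

# Cycles per instruction: lower values mean the pipeline retires work faster.
cpi = ratio(sample["cycles"], sample["instructions"])

# Assumed definition: fraction of NUMA load/store/prefetch operations that missed.
numa_miss_rate = ratio(sample["numa_misses"], sample["numa_operations"])
```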
Columns
There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.

Comparing KVM and LXC in terms of low-level performance counters and confidence intervals

blue and green denote KVM and LXC

            107ms quantum                                   53ms quantum
            t-test                  order statistics        t-test                  order statistics
            (assumes normality)     (assumes uniformity)    (assumes normality)     (assumes uniformity)
vertical    vstatmetrics.ttest      vstatmetrics.order      vstatmetrics.ttest      vstatmetrics.order
horizontal  hstatmetrics.ttest      hstatmetrics.order      hstatmetrics.ttest      hstatmetrics.order
Pages
Pages correspond to workload colocations of avrora, h2, luindex, scalac and specs. There are 25 pages corresponding to the 5×5 colocations. Horizontal and vertical layouts are available. The list of rows and columns below corresponds to the vertical layout.
Rows
There are 15 rows of plots per page. Rows represent observed quantities:
  1. number of cycles per instruction, means (t-test) or medians (order) and CIs for KVM and LXC
  2. L1 data cache miss rate, means (t-test) or medians (order) and CIs for KVM and LXC
  3. L2 cache miss rate, means (t-test) or medians (order) and CIs for KVM and LXC
  4. L3 cache miss rate, means (t-test) or medians (order) and CIs for KVM and LXC
  5. NUMA load/store/prefetch miss rate, means (t-test) or medians (order) and CIs for KVM and LXC
  6. number of L1 data cache misses, means (t-test) or medians (order) and CIs for KVM and LXC
  7. number of L2 cache misses, means (t-test) or medians (order) and CIs for KVM and LXC
  8. number of L3 cache misses, means (t-test) or medians (order) and CIs for KVM and LXC
  9. number of NUMA load/store/prefetch operations, means (t-test) or medians (order) and CIs for KVM and LXC
  10. number of NUMA load/store/prefetch misses, means (t-test) or medians (order) and CIs for KVM and LXC
  11. pipeline frontend stalls, means (t-test) or medians (order) and CIs for KVM and LXC
  12. pipeline backend stalls, means (t-test) or medians (order) and CIs for KVM and LXC
  13. system power consumption, means (t-test) or medians (order) and CIs for KVM and LXC
  14. foreground throughput (uncontrolled workload), means (t-test) or medians (order) and CIs for KVM and LXC
  15. background throughput (controlled workload), means (t-test) or medians (order) and CIs for KVM and LXC
Columns
There are 3 columns of plots corresponding to the 3 CPU pinning configurations common to KVM and LXC ("per-chip", "per-core" and "per-thread").
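
For readers unfamiliar with order-statistics intervals, the sketch below computes a textbook confidence interval for the median using only the ranks of the sorted samples and the Binomial(n, 1/2) distribution. It is a generic construction shown for orientation; whether the "order" plots use exactly this construction is an assumption, and the function name is hypothetical.

```python
from math import comb


def median_ci(samples, confidence=0.95):
    """Confidence interval for the median based on order statistics.

    Returns (lower, upper), the k-th smallest and k-th largest samples,
    where k is the largest rank whose two-sided coverage is still at
    least `confidence`.
    """
    xs = sorted(samples)
    n = len(xs)
    alpha = 1.0 - confidence
    k = 0      # rank (1-based) of the lower bound; 0 means "none found"
    cdf = 0.0  # running P(B <= j) for B ~ Binomial(n, 1/2)
    for j in range(n // 2):
        cdf += comb(n, j) / 2**n
        if cdf > alpha / 2:
            break
        k = j + 1
    if k == 0:
        raise ValueError("too few samples for the requested confidence")
    return xs[k - 1], xs[n - k]


# Ten made-up measurements: the 95% interval spans the 2nd to the 9th value.
low, high = median_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because the interval is built from ranks, its width depends on the sample count: with fewer than six samples no 95% interval exists and the function raises an error.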

Compressed archives with all time line plots

107ms quantum    53ms quantum
plots.tar.xz     plots.tar.xz
plots.tar.bz2    plots.tar.bz2

All plots of low-level metrics listed in the previous section can be downloaded in one big compressed archive.
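
Once downloaded, either archive can be inspected without unpacking it. A minimal sketch using Python's standard tarfile module; the function name is hypothetical, and the local path in the usage comment assumes the archive was saved under its original name.

```python
import tarfile


def list_archive(archive_path, suffix=""):
    """Return the sorted names of regular files in a tar archive.

    Only names ending with `suffix` are returned; the empty default
    matches every file.
    """
    # Mode "r:*" lets tarfile detect xz or bzip2 compression by itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(suffix)
        )


# Usage, assuming plots.tar.xz is in the current directory:
#     for name in list_archive("plots.tar.xz"):
#         print(name)
```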

Raw listing of all available files

107ms quantum    53ms quantum
file list        file list

For those already familiar with the file naming convention, we provide quick access to a raw listing of all plot files linked above. For convenience, we also provide two plain-text files with a summary of average numeric values, easy to search and sort: one for the 107ms dithering quantum and one for the 53ms dithering quantum. They compare the cycles-per-instruction statistics for NUMA nodes and workloads, the total amount of work (number of iterations) accomplished in the foreground and background virtualized environments, and the amount of energy consumed in an experiment.
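
The summary files report energy per experiment, which relates to the power time lines by integration over time. The sketch below shows that relationship with the trapezoidal rule, and derives a work-per-energy figure from it. The function names, units and sample values are hypothetical, and treating efficiency as iterations per joule is an assumption rather than the paper's stated definition.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (watts) over time (seconds), trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples differ in length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def iterations_per_joule(iterations, energy_j):
    """Assumed efficiency measure: completed work divided by consumed energy."""
    return iterations / energy_j


# Made-up samples: 100 W for the first second, then a ramp to 200 W.
energy = energy_joules([0.0, 1.0, 2.0], [100.0, 100.0, 200.0])
efficiency = iterations_per_joule(500, energy)
```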

Modified on 2016-03-02