Dynimizer + MySQL Cross-Microarchitecture Analysis

In this blog post I will discuss how the Dynimizer cross-microarchitecture performance results were obtained, followed by an analysis of those results.

You can download the scripts to generate similar graphs for your own system from here. Note that this repository also contains the results generated for all the tests performed. A trace of each command the script executed was generated by using #!/bin/bash -x, and saved in the output.log files. A full run across all 5 benchmarks takes approximately 2 hours on the systems I used.

Prerequisites  

In order to use these scripts, you must first install Dynimizer:

wget https://dynimize.com/install -O install
wget https://dynimize.com/install.sha256 -O install.sha256
if sha256sum -c install.sha256 | grep OK; then sudo bash ./install -default; fi

 

You'll also need to install gnuplot for the graphs to be generated:

Ubuntu/Debian:

$ sudo apt-get install gnuplot

 

RHEL/CentOS/Fedora:

$ sudo yum install gnuplot

 

The CPU hardware events are collected using the Linux perf tool, which should also be installed if you want to generate the CPU events graphs:

Debian/Ubuntu:

$ sudo apt-get install linux-tools-common linux-tools-generic

 

RHEL/CentOS/Fedora:

$ sudo yum install perf

 

After this you should try calling the perf command to make sure it was actually installed.
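A quick way to do that check is sketched below. The helper name check_tool is my own for illustration and is not part of the downloadable scripts; it only tests whether a command can be found on your PATH.

```shell
#!/bin/bash
# Report whether a command is on PATH; returns 0 if found, 1 otherwise.
# check_tool is a hypothetical helper, not part of plotSysbench.sh.
check_tool() {
    local tool="$1"
    if command -v "$tool" > /dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: NOT found" >&2
        return 1
    fi
}

# Check the tools installed so far. `|| true` keeps the loop going so that
# every missing tool is reported, not just the first one.
for tool in perf gnuplot; do
    check_tool "$tool" || true
done
```

If perf is reported as found, running perf --version is a further sanity check that the binary matches your running kernel.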

CPU frequency scaling can produce sporadic results when running so many tests back to back, as this script does, so disabling it is highly advisable. On my systems, I did that by editing /etc/default/grub (first make a backup copy of that file) and modifying the line with GRUB_CMDLINE_LINUX_DEFAULT= to include intel_pstate=disable. For example:

GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable noquiet nosplash net.ifnames=0 biosdevname=0"

 

Now update grub and reboot:

$ sudo update-grub
$ sudo reboot

 

After each reboot, issue the following command to fully disable frequency scaling by setting the scaling governor to "performance":

$ for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f "$CPUFREQ" ] || continue; echo performance | sudo tee "$CPUFREQ" > /dev/null; done
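To confirm the governor change took effect on every core, something like the following can help. It is a minimal sketch: governor_summary is a name I made up, and the optional directory argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/bash
# Print each distinct scaling governor with the number of CPUs using it.
# The optional argument is the sysfs CPU directory (defaults to the real one).
# governor_summary is a hypothetical helper, not part of plotSysbench.sh.
governor_summary() {
    local base="${1:-/sys/devices/system/cpu}"
    local f
    for f in "$base"/cpu*/cpufreq/scaling_governor; do
        [ -f "$f" ] || continue   # skip if the glob matched nothing
        cat "$f"
    done | sort | uniq -c
}

governor_summary
```

On an 8-thread machine with scaling disabled you would expect a single line of output counting 8 CPUs under "performance". Any other governor name in the output means some cores were missed.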

 

Now check each core's frequency and verify that it is set to the maximum:

$ cat /proc/cpuinfo | grep MHz

 

cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000
cpu MHz : 3701.000

 

You'll also need to install Sysbench 1.0:

Debian/Ubuntu:

$ curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.deb.sh | sudo bash
$ sudo apt -y install sysbench

 

RHEL/CentOS:

$ curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
$ sudo yum -y install sysbench

 

Fedora:

$ curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
$ sudo dnf -y install sysbench

 

Make sure that you have at least 4 GB total of free Swap + Mem for Dynimizer:

$ free

 

        total    used    free    shared    buff/cache    available
Mem: 16397624 1360380 11741576    13868       3295668     14674468
Swap: 1046520       0  1046520

 

If not, increase your available swap space.
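The arithmetic can be scripted. The sketch below adds the "free" column of the Mem: and Swap: rows and compares the sum against 4 GB. The function name is hypothetical, and it assumes the English column layout shown above, where "free" is the fourth field.

```shell
#!/bin/bash
# Sum the "free" column (4th field) of the Mem: and Swap: rows of `free`
# output read from stdin. Values are in KiB when `free -k` is used.
# free_mem_plus_swap_kib is a hypothetical helper name.
free_mem_plus_swap_kib() {
    awk '/^(Mem|Swap):/ { sum += $4 } END { printf "%.0f\n", sum }'
}

required_kib=$((4 * 1024 * 1024))   # 4 GiB expressed in KiB
total_kib=$(free -k | free_mem_plus_swap_kib)

if [ "$total_kib" -ge "$required_kib" ]; then
    echo "OK: ${total_kib} KiB free (Mem + Swap)"
else
    echo "Too low: ${total_kib} KiB free, need at least ${required_kib} KiB"
fi
```

For the sample output above, the sum is 11741576 + 1046520 = 12788096 KiB, which is comfortably over the 4194304 KiB threshold.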

MySQL must of course be installed. Make sure innodb_buffer_pool_size is large enough to hold all the tables in memory. For our runs of 1M rows x 10 tables, innodb_buffer_pool_size=4G was more than sufficient.

 

Running the tests

The tests are run by executing plotSysbench.sh. Before running the script, you may need to edit some or all of these variables at the start of the file. Below were the settings used for the runs:

tableSize=1000000
numTables=10
pswd='password'
dbName='test'
user="mysqlUser"
readOnly="on"
measureTime=60
warmupIncrementTime=30
events="false"
cpuUsage="false"
useTaskSet="true"

 

Setting events="true" will result in the perf command running during each Sysbench run. It will interfere with the throughput results slightly, so for these perf results we decided to perform two runs of plotSysbench.sh for each MySQL/Server combo: one run with events="false" to generate the throughput graphs, and another run with events="true" to collect perf events separately. Note that the transactions/cycle graphs do use the throughput numbers generated with events="true".

Setting cpuUsage="true" generates MySQL CPU usage graphs through the top command, and was set to "false" for all of these runs. Setting power="true" measures power consumption on a laptop with a discharging battery. We'll save those experiments for another post.

useTaskSet was set to "false". Setting it to "true" will allow mysqld to run with less interference from Sysbench and may provide you with slightly greater speedups when using Dynimizer vs not using Dynimizer, at the expense of making these runs less realistic. Ideally we would have run these tests with Sysbench and mysqld isolated on separate servers with a direct connection, such that Sysbench can still saturate the CPUs on the MySQL server. Running mysqld in isolation this way would provide the greatest speedup when using Dynimizer vs without Dynimizer. When both processes share a server, speeding up mysqld makes Sysbench work harder to keep up, so Sysbench consumes more CPU cycles, stealing them away from mysqld and dampening the speedup. This was the case with these runs; however, we decided to accept the reduced speedup because it's much easier for the average user to validate these tests by running them on their own systems without requiring a special hardware setup.

For each benchmark, plotSysbench.sh will first perform a warmup run at 32 client connection threads until the mysqld process is dynimized, then collect results for 1 to 128 threads without restarting mysqld. For each benchmark, it will record the number of warmup runs required to dynimize mysqld, each run executing for a duration set by the warmupIncrementTime variable. It will then repeat the same number of warmups before taking measurements for the runs without Dynimizer.
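The measurement phase can be pictured roughly as the loop below. This is a simplified sketch, not the code in plotSysbench.sh. The doubling schedule of thread counts and the choice of oltp_read_only are assumptions, the function names are hypothetical, and the credentials and sizes are the settings listed earlier.

```shell
#!/bin/bash
# Emit thread counts from 1 up to a maximum, doubling each step.
# The doubling schedule is an assumption; the post only says "1 to 128".
thread_counts() {
    local max="${1:-128}" n=1 out=""
    while [ "$n" -le "$max" ]; do
        out="${out:+$out }$n"
        n=$((n * 2))
    done
    echo "$out"
}

# Hypothetical sweep: one Sysbench run per thread count against an already
# running (and already warmed up) mysqld, without restarting it in between.
run_sweep() {
    local threads
    for threads in $(thread_counts 128); do
        sysbench oltp_read_only \
            --mysql-user=mysqlUser --mysql-password=password --mysql-db=test \
            --tables=10 --table-size=1000000 \
            --threads="$threads" --time=60 run
    done
}
```

Calling run_sweep requires the test tables to have been created beforehand with the matching sysbench prepare step. Nothing runs until you call it.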

 

CPU Hardware Events

The Linux perf tool was used here to count CPU events of the mysqld process during the Sysbench runs, which helps us better understand how Dynimizer is speeding up the mysqld process. All events are totals across all CPU cores on a single system. See Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3 for more details.

The following seven events were measured: cycles, instructions retired, instruction cache misses, instruction fetches, instruction TLB misses, conditional branch misses retired, conditional branches retired.
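For readers who want to try something similar by hand, here is a minimal sketch. It uses perf's generic event aliases, which are an assumption on my part and may not map to exactly the same hardware counters that the scripts collect. The function names are hypothetical.

```shell
#!/bin/bash
# Count a few events on the running mysqld process for a fixed duration.
# Event names are perf's generic aliases (an assumption, not necessarily
# the events used by plotSysbench.sh). Defined here but not called.
count_mysqld_events() {
    local seconds="${1:-60}"
    perf stat -e cycles,instructions,branches,branch-misses \
        -p "$(pidof mysqld)" -- sleep "$seconds"
}

# Divide one count by another and print the result with 6 decimals.
# Prints "nan" and returns non-zero when the denominator is 0.
ratio() {
    awk -v num="$1" -v den="$2" \
        'BEGIN { if (den == 0) { print "nan"; exit 1 } printf "%.6f\n", num / den }'
}

# Made-up counts, purely to illustrate the derived metrics.
cycles=200000000000
instructions=150000000000
transactions=1200000

echo "IPC:                      $(ratio "$instructions" "$cycles")"
echo "instructions/transaction: $(ratio "$instructions" "$transactions")"
echo "transactions/cycle:       $(ratio "$transactions" "$cycles")"
```

The same ratio helper works for the per-transaction miss rates: divide the instruction cache, ITLB, or branch miss count by the number of transactions Sysbench reports for the same run.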

 

Analysis

From these throughput and CPU event results, we can see that the overall trend across each benchmark is that Dynimizer significantly reduces instruction cache misses, ITLB misses, and branch mispredictions, thereby increasing IPC (instructions per cycle) and improving performance. Generally speaking, the greater the number of instruction cache misses, instruction TLB misses and branch mispredictions per transaction, the greater the speedup Dynimizer provides these benchmarks. I believe the improvements in transactions per cycle are greater than the total speedup measured here primarily because the Sysbench process in this setup is run on the same server as the mysqld process, stealing cycles from mysqld as mysqld speeds up. It is also because Dynimizer slightly reduces instructions per transaction as well, thereby increasing efficiency beyond IPC alone.

Note that we chose to only run read-only Sysbench MySQL benchmarks, since these would be easy for anyone to make CPU bound and illustrate actual increases in performance. When not CPU bound, Dynimizer will still increase IPC but may result in the same throughput numbers with reduced CPU utilisation and power consumption. We'll explore Dynimizer with the benchmarks that include writes in a future post.

The systems tested here represent the past 3 major microarchitecture redesigns of Intel CPUs, all of which show benefits from Dynimizer. I wasn't able to rent a dedicated Xeon server with older processors from my service provider for these tests; however, I've tested Dynimizer with MySQL + Sysbench on my own older Nehalem and Core microarchitecture laptops, the latter microarchitecture being introduced all the way back in 2006, and both show similar speedups to what was observed with these newer machines. The point here is to appreciate that Dynimizer speedups are applicable across a broad range of processor generations and MySQL benchmarks.

David Yeager is the founder of Dynimize Inc., and built the initial release of Dynimizer from the ground up. His area of expertise is in just-in-time compilation and computer architectures. Previously, David was a member of IBM's JIT compiler development team. David has a Master of Engineering degree in Electrical and Computer Engineering from the University of British Columbia.

COPYRIGHT © DYNIMIZE INC.