Dynimizer
Overview

Dynimizer, the industry's first CPU performance virtualization software, lets users improve the CPU-side performance and power consumption of select workloads and so extract greater value from their current compute infrastructure. Dynimizer can be deployed with almost no installation effort and without restarting target applications, resulting in instant ROI.









Key Benefits

• Improved CPU performance
IPC (instructions per cycle) improvements of up to 50%, resulting in greater performance for some CPU intensive workloads. Instruction cache misses, instruction TLB misses, and branch prediction misses may be reduced to provide the effect of having greater CPU resources than actually exist.


• Reduced CPU energy consumption
By increasing IPC, processors require fewer cycles to perform the same amount of work, thereby reducing energy consumption and improving performance/watt. Applications can now spend more time idling and less time burning CPU cycles, saving energy and freeing up CPU resources for other work.

• Fully autonomous
Dynimizer runs as a standalone background process and automatically detects and optimizes CPU intensive applications.

• Highly configurable
Dynimizer can be run autonomously or invoked directly on specific program instances or processes. Controls can be put in place to limit the applications it targets.

• Zero downtime, instant ROI
Dynimizer can be applied to off-the-shelf applications without having to make any changes to them or their host systems. Running applications do not need to be restarted, and their source code is not required. Applications can be optimized in 60 seconds.
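
The IPC and energy claims above can be related with a little arithmetic. The following is a simplified sketch, assuming the optimized program retires the same number of instructions at the same clock frequency. The symbols N (instructions retired) and C (cycles consumed) are introduced here for illustration, and the factor 1.5 is the "up to 50%" upper bound, not a measured result.

```latex
C = \frac{N}{\mathrm{IPC}}, \qquad
C' = \frac{N}{1.5\,\mathrm{IPC}} = \frac{2}{3}\,C
```

Under these assumptions, a 50% IPC improvement removes about one third of the cycles needed for the same work, which is where the energy savings and freed CPU capacity would come from.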





What is CPU Performance Virtualization?

CPU performance virtualization (CPV) is the use of software to emulate a higher performing CPU microarchitecture than currently exists on a system. This is accomplished by transparently optimizing the in-memory machine code of programs as they run, without the user having to modify the program in any way. The end goal is to mimic the user experience of having better performing CPUs when running the applications being targeted for optimization.

As an optimization agent that is potentially invisible to the end user, CPV acts as an extension of the hardware platform in software. Modern CPUs perform optimizations to machine code instruction sequences on the fly in hardware, while CPU performance virtualization acts as an extension to this, performing more advanced optimizations to the instruction sequences using more complex software algorithms. The end result is the effect of having increased CPU performance and/or reduced power consumption for select software workloads.

Acting as a CPV agent, Dynimizer currently improves the performance of workloads that spend a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions, where all gains are found in user mode execution. High performance Online Transaction Processing (OLTP) workloads typically exhibit these characteristics, where Dynimizer has been shown to improve effective CPU efficiency by up to 50% for MySQL. This software CPU performance virtualization produces the effect of having more efficient, resource rich CPUs than are actually present.



How does Dynimizer work?

• A JIT Compiler
Dynimizer is a Just-In-Time (JIT) compiler that profiles programs, and then uses that profiling information to rewrite those programs' in-memory machine code for improved performance.

• Exploiting Run-Time Information
Because it optimizes an application's machine code at run-time, Dynimizer has far more information about how the program is being used and its run-time environment than the original compiler that produced its executable code and shared libraries. This allows it to generate higher quality machine code for each specific run.

• Machine Code Specialization
This machine code specialization is done each time a program is run, and can optionally be done repeatedly throughout the lifetime of a program if its workload drastically changes from the last time it was optimized by Dynimizer.

• Lightweight Daemon Process
Dynimizer runs as a single lightweight background process in user mode, and optimizes the in-memory instructions of other processes running on the same host OS, using the standard ptrace Linux system call to make changes to the processes being optimized.

• Optimizes In-Memory Instructions
Dynimizer does not modify in any way an optimized program's on-disk files, such as data, configuration, executable or shared library files.

• A New Frontier For JIT Compilers
Just-In-Time (JIT) compilers have been used for over 15 years in production environments to enhance the performance of managed runtimes such as the Java Virtual Machine and .NET. Those JIT compilers use as input a virtual machine code format (such as Java bytecode), and perform profile directed optimizations when transforming that into real machine code. Dynimizer does this while using real machine code as input instead.



Dynimizer lifecycle

The following steps outline the various stages Dynimizer goes through when optimizing programs.

  1. Dynimizer is started as a user mode daemon process.
       $ sudo dyni -start
    Dynimizer started

  2. Dynimizer monitors system processes and identifies any CPU intensive program that is on its list of allowed optimization targets. This phase consumes hardly any system resources.
       $ sudo dyni -status
    Dynimizer is running

  3. If Dynimizer identifies such a program, it then begins profiling it in detail, collecting statistics about how much time is being spent in each part of the program, along with other execution characteristics.
       $ sudo dyni -status
    Dynimizer is running
    mysqld, pid: 20722, profiling

  4. Dynimizer optimizes the running program, and consumes additional system resources while doing so. This typically takes anywhere from 30 to 300 seconds.
       $ sudo dyni -status
    Dynimizer is running
    mysqld, pid: 20722, dynimizing

  5. Dynimizer has finished optimizing the program and has released any resources consumed by Dynimizer during stage 4.
       $ sudo dyni -status
    Dynimizer is running
    mysqld, pid: 20722, dynimized

  6. If Dynimizer identifies a previously optimized program whose workload has drastically changed and is still CPU intensive, it returns to stage 4; otherwise it returns to stage 2.


  7. Dynimizer can do this to several processes in parallel, where each process being optimized incurs additional overhead by Dynimizer during stage 4.
       $ sudo dyni -status
    Dynimizer is running
    mysqld, pid: 20722, dynimizing
    sysbench, pid: 20770, dynimizing
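
The status lines shown in the steps above are regular enough to script against. The following is a minimal sketch, assuming the output format stays exactly as shown (`name, pid: N, state`). The function names `dyni_state_of` and `wait_until_dynimized` and the 10 second poll interval are hypothetical and not part of Dynimizer.

```shell
# Print the state (profiling, dynimizing, dynimized) reported for pid $1.
# Reads `dyni -status` text on stdin, e.g. "    mysqld, pid: 20722, dynimizing".
dyni_state_of() {
    awk -F', ' -v pid="$1" '$2 == ("pid: " pid) {
        sub(/[ \t]+$/, "", $3)   # drop trailing whitespace, if any
        print $3
    }'
}

# Hypothetical helper: block until pid $1 is reported as "dynimized".
# The poll interval is an arbitrary choice for illustration.
wait_until_dynimized() {
    while [ "$(sudo dyni -status | dyni_state_of "$1")" != "dynimized" ]; do
        sleep 10
    done
}
```

For example, `wait_until_dynimized 20722` would return once the mysqld process from the steps above reaches stage 5.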





Product Scope

Note that the following scope applies to the current release of Dynimizer 1.0 Beta and will expand in future product releases.

To obtain benefit from the current version of Dynimizer 1.0 Beta, all of the following workload conditions must be met:

• Linux
The workload runs on Linux, with a kernel version 2.6.32 or later.

• A small number of CPU intensive processes
On a given OS host where the workload is running, the workload must consist of one or a few CPU intensive processes. Optimizing a large number of processes at once is not recommended.

• Long running programs
The processes being optimized have long lifetimes, and their workloads are long running in order to amortize the warmup time associated with optimization.

• x86-64
Optimized processes must be 64-bit, derived from x86-64 executables and shared libraries, which must comply with the x86-64 ABI and ELF-64 formats. Most statically (ahead-of-time) compiled applications on Linux meet this requirement.

• Dynamically Linked
Target processes must be dynamically linked to their shared libraries. Statically linked processes are not yet supported. Most Linux programs are dynamically linked.

• No self-modifying code
The target application must not be running its own Just-In-Time compiler such as those found in Java virtual machines. This therefore excludes Java applications.

• Front-end CPU stalls
The workload wastes a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions.

• User mode execution
Much of that wasted time is spent in user mode execution (as opposed to kernel mode), as Dynimizer only optimizes user mode machine code.
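
Several of the conditions above can be checked from a shell before trying Dynimizer on a workload. The following is a rough pre-flight sketch and not a Dynimizer feature. All function names are hypothetical, `is_dynamically_linked` assumes binutils `readelf` is installed, and the checks cover only the kernel version, ELF class and machine, dynamic linking, and the user mode share of CPU time.

```shell
# Succeeds if kernel release string $1 (e.g. from `uname -r`) is >= 2.6.32.
kernel_ok() {
    printf '%s\n' "$1" | awk -F'[.-]' '{
        v = $1 * 1000000 + $2 * 1000 + $3
        exit !(v >= 2006032)
    }'
}

# Succeeds if file $1 has an ELF header with EI_CLASS = 2 (64-bit) and
# e_machine = 62 (0x3e, EM_X86_64), stored little-endian at offset 18.
is_elf64_x86_64() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    class=$(od -An -tx1 -j4 -N1 "$1" | tr -d ' \n')
    machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] && [ "$class" = "02" ] && [ "$machine" = "3e00" ]
}

# Succeeds if executable $1 requests a dynamic loader (has a PT_INTERP header).
is_dynamically_linked() {
    readelf -l "$1" 2>/dev/null | grep -q 'INTERP'
}

# Reads one /proc/<pid>/stat line on stdin and prints the percentage of
# CPU time spent in user mode: utime / (utime + stime), fields 14 and 15.
user_share() {
    # Strip "pid (comm) " first, since comm may itself contain spaces.
    sed 's/^.*) //' | awk '{
        u = $12; s = $13
        if (u + s > 0) printf "%d\n", 100 * u / (u + s)
    }'
}
```

A possible use is `kernel_ok "$(uname -r)" && is_elf64_x86_64 /usr/sbin/mysqld && is_dynamically_linked /usr/sbin/mysqld`, followed by `user_share < /proc/20722/stat`. The mysqld path is only an example, and the pid is the one used in the lifecycle steps.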

Because of the above requirements, Dynimizer takes a whitelist approach when determining if programs are allowed to be optimized, with MySQL and its variants being the currently supported optimization targets on that list for this early beta release. Other programs are not currently supported, and while they can be used with Dynimizer, they should be very thoroughly tested by the user or system administrator before being deployed in a production environment.

Have an application that meets these requirements and is not officially supported by Dynimizer? Let us know at apps@dynimize.com.


Dynimizer System Overhead

Dynimizer optimizes running software, and so it consumes CPU and memory resources while performing these optimizations. For it to be effective, this overhead must be more than offset by the performance gained through optimization. Once optimization is complete, Dynimizer consumes hardly any system resources. This overhead therefore occurs early in a program's execution. For this reason, the longer a program runs at steady state, the greater the benefit of using Dynimizer since the overhead can be amortized over that period of time. Future versions of Dynimizer will eventually eliminate most of this overhead.
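
The amortization argument above can be written as a simple break-even estimate. This is a sketch under a deliberately crude model that is not taken from Dynimizer's documentation: baseline throughput R, a fractional slowdown o during an optimization window of length T_o, and a fractional speedup g afterwards. All four symbols are assumptions introduced here.

```latex
\underbrace{R \, o \, T_o}_{\text{work lost while optimizing}}
  = \underbrace{R \, g \, t^{*}}_{\text{work gained afterwards}}
\quad\Longrightarrow\quad
t^{*} = \frac{o \, T_o}{g}
```

As a purely hypothetical instance, a 300 second optimization window (the upper end of the range quoted for stage 4) with o = 0.2 and g = 0.1 would break even 600 seconds after optimization completes. Every second of steady-state execution beyond that point is net gain.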


MySQL Sysbench OLTP throughput with and without Dynimizer.





Dynimizer Under The Hood

Dynimizer is a user mode daemon process that is run with superuser permissions. It profiles applications using the Linux perf_events subsystem and interfaces with a target application's machine code through the Linux ptrace system call. When optimizing a program, it loads a code cache into the target program's address space. A code cache is a memory region that will contain optimized machine code generated by Dynimizer. It splices an application's machine instructions into optimization regions, converts the machine code of each region into an intermediate representation (IR), and annotates that IR with the collected profiling information. Guided by this profiling information, it performs various optimizations on the IR, such as high-level, architecture-independent, dataflow-based compiler optimizations like constant propagation and dead code elimination, as well as microarchitecture-specific optimizations. The IR is then converted back to machine code and committed to the code cache. The target program is modified using ptrace, such that every invocation of each optimization region in the original machine code becomes an invocation of its optimized code in the code cache. All updates to the target process are done atomically so that application functionality is always maintained at any stage throughout this process.
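
Because the code cache lives in the target program's address space, it should be visible in that process's memory map. The following sketch lists executable mappings that have no backing file, reading text in the standard `/proc/<pid>/maps` format. That Dynimizer's code cache shows up as an anonymous executable mapping is an assumption made here and is not confirmed by the text above. The function name `anon_exec_maps` is hypothetical.

```shell
# Reads /proc/<pid>/maps text on stdin and prints the address range and
# permissions of every executable mapping with no pathname (anonymous).
# Lines with a pathname or a tag such as [vdso] have a sixth field and are skipped.
anon_exec_maps() {
    awk '$2 ~ /x/ && NF == 5 { print $1, $2 }'
}
```

For example, comparing the output of `sudo cat /proc/20722/maps | anon_exec_maps` before and after a process reaches the dynimized state would show whether a new region of this kind has appeared.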

High level illustration of how Dynimizer works




Where Dynimizer Helps

Dynimizer accelerates high performance online-transaction-processing (OLTP) workloads for the relational database products MySQL, MariaDB, and Percona Server on Linux.

The initial release of Dynimizer mostly focuses on reducing front-end CPU stalls. These are the delays encountered by a CPU when bringing instructions from memory to its execution units. In technical terms, they consist of issues such as instruction cache misses, branch mispredictions, and iTLB misses. These front-end stalls are typically a bottleneck in CPU intensive OLTP workloads. MySQL's process architecture and the nature of optimizations that reduce front-end CPU stalls make MySQL OLTP workloads an obvious place to start with Dynimizer. Other capabilities and applications will be supported in subsequent releases.
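
One way to judge whether a workload suffers from the front-end stalls described above is to sample the relevant hardware counters with the Linux `perf` tool. This is a sketch and not part of Dynimizer. The function names are hypothetical, the event names are perf's generic aliases and may not be available on every CPU (`perf list` shows what is supported), and no threshold is implied for what counts as "a lot".

```shell
# Sample front-end related counters for a running process.
# $1 = pid, $2 = sampling window in seconds (default 10).
front_end_counters() {
    sudo perf stat \
        -e instructions,L1-icache-load-misses,iTLB-load-misses,branch-misses \
        -p "$1" -- sleep "${2:-10}"
}

# Misses per thousand instructions (MPKI).
# $1 = miss count, $2 = instruction count, both as reported by perf stat.
mpki() {
    awk -v m="$1" -v i="$2" 'BEGIN {
        if (i > 0) printf "%.2f\n", 1000 * m / i
    }'
}
```

Running `front_end_counters 20722 30` and feeding the reported counts to `mpki` before and after optimization gives a rough like-for-like comparison, provided the workload is in a similar steady state both times.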








COPYRIGHT © DYNIMIZE INC.