Dynimizer User Reference



Version 1.0








dyn·imize

verb
  1. to dynamically optimize the in-memory machine code of a running process
modern English: from a combination of 'dynamic' and 'optimize'


dyn·imizer

noun
  1. a software agent that dynamically optimizes the in-memory machine code of a running process
modern English: from a combination of 'dynamic' and 'optimizer'




1. Overview

Dynimizer is a user mode program that quickly optimizes the in-memory machine code of target processes based on profiling information it gathers at run-time, improving performance for some workloads. Dynimizer typically runs as a background system process and does not require the source code of target programs that it optimizes.



2. Installation & Quick Start


To install Dynimizer with default options, run the following commands in a Linux terminal:

wget https://dynimize.com/install -O install
wget https://dynimize.com/install.sha256 -O install.sha256
if sha256sum -c install.sha256 | grep OK; then sudo bash ./install -default; fi



Dynimizer is now installed and can be further configured by editing /etc/dyni.conf. To optimize any CPU intensive target process whose executable is listed in the [exeList] section of /etc/dyni.conf, run:

$ sudo dyni -start



The command dyni -status shows each target process progressing through the "profiling", "dynimizing", and "dynimized" states. A process is fully optimized once it reaches the "dynimized" state:

$ sudo dyni -status


Dynimizer is running
mysqld, pid: 8375, dynimizing



$ sudo dyni -status


Dynimizer is running
mysqld, pid: 8375, dynimized



To shut down Dynimizer, run:

$ sudo dyni -stop



To uninstall Dynimizer, run:

$ sudo bash -c 'bash <(wget -O - https://dynimize.com/uninstall)'




3. System Requirements

    x86-64 Linux Kernel version 2.6.32 or later

    4 GB of virtual memory (swap + RAM) for each target process being dynimized in parallel. Dynimizer needs this memory only while a target is in the "dynimizing" phase and releases it afterwards.

    6 GB of virtual memory (swap + RAM) per target process when the target is the MySQL 8 mysqld process, because the MySQL 8 mysqld executable is twice the size of the MySQL 5.7 one.
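
The virtual memory requirement above can be checked before starting Dynimizer. The following is a minimal sketch, not part of Dynimizer: the helper names are hypothetical, and it compares total RAM plus swap from /proc/meminfo (not free memory) against 4 GB per target, or 6 GB when a different per-target figure is passed.

```shell
#!/bin/sh
# Sketch only: these helper functions are hypothetical, not Dynimizer commands.

# Print MemTotal + SwapTotal in kB, read from a meminfo-style file.
total_virtual_kb() {
    total=0
    while read -r key value rest; do
        case "$key" in
            MemTotal:|SwapTotal:) total=$(( total + value )) ;;
        esac
    done < "${1:-/proc/meminfo}"
    echo "$total"
}

# Print the kB needed for $1 targets at $2 GB each (default 4; use 6 for MySQL 8).
required_kb() {
    echo $(( $1 * ${2:-4} * 1024 * 1024 ))
}

# Succeed if the meminfo file ($3) shows enough RAM + swap for $1 targets.
has_enough_virtual_memory() {
    [ "$(total_virtual_kb "${3:-/proc/meminfo}")" -ge "$(required_kb "$1" "${2:-4}")" ]
}
```

For example, has_enough_virtual_memory 2 6 succeeds only if the host has at least 12 GB of RAM and swap combined, which is the stated requirement for two MySQL 8 targets dynimized in parallel.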


3.1 VM specific requirements

There are no specific requirements for running inside virtual machines on the KVM, Xen or ESXi (VMware) hypervisors. When virtual PMUs are not enabled, Dynimizer can still use perf_event_open() with the system task clock event. Dynimizer can therefore be run in most cloud environments.


3.2 Docker specific requirements

Dynimizer must be run in the same container as the target process being optimized. The dyni process uses the ptrace(), process_vm_readv(), process_vm_writev(), and perf_event_open() system calls to profile and interface with the target process, so it must be run with the CAP_SYS_ADMIN and CAP_SYS_PTRACE capabilities. As of this writing (Docker version 18), Docker disables these system calls and capabilities by default. The simplest way to enable them is to start the container with the --privileged flag, for example:

$ docker run -it --privileged my_image

Alternatively, see Docker's documentation on seccomp profiles to enable just these system calls.



4. Command Line Options

Synopsis: dyni [options]
Options
• -help Print the help message
• -start Start Dynimizer as a background daemon process
• -stop Stop the Dynimizer daemon process
• -status Print the state of Dynimizer and its optimized processes. States are listed in the following format: target process executable name, pid, state
• -version Print the Dynimizer version
• -log <path> Specify a path to the output log
• -optimizeOnce:<y/n> Do not re-dynimize a process whose workload changes dramatically after it has been dynimized. When used in conjunction with -pid <number>, Dynimizer exits after dynimizing the process
• -fastCompile:<y/n> Shorten the ramp-up time it takes to dynimize a process, at the expense of final steady state performance after ramp-up
• -lowOverhead:<y/n> Reduce profiling overhead for target processes, while incurring a slight profiling overhead across the entire system while dynimizing those processes. May also prolong the dynimizing phase and marginally reduce the final speedup of dynimized processes
• -exe <name> Only processes run from this executable can be dynimized
• -secureCodeCache:<y/n> Specify that the JIT code caches generated by Dynimizer must not be readable and writeable at the same time. May incur a marginal performance penalty. Usually necessary when SELinux is being enforced
• -pid <number> Only dynimize the process specified by this pid number. When used in conjunction with -optimizeOnce, Dynimizer exits after dynimizing the process




5. Dynimizer Usage Examples

The sudo command should be prepended to all dyni commands when not running as superuser. It is omitted below for readability.

Start the Dynimizer daemon process with the log path set to "/tmp/log":

$ dyni -start -log /tmp/log


Stop Dynimizer:

$ dyni -stop


Read Dynimizer's status:

$ dyni -status


Dynimize a process at pid 9618 and exit once dynimized:

$ dyni -start -pid 9618 -optimizeOnce:y


Launch Dynimizer as a foreground process, only dynimizing process 9618 and then exiting. Note that the -start option is excluded here so that Dynimizer is not launched in the background, which can be useful for running Dynimizer from shell scripts:

$ dyni -pid 9618 -optimizeOnce:y


Only dynimize processes run from the "myprog" executable:

$ dyni -start -exe myprog


Dynimize processes and shorten the amount of time spent in the dynimizing phase:

$ dyni -start -fastCompile:y


Dynimize processes, specifying that the JIT code caches must not be readable and writeable at the same time, which is a common requirement in SELinux security policies:*

$ dyni -start -secureCodeCache:y


* A JIT code cache is a memory region that is automatically loaded into a target process by Dynimizer. Dynimizer primarily uses it to store the new optimized machine code sequences it generates, amongst other things.



6. Configuration File

On startup, Dynimizer is configured based on the settings in /etc/dyni.conf. The following is a description of these settings. Many of these settings can be overridden by command line options. Note that starting a line with the # character will cause that line to be skipped.

All options follow the line in the .conf file containing the string [options]. If any of these options are missing, their default settings are used. Note that the <, | and > characters shown below should not actually be placed in the configuration file:

Option                  Default  Description
log:<path>              No log   Same as the command line option -log
maxLogSize:<size>       1 MB     The maximum combined size of the log files at <path> and <path>.old. Size is in bytes unless a unit is given: KB, K, MB, M, GB, or G
optimizeOnce:<y|n>      n        Same as the command line option -optimizeOnce
fastCompile:<y|n>       n        Same as the command line option -fastCompile
lowOverhead:<y|n>       n        Same as the command line option -lowOverhead
initdService:<y|n>      n        Set to y for Dynimizer to be launched on system startup
secureCodeCache:<y|n>   n        Same as the command line option -secureCodeCache. Initially set to y if -selinux:y was selected when running ./INSTALL.sh


The lines following [exeList] denote the whitelist of possible optimization targets. These are the names of the executables used to launch the processes that Dynimizer can target, one name per line. For example, the MySQL server process is denoted as mysqld. A process must have been started by an executable on this list to be dynimized. Note that if a symbolic or hard link is used to call an executable, the real executable file name must be specified in the [exeList] section.

The lines following [users] list the possible user names of the owners of processes that can be targeted by Dynimizer, one name per line. If present, a process must match both the [exeList] and [users] list in order to be dynimized. Default: no user name requirement.

The output log specified by log: records all errors and Dynimizer status updates along with their timestamps. On installation the log path is set to /var/log/dyni.log in /etc/dyni.conf. Once the log file reaches half the maxLogSize, it is moved to the same path name with .old appended, and a new log is started under the original log file name. Logging is disabled if the log option is removed from dyni.conf.
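
To make the file layout concrete, here is a hedged sketch that writes an example configuration to a scratch path for review. It is not taken from a shipped dyni.conf: the helper name write_sample_conf is hypothetical, the ordering of the sections and the mysql user name are assumptions, and the option values are simply the defaults described above (with maxLogSize given in bytes).

```shell
#!/bin/sh
# Sketch only: write_sample_conf is a hypothetical helper, not a Dynimizer command.
# The section order and the "mysql" user name are illustrative assumptions.
write_sample_conf() {
    cat > "$1" <<'EOF'
# Lines starting with # are skipped.
[options]
log:/var/log/dyni.log
maxLogSize:1048576
optimizeOnce:n
fastCompile:n
lowOverhead:n
initdService:n
secureCodeCache:n
[exeList]
mysqld
[users]
mysql
EOF
}
```

Calling write_sample_conf /tmp/dyni.conf.example produces a file that can be compared against the installed /etc/dyni.conf before any changes are made there.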



7. Workload Requirements

To obtain benefit from the current version of Dynimizer, all of the following workload conditions must be met:

A small number of CPU intensive processes
On a given OS host where the workload is running, the workload must be comprised of one or a few CPU intensive processes. Optimizing a large number of processes at once is not recommended.

Long running programs
The processes being optimized have long lifetimes, and their workloads are long running in order to amortize the warmup time associated with optimization.

x86-64
Optimized processes must be 64-bit, derived from x86-64 executables and shared libraries, which must comply with the x86-64 ABI and ELF-64 formats. Most ahead-of-time compiled applications on Linux meet this requirement.

Dynamically Linked
Target processes must be dynamically linked to their shared libraries. Statically linked processes are not yet supported. Most Linux programs are dynamically linked.

No self-modifying code
The target application must not be running its own Just-In-Time compiler such as those found in Java virtual machines. This therefore excludes Java Applications.

Front-end CPU stalls
The workload wastes a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions.

User mode execution
Much of that wasted time is spent in user mode execution (as opposed to kernel mode), as Dynimizer only optimizes user mode machine code.
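
One rough way to judge the user mode condition is to compare the user and kernel CPU time that Linux records for a process. The sketch below is an illustration, not a Dynimizer feature: user_time_percent is a hypothetical helper that reads a file in the /proc/<pid>/stat format and prints the percentage of the accumulated CPU time spent in user mode. It says nothing about instruction cache or TLB misses, which need a hardware profiler to measure.

```shell
#!/bin/sh
# Sketch only: user_time_percent is a hypothetical helper, not a Dynimizer command.
# Argument: path to a file in the format of /proc/<pid>/stat.
user_time_percent() {
    line=
    read -r line < "$1" || [ -n "$line" ] || return 1
    # The command name (field 2) may contain spaces, so drop everything up to
    # the last ") " and count fields from the process state (field 3) onward.
    rest=${line##*) }
    set -- $rest
    utime=${12}   # field 14 of the full line: clock ticks spent in user mode
    stime=${13}   # field 15 of the full line: clock ticks spent in kernel mode
    total=$(( utime + stime ))
    [ "$total" -gt 0 ] || { echo 0; return 0; }
    echo $(( 100 * utime / total ))
}
```

For a target with pid 8375, user_time_percent /proc/8375/stat prints a whole number between 0 and 100. The counters are cumulative since the process started, so a long idle period before the CPU intensive phase will skew the figure.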

Because of these requirements, Dynimizer takes a whitelist approach when determining if programs are allowed to be optimized, with MySQL and its variants being the currently supported optimization targets on that list for this early beta release. Other programs are not currently supported, and while they can be used with Dynimizer, they should be very thoroughly tested by the user or system administrator before being deployed in a production environment.

Future versions of Dynimizer may eliminate many of these workload requirements, broadening the variety of applicable scenarios as well as further increasing the performance delivered in previously beneficial cases.



8. Miscellaneous Notes

Dynimizer 1.0 Beta: not fit for production outside of MySQL, MariaDB, and Percona Server
Dynimizer has not been extensively stress tested with non-MySQL targets. It is only suitable for demonstration purposes there.

Sequential CPU speedup
Dynimizer usage may result in the speedup of the target application and/or a reduction in CPU resources consumed by that application. Speedup typically occurs when sequential CPU performance is a bottleneck. When sequential CPU performance is not a bottleneck, a reduction in CPU resources used by the optimized process is experienced which can be observed by an increase in CPU idling. All improvements are generally limited to time spent in user mode execution.

Dynimizer performs work in response to CPU usage
When a process whose executable is listed in the exeList consumes large amounts of CPU resources, Dynimizer will automatically begin to optimize the in-memory instructions of that process. Dynimizer only performs work in response to CPU resources consumed by a target process. The more CPU resources a target process consumes, the more intensely Dynimizer will work to dynimize it and the more quickly the process will become dynimized. A target process that consumes few CPU resources will therefore take a long time to become fully dynimized.

Responds to changes in workload
Once all target applications are sufficiently optimized, Dynimizer will enter idle mode. It will then be prompted to perform more work in response to a significant change in the CPU workload of an already dynimized process, in which case it will re-dynimize it. It will also be prompted to do more work when a new undynimized process begins, where its executable is on the exeList and the process is CPU intensive.

Zero downtime
Target applications do NOT need to be restarted in order to be dynimized. Once Dynimizer is started it will automatically detect and begin dynimizing them immediately.

One Dynimizer instance per OS host
Dynimizer is not designed to be run as multiple instances on the same host OS. If a second instance is launched in parallel, it will detect an already running instance of Dynimizer and exit.

No handoffs between Dynimizer instances
A new Dynimizer process will not dynimize a target process that has already been dynimized by a previous Dynimizer process.

Single threaded
Dynimizer is currently single threaded and will only consume the resources of at most one CPU core while running.



9. Example Lifecycle of Dynimizer and a Target Application

  1. Dynimizer is launched and begins in idle mode, monitoring system processes. This state consumes virtually no CPU resources:

       $ dyni -start


  2. A new target application process such as mysqld is then launched and becomes CPU intensive, or an already running target application becomes CPU intensive.

  3. Dynimizer detects this and begins to dynimize the target application. Incremental, atomic updates to the target application's machine code are made. If the target application remains CPU intensive then this optimization typically takes around 60 seconds to complete:

       $ dyni -status


       Dynimizer is running
       mysqld, pid: 8375, dynimizing


  4. Dynimizer has completed its current batch of optimization work and enters idle mode:

       $ dyni -status


       Dynimizer is running
       mysqld, pid: 8375, dynimized


  5. A new target application is launched or the workload of the previously dynimized process has drastically changed. In either case, Dynimizer returns to step 3. Although Dynimizer is single threaded, it can apply these steps to multiple target processes at the same time.
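
The lifecycle above can be followed from a shell script by polling dyni -status until the target reaches the "dynimized" state. This is a minimal sketch under stated assumptions: both function names are hypothetical helpers, the status line format is assumed to match the sample output shown above, and the default polling interval and attempt limit are arbitrary.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not Dynimizer commands.

# Read status text on stdin; succeed if the line for pid $1 ends in "dynimized".
is_dynimized() {
    grep -Eq "pid: $1, dynimized[[:space:]]*\$"
}

# Poll `dyni -status` for pid $1 every $2 seconds (default 5), giving up after
# $3 attempts (default 60). Run as root, or prefix dyni with sudo.
wait_until_dynimized() {
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        if dyni -status | is_dynimized "$1"; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep "${2:-5}"
    done
    return 1
}
```

For example, wait_until_dynimized 8375 && echo "mysqld is dynimized" could gate the start of a benchmark so that measurements are not taken during the warm-up phase.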


10. Dynimizer Overheads

Dynimizer introduces both CPU and memory overheads when ramping up performance during the dynimizing phase.* The following section addresses these overheads.


10.1 Dynimizer CPU overhead

CPU overhead exists during the warm-up phase, while a process is being dynimized, and has two components. The most obvious is the number of CPU cycles that the dyni process itself consumes. After an initial spike to 100% utilization of a single core lasting less than a second, dyni typically fluctuates around 20% utilization of a single core for the remainder of the warm-up phase. Because it is quite rare for all hardware threads on a large multicore system to be fully utilized, the single threaded dyni process is unlikely to have much impact in this way. The second component is application profiling, which is brief but typically far more drastic. Both overheads are offset by the gradual machine code optimizations as they take effect, and are completely eliminated once the process reaches the dynimized phase. An initial warm-up period should therefore be set aside for workloads using Dynimizer.

10.2 Dynimizer memory overhead

Memory overhead also exists. The dyni process typically requires around 4 GB of virtual memory for each target process while it is dynimizing that process. This large amount of virtual memory is used for bookkeeping purposes, but the dynamic range of memory accesses is quite limited and in our experience does little to trigger additional page faults in memory constrained workloads. That said, at least 4 GB of free swap space is required for each target process during the dynimizing phase, and more may be allocated to provide a margin of safety.

Once a process is dynimized, most of that memory is freed. The dyni process then typically consumes 50-150 MB of virtual memory per dynimized process, which is used for bookkeeping in order to react to drastic workload changes in dynimized processes. This memory also undergoes a very small range of dynamic accesses, and should therefore have negligible impact on system paging. That steady state memory overhead can be eliminated altogether by using the -optimizeOnce:y option. Without that option, if a drastic workload change occurs and the target processes are re-dynimized, they return to the dynimizing phase and Dynimizer briefly incurs the 4 GB virtual memory overhead per target process again.

An additional use of memory is the 35 MB code cache that is loaded into the target process when it is dynimized. However, with the code cache in place, the access patterns of machine code instruction memory are more constrained than those of the original undynimized process, so in most cases fewer resident memory pages should be used there.

Overall, because of the limited range of dynamic memory accesses, these memory overheads should not affect performance in memory constrained environments. Therefore more RAM is typically not required. However one should make sure that the swap space can accommodate the increased virtual memory used during the dynimizing phase.

* Reductions in both the CPU and memory overheads incurred by Dynimizer are planned for future releases.



11. Using Dynimizer on Applications Other Than MySQL

Many have asked us why MySQL is the main target for the initial release of Dynimizer, given that database workloads are often IO bound and do not benefit from improved CPU performance to the same extent as other workloads. Extreme levels of instruction cache misses are common in many enterprise software workloads, which leaves many places where Dynimizer could be put to productive use in the future. However, many of these other applications, such as Redis or NGINX, spend most of their CPU time in system mode execution (in the Linux kernel). The machine code underlying this system mode execution cannot be optimized by Dynimizer in the current release, and therefore these applications benefit to a lesser extent than MySQL. For example, we've measured speedups of around 10% with NGINX in a CPU bound setup. Other applications are often deployed in a multiprocess configuration where there may be too many processes for Dynimizer to handle effectively, or where the processes have short lifetimes. For these reasons, we've spent little time examining other use cases for now.

Future versions of Dynimizer will be able to optimize workloads with these characteristics too. At the moment, MySQL and its variants have been the main focus of the current release, with other single process, multithreaded relational databases on Linux being the most likely candidates to benefit from this release in our opinion. Because this is such new technology, we've decided to limit the scope of this release to focus on delivering a stable user experience when improving the performance of a single, highly popular family of programs: MySQL, MariaDB and Percona Server. Stay tuned for expanded scope in future releases.

If you find Dynimizer useful in other workloads outside of MySQL, we'd love to hear about it. Let us know at apps@dynimize.com.

















COPYRIGHT © DYNIMIZE INC.