MySQL benchmarking & tuning with sysbench

SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.
SysBench was originally designed to test parameters such as file I/O performance, scheduler performance, memory allocation and transfer speed and POSIX thread implementation performance. SysBench allows a tester to configure the number of threads, the amount of data in the database, the access pattern, and whether the database is read-only, read-mostly, or read-write.
SysBench can test:
file I/O performance
scheduler performance
memory allocation and transfer speed
POSIX threads implementation performance
database server performance

Download: http://sysbench.sourceforge.net/
or use "yum install sysbench" on Fedora Linux.

Benchmark System Information

HP HDX 16 Laptop, Fedora 11 Linux System

[root@miranda]# cat /proc/version
Linux version 2.6.29.6-217.2.8.fc11.x86_64 (mockbuild@x86-5.fedora.phx.redhat.com) (gcc version 4.4.0 20090506 (Red Hat 4.4.0-4) (GCC) ) #1 SMP Sat Aug 15 01:06:26 EDT 2009

[root@miranda]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
stepping : 6
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 4522.73
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
stepping : 6
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 4521.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

[root@miranda]# cat /proc/meminfo
MemTotal: 4018880 kB
MemFree: 44640 kB
Buffers: 346632 kB
Cached: 2647696 kB
SwapCached: 0 kB
Active: 1154460 kB
Inactive: 2353600 kB
Active(anon): 280900 kB
Inactive(anon): 238392 kB
Active(file): 873560 kB
Inactive(file): 2115208 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 6078456 kB
SwapFree: 6078456 kB

[root@miranda]# hdparm -tT /dev/sda # run a read benchmark on a hard drive
/dev/sda:
Timing cached reads: 3552 MB in 2.00 seconds = 1777.32 MB/sec
Timing buffered disk reads: 190 MB in 3.01 seconds = 63.02 MB/sec

[root@miranda]# hdparm -i /dev/sda # display hard drive information
/dev/sda:
Model=FUJITSU, FwRev=8909, SerialNo=K618T8A2REK3
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-3,4,5,6,7
* signifies the current active mode

SYSBENCH RESULTS:

4.1 Sysbench CPU benchmark

The cpu test is one of the simplest benchmarks in SysBench. In this mode each request consists of calculating prime numbers up to the value specified by the --cpu-max-prime option. All calculations are performed using 64-bit integers.
Each thread executes requests concurrently until either the total number of requests or the total execution time exceeds the limit specified with the common command line options.
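
To make the "calculation of prime numbers" step concrete, here is a rough Python sketch of what a single request might look like. It is only an approximation for illustration: the real test is written in C, and the function name cpu_request and the choice to start counting at 3 are my assumptions, not something taken from the SysBench source.

```python
import math


def cpu_request(max_prime):
    """Count primes in [3, max_prime] by trial division (one illustrative 'request')."""
    found = 0
    for candidate in range(3, max_prime + 1):
        limit = math.isqrt(candidate)
        # The candidate is prime if no divisor in [2, sqrt(candidate)] divides it.
        if all(candidate % divisor for divisor in range(2, limit + 1)):
            found += 1
    return found


# Same upper bound as --cpu-max-prime=20000 in the example run.
print(cpu_request(20000))
```

The work per request grows quickly with the upper bound, which is why a larger --cpu-max-prime value makes each request take longer.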
Example:
sysbench --test=cpu --cpu-max-prime=20000 run

[root@miranda]# sysbench --test=cpu --cpu-max-prime=20000 run # run the sysbench CPU benchmark on Linux
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 20000
Test execution summary:
total time: 27.8965s
total number of events: 10000
total time taken by event execution: 27.8881
per-request statistics:
min: 2.76ms
avg: 2.79ms
max: 8.26ms
approx. 95 percentile: 2.82ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 27.8881/0.00
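
The numbers in this summary can be cross-checked against each other. The sketch below pulls the fields out of the summary text with a regular expression and recomputes the average request time; parse_summary is a hypothetical helper of mine, not part of SysBench, and it only assumes the "label: number" layout shown above.

```python
import re

SUMMARY = """\
total time: 27.8965s
total number of events: 10000
total time taken by event execution: 27.8881
"""


def parse_summary(text):
    """Extract 'label: number' pairs from a sysbench 0.4-style summary."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*([0-9.]+)s?\s*$", line)
        if match:
            fields[match.group(1).strip()] = float(match.group(2))
    return fields


fields = parse_summary(SUMMARY)
# Average request time = time spent executing events / number of events.
avg_ms = (fields["total time taken by event execution"]
          / fields["total number of events"] * 1000)
print(f"avg per request: {avg_ms:.2f}ms")
```

With one thread, the recomputed average agrees with the "avg: 2.79ms" line in the per-request statistics.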

4.2. Sysbench threading benchmark

This test mode was written to benchmark scheduler performance, more specifically the cases when a scheduler has a large number of threads competing for some set of mutexes.
SysBench creates a specified number of threads and a specified number of mutexes. Each thread then runs requests that consist of locking a mutex, yielding the CPU (so the scheduler places the thread in the run queue), and unlocking the mutex once the thread is rescheduled. For each request these actions are run several times in a loop, so the more iterations are performed, the more contention is placed on each mutex.
The following options are available in this test mode:
Option           Description                                               Default value
--thread-yields  Number of lock/yield/unlock loops to execute per request  1000
--thread-locks   Number of mutexes to create                               8
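
The lock/yield/unlock loop described above can be sketched in Python roughly as follows. This is an illustration only, not SysBench code: run_threads_test, the shared request counter, and the way a mutex is picked per iteration (i % thread_locks) are all assumptions on my part, and time.sleep(0) stands in for yielding the CPU.

```python
import threading
import time


def run_threads_test(num_threads=4, thread_yields=100, thread_locks=2, max_requests=200):
    """Run lock/yield/unlock requests on several threads; return events per thread."""
    locks = [threading.Lock() for _ in range(thread_locks)]
    counter_lock = threading.Lock()
    state = {"remaining": max_requests, "events": [0] * num_threads}

    def take_request():
        # Hand out requests until the shared budget is used up.
        with counter_lock:
            if state["remaining"] == 0:
                return False
            state["remaining"] -= 1
            return True

    def worker(index):
        while take_request():
            for i in range(thread_yields):
                with locks[i % thread_locks]:  # assumed mutex selection
                    time.sleep(0)              # yield while holding the lock
            state["events"][index] += 1        # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["events"]


events = run_threads_test()
print(events, sum(events))
```

How evenly the events are spread across threads depends on the scheduler, which is the kind of variation the "Threads fairness" lines in the real output report.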
Example:
sysbench --num-threads=64 --test=threads --thread-yields=100 --thread-locks=2 run

[root@miranda]# sysbench --num-threads=64 --test=threads --thread-yields=100 --thread-locks=2 run
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 64
Doing thread subsystem performance test
Thread yields per test: 100 Locks used: 2
Threads started!
Done.
Test execution summary:
total time: 0.4177s
total number of events: 10000
total time taken by event execution: 26.2549
per-request statistics:
min: 0.05ms
avg: 2.63ms
max: 303.37ms
approx. 95 percentile: 2.05ms
Threads fairness:
events (avg/stddev): 156.2500/101.88
execution time (avg/stddev): 0.4102/0.00
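
At first glance it looks odd that the event execution time (26.25s) is far larger than the total time (0.42s). The arithmetic below, using only the figures printed above, suggests the larger figure is summed over all 64 threads; that reading is my interpretation of the output rather than something the tool states.

```python
num_threads = 64
total_events = 10000
total_event_time = 26.2549  # seconds, as printed in the summary
wall_time = 0.4177          # seconds, "total time"

events_per_thread = total_events / num_threads         # matches "events (avg)"
time_per_thread = total_event_time / num_threads       # matches "execution time (avg)"
avg_request_ms = total_event_time / total_events * 1000

print(f"events per thread: {events_per_thread:.4f}")
print(f"execution time per thread: {time_per_thread:.4f}s")
print(f"avg per request: {avg_request_ms:.2f}ms")
```

The per-thread execution time comes out just under the wall-clock time, so each thread was busy (or waiting on a mutex) for nearly the whole run.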

4.3. Mutex threading benchmark

This test mode was written to emulate a situation in which all threads run concurrently most of the time, acquiring the mutex lock only for a short period (to increment a global variable). The purpose of this benchmark is to examine the performance of the mutex implementation.
The following options are available in this test mode:
Option         Description                                                                      Default value
--mutex-num    Number of mutexes. The actual mutex to lock is chosen randomly before each lock  4096
--mutex-locks  Number of mutex locks to acquire per request                                     50000
--mutex-loops  Number of iterations of an empty loop to perform before acquiring the lock       10000
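
Here is a small Python sketch of how these three options might interact inside one request. It is an illustration under my own assumptions, not the SysBench implementation: run_mutex_test is a made-up name, each thread runs a single request, and I keep one counter per mutex (instead of a single global variable) so the sketch stays free of data races.

```python
import random
import threading


def run_mutex_test(num_threads=4, mutex_num=16, mutex_locks=1000, mutex_loops=100):
    """Each thread spins briefly, then locks a random mutex and bumps its counter."""
    mutexes = [threading.Lock() for _ in range(mutex_num)]
    counters = [0] * mutex_num  # counters[i] is only touched while mutexes[i] is held

    def request():
        rng = random.Random()
        for _ in range(mutex_locks):
            for _ in range(mutex_loops):
                pass                        # empty loop: work done outside the lock
            index = rng.randrange(mutex_num)
            with mutexes[index]:            # short critical section
                counters[index] += 1

    threads = [threading.Thread(target=request) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(counters)


print(run_mutex_test())
```

A larger mutex_loops value means more time outside the lock, and a larger mutex_num spreads the threads over more mutexes; both should reduce contention.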
Example:
sysbench --test=mutex run # use default test values

[root@miranda]# sysbench --test=mutex run
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing mutex performance test
Threads started!
Done.
Test execution summary:
total time: 0.0154s
total number of events: 1
total time taken by event execution: 0.0148
per-request statistics:
min: 14.79ms
avg: 14.79ms
max: 14.79ms
approx. 95 percentile: 10000000.00ms
Threads fairness:
events (avg/stddev): 1.0000/0.00
execution time (avg/stddev): 0.0148/0.00

4.4 Sysbench Memory Benchmark

This test mode can be used to benchmark sequential memory reads or writes. Depending on command line options each thread can access either a global or a local block for all memory operations.
The following options are available in this test mode:
Option               Description                                                                                           Default value
--memory-block-size  Size of memory block to use                                                                           1K
--memory-scope       Whether each thread uses a globally allocated memory block or a local one. Possible values: global, local  global
--memory-total-size  Total size of data to transfer                                                                        100G
--memory-oper        Type of memory operations. Possible values: read, write                                               write

[root@miranda mastajax]# sysbench --test=memory run
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing memory operations speed test
Memory block size: 1K
Memory transfer size: 102400M
Memory operations type: write
Memory scope type: global
Threads started!
Done.
Operations performed: 104857600 (332053.77 ops/sec)
102400.00 MB transferred (324.27 MB/sec)
Test execution summary:
total time: 315.7850s
total number of events: 104857600
total time taken by event execution: 251.8395
per-request statistics:
min: 0.00ms
avg: 0.00ms
max: 7.03ms
approx. 95 percentile: 0.00ms
Threads fairness:
events (avg/stddev): 104857600.0000/0.00
execution time (avg/stddev): 251.8395/0.00
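
The operation count and throughput in this run follow directly from the default block size and total size. A quick check in Python, using only values from the output above (the variable names are mine):

```python
block_size = 1024            # --memory-block-size default, 1K
total_size = 100 * 1024**3   # --memory-total-size default, 100G
total_time = 315.7850        # seconds, "total time"

operations = total_size // block_size   # one operation per block
megabytes = total_size / 1024**2
ops_per_sec = operations / total_time
mb_per_sec = megabytes / total_time

print(f"{operations} operations, {megabytes:.2f} MB")
print(f"{ops_per_sec:.2f} ops/sec, {mb_per_sec:.2f} MB/sec")
```

Since each operation moves only 1K, this result probably says as much about per-operation overhead as about raw memory bandwidth; a larger --memory-block-size should give a different picture.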

4.5. Filesystem I/O performance

This test mode can be used to produce various kinds of file I/O workloads. At the prepare stage SysBench creates a specified number of files with a specified total size, then at the run stage, each thread performs specified I/O operations on this set of files.
When the global --validate option is used with the fileio test mode, SysBench performs checksum validation on all data read from the disk. On each write operation the block is filled with random values, then the checksum is calculated and stored in the block along with the offset of this block within a file. On each read operation the block is validated by comparing the stored offset with the real offset, and the stored checksum with the real calculated checksum.
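
The validation scheme described above can be sketched like this. Note the assumptions: I use CRC32 and put the checksum and offset in a 12-byte trailer at the end of a 16K block, purely for illustration. The checksum algorithm and on-disk layout SysBench really uses are not given here, and make_block / validate_block are hypothetical names.

```python
import os
import struct
import zlib

BLOCK_SIZE = 16 * 1024
TRAILER = struct.Struct("<IQ")  # assumed layout: 4-byte CRC32, 8-byte file offset


def make_block(offset):
    """Build a block of random data with its checksum and offset stored at the end."""
    payload = os.urandom(BLOCK_SIZE - TRAILER.size)
    return payload + TRAILER.pack(zlib.crc32(payload), offset)


def validate_block(block, real_offset):
    """Check the stored offset and checksum against what was actually read."""
    payload, trailer = block[:-TRAILER.size], block[-TRAILER.size:]
    stored_crc, stored_offset = TRAILER.unpack(trailer)
    return stored_offset == real_offset and stored_crc == zlib.crc32(payload)


block = make_block(offset=32768)
print(validate_block(block, 32768))  # block read back from where it was written
print(validate_block(block, 49152))  # same data found at a different offset
```

Storing the offset as well as the checksum means a block that is intact but was written to (or read from) the wrong place is still caught.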
The following I/O operations are supported:
seqwr    sequential write
seqrewr  sequential rewrite
seqrd    sequential read
rndrd    random read
rndwr    random write
rndrw    combined random read/write

Usage example:
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup

In the above example the first command creates 128 files with a total size of 3 GB in the current directory, the second command runs the actual benchmark and displays the results upon completion, and the third one removes the files used for the test.
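
Because the three stages must be called with identical options, it can help to generate the command lines from one place. A minimal sketch in Python: fileio_commands is a hypothetical helper of mine, it only builds the argument lists and does not run anything, and it uses only the flags shown in the example above.

```python
def fileio_commands(num_threads, total_size, test_mode):
    """Return the prepare/run/cleanup command lines, sharing one set of options."""
    base = [
        "sysbench",
        f"--num-threads={num_threads}",
        "--test=fileio",
        f"--file-total-size={total_size}",
        f"--file-test-mode={test_mode}",
    ]
    return [base + [stage] for stage in ("prepare", "run", "cleanup")]


for command in fileio_commands(16, "3G", "rndrw"):
    print(" ".join(command))
```

Each list could then be passed to subprocess.run(command, check=True) on a machine where sysbench is installed.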

First run the prepare stage to create the test files:
[root@miranda]# sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare

Then run the fileio test:
[root@miranda]# sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 16
Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Done.
Operations performed: 5998 Read, 4002 Write, 12800 Other = 22800 Total
Read 93.719Mb Written 62.531Mb Total transferred 156.25Mb (2.7834Mb/sec)
178.14 Requests/sec executed
Test execution summary:
total time: 56.1367s
total number of events: 10000
total time taken by event execution: 266.5906
per-request statistics:
min: 0.01ms
avg: 26.66ms
max: 485.40ms
approx. 95 percentile: 160.38ms
Threads fairness:
events (avg/stddev): 625.0000/115.76
execution time (avg/stddev): 16.6619/2.06
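
Most of the figures in this output can be rederived from the block size, the request counts and the total time. A quick check in Python, using only numbers printed above (variable names are mine):

```python
block_kb = 16                 # "Block size 16Kb"
files = 128
reads, writes = 5998, 4002    # "Operations performed"
total_time = 56.1367          # seconds, "total time"
num_threads = 16

per_file_mb = 3 * 1024 / files               # 3 GB spread over 128 files
read_mb = reads * block_kb / 1024
written_mb = writes * block_kb / 1024
total_mb = read_mb + written_mb
throughput = total_mb / total_time           # MB/sec
requests_per_sec = (reads + writes) / total_time
read_write_ratio = reads / writes
events_per_thread = (reads + writes) / num_threads

print(f"{per_file_mb:.0f} MB per file")
print(f"read {read_mb:.3f} MB, written {written_mb:.3f} MB, total {total_mb:.2f} MB")
print(f"{throughput:.4f} MB/sec, {requests_per_sec:.2f} requests/sec")
print(f"read/write ratio {read_write_ratio:.2f}, {events_per_thread:.0f} events per thread")
```

So the 2.78 MB/sec figure is simply 10000 random 16K requests spread over 56 seconds, which is in a different league from the 63 MB/sec sequential read that hdparm reported for the same drive.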

Don't forget to clean up the files that were created during the prepare stage:
sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup

That's all I will post for now. I will also post benchmarks run without any GUI loaded. Feel free to post your results. I will keep the thread updated with additional Sysbench Linux benchmarks.