Database Server Performance Results


The following results were measured on a 250 000 record database with 32 byte keys and 2048 byte records. I ran my tests using two access modes: multi-open and single-open. Multi-open mode means that a new connection is made to the DB server for each fetch query. Single-open mode means connecting once and then performing all DB operations over that connection.
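
To make the two modes concrete, here is a minimal C sketch of the two fetch loops. The db_connect/db_fetch/db_disconnect calls are placeholders stubbed out so the sketch compiles on its own; they are not the actual zkDB client API.

    /* Placeholder client API -- NOT the real zkDB interface; stubbed so this
     * compiles as a standalone sketch. */
    #include <stdlib.h>

    typedef struct { int fd; } db_conn;

    static db_conn *db_connect(const char *host) { (void)host; return malloc(sizeof(db_conn)); }
    static int db_fetch(db_conn *c, const char *key, char *rec, int reclen)
    { (void)c; (void)key; (void)rec; (void)reclen; return 0; }
    static void db_disconnect(db_conn *c) { free(c); }

    /* Single open: connect once, then run every fetch over that connection. */
    static void single_open(const char *host, int nfetch)
    {
        db_conn *c = db_connect(host);
        char rec[2048];                                        /* 2048 byte records */
        for (int i = 0; i < nfetch; i++)
            db_fetch(c, "0123456789abcdef0123456789abcdef",    /* 32 byte key */
                     rec, sizeof rec);
        db_disconnect(c);
    }

    /* Multi open: a new connection is made for each fetch query. */
    static void multi_open(const char *host, int nfetch)
    {
        char rec[2048];
        for (int i = 0; i < nfetch; i++) {
            db_conn *c = db_connect(host);
            db_fetch(c, "0123456789abcdef0123456789abcdef", rec, sizeof rec);
            db_disconnect(c);
        }
    }

    int main(void)
    {
        single_open("attack", 1000);
        multi_open("attack", 1000);
        return 0;
    }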

Local DB performance on attack (10.32.0.130)

The local database can provide up to 8000 fetches/sec using only 1 dbperf process. I was able to reach over 12 000 fetches/sec using many dbperf processes. Note that the Sun needs to "warm up" before it can reach that level: when I started my first tests I got around 150 fetches/sec. This is not due to the database cache, because we are not using a huge cache. There seems to be another system cache (perhaps the operating system's filesystem cache) that makes performance go up drastically. This probably requires some investigation...

Performance with clients on 10.16 subnet

Here are the results I got with 'dbperf' on igor, god, colm, thomas and francisl. These machines are not on the same subnet as attack and go through a firewall to reach it, so the expected network performance is really low. Note that there was a bug in my script and I don't have numbers for multiple dbperf processes on one machine, so all tests were run with only one dbperf process per machine. Note also that there were 16 DB server processes on the Sun.
 
Opening Mode    1 machine        2 machines     3 machines     4 machines     5 machines
Single Open     55 queries/sec   110 (2 * 55)   135 (3 * 45)   160 (4 * 40)   175 (5 * 35)
Multi Open      N/A              N/A            N/A            N/A            N/A

As you can see, the numbers are not really good. I had better results (over 200 fetches/sec) with fishbowl and my machine, so we should get better results with attack. One reason is probably the network configuration between the test clients I used and attack. I asked Gerard to lend me machines closer to attack when they are ready; results from those machines will better reflect the production network configuration.

I don't have numbers for the multi-open mode, but trying it right now (lunch time) I get around 35 queries/sec for one machine. I will run some formal tests for multi-open mode when I get a better network configuration.

Performance with clients (A6 and A7) on 10.32 subnet

I ran some tests with the new machines (A6 and A7) located on the same subnet as attack (i.e. 10.32). The tests were performed in multi-open mode, using 5, 10 and 20 database server processes. Here are the results:
 
DB server    Client      Number of dbperf child processes per client machine (aggregate queries/sec)
processes    machines    1                2     4     8      16                       32
5            1           170              290   405   630    700                      700
5            2           330 (2 * 165)    540   830   1020   1020                     unstable: 1045 to 1410
10           1           175              295   485   650    710                      710
10           2           330              530   820   1060   1120                     unstable: 1173 to 1425
20           1           155              270   430   630    700                      725
20           2           300              470   740   1020   unstable: 1120 to 1375   unstable: 1110 to 1440
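
For reference, here is a rough C sketch of how I picture one cell of this table being produced on a client machine: dbperf forks the requested number of child processes, each child runs its fetch loop for the duration of the test, and the per-child counts are summed. This is only my reading of the tool, not its actual code; the fetch loop itself is elided.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int nchild = (argc > 1) ? atoi(argv[1]) : 8;   /* child processes on this client machine */
        int report[2];                                 /* children report their fetch counts here */
        long total = 0;

        pipe(report);
        for (int i = 0; i < nchild; i++) {
            if (fork() == 0) {
                /* Child: open the DB (single or multi open), fetch for the
                 * duration of the test, then report how many fetches completed. */
                long count = 0;
                /* ... fetch loop incrementing count ... */
                write(report[1], &count, sizeof count);
                _exit(0);
            }
        }
        for (int i = 0; i < nchild; i++) {
            long count = 0;
            read(report[0], &count, sizeof count);
            total += count;
        }
        while (wait(NULL) > 0)
            ;
        printf("aggregate fetches for this machine: %ld\n", total);
        return 0;
    }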

DB server with test tool on Sun

Just to determine the load that the zkDB server can handle without the network connection cost, I ran some local tests with dbperf running on the Sun and connecting to the server on the same machine. Here are some results using 16 database server processes:
 
Opening Mode   1 process          2      4      8      16     32           64             128
Single Open    1000 queries/sec   1700   1900   2000   1900   7200 (???)   14 000 (???)   28 000 (???)
Multi Open     390                490    590    670    670    650          -              -

I also experimented with 8 DB server processes on the Sun and the results are very similar. 16 processes seemed to perform better, even though I would need to let it run and stabilize longer to really measure the difference (the estimated delta is within 100 fetches/sec).

Note that because the DB server and the dbperf tool are running on the same machine, they compete for the CPU (which is fully used) and this probably degrades performance.

One last note regarding the excessive results obtained with 32, 64, 128 and even 256 processes. The only explanation I could think of is that it may be caused by the Sun OS scheduler. Correct me if I'm wrong, but I believe the libzkperf library is written such that forked processes do not start working as soon as they are forked: they wait for the parent to tell them to start once all processes have been forked. This should make all the child processes run simultaneously. However, with a high number of processes, some children may not get a chance to start even after the parent has told them to, because the scheduler may not have run them yet (and so their timer is not started). I find this strange, but with 16 DB processes and 32 (or more) client processes, maybe some processes don't run until others have finished? Any other explanation?
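
Here is a small C sketch of the fork/start-gate behaviour I am assuming libzkperf has (hypothetical code, not the real library): the parent forks every child first, each child blocks until the parent says "go", and only then starts its timer and its fetch loop. If the scheduler has not run a child by the time "go" arrives, that child's timer effectively starts late, which could explain the inflated numbers at 32 or more client processes.

    #include <stdlib.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int nchild = 32;               /* the odd results show up at 32+ client processes */
        int gate[2];                   /* "go" signal: parent -> children */

        pipe(gate);
        for (int i = 0; i < nchild; i++) {
            if (fork() == 0) {
                char go;
                read(gate[0], &go, 1);          /* block until the parent releases us */

                struct timeval start;
                gettimeofday(&start, NULL);     /* timer starts after "go" -- but only
                                                   once the scheduler actually runs this
                                                   child, which may be late when there
                                                   are many processes */
                /* ... fetch loop, then report elapsed time and fetch count ... */
                _exit(0);
            }
        }

        /* All children forked: release them at (roughly) the same moment. */
        for (int i = 0; i < nchild; i++)
            write(gate[1], "g", 1);

        while (wait(NULL) > 0)
            ;
        return 0;
    }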