NI Servers
Performance Test Plan (NOTES)
Copyright (c) Zero-Knowledge Systems Inc., 2000
About These Notes
This is a brief overview of the requirements for performance testing of the NI servers. It is no great piece of literature, but it should explain the minimum that must be done to know and understand the actual capacities of the servers. We encourage you to read the following notes before running your tests: they contain information that will aid your understanding of the components.
Prerequisites
The following hardware and software are required:
Hardware:
- 1 machine for the NIQ and NIS servers
- At least 3 machines for the client test application
NOTE: This is the minimum number of machines needed to properly test each NI server in a production-like environment. It does not include other machines that may be running core servers required for the Freedom system to run properly.
- Different connection types:
  - V.90 modem (56K)
  - Ethernet LAN card (100 Mbit/s)
Software:
- Latest version of the NIQS and NISS
- Latest version of the NIQSPERF
Introduction
The NIQS acts as a daemon for the various Freedom entities requesting Freedom network information. The main client of the NIQ server is the Freedom client application. The load the Freedom client places on the NIQS is very large compared with the load caused by the nodes and the core servers. The transactions of the Freedom client can be divided into three categories:
- Normal startup procedure
- Full update
- Other requests
Among these categories, the full update transactions are the most important ones to consider during testing. There is a linear relation between the number of users we are able to support and the number of full updates per second. The load the Freedom client adds on the NIQS is likely to increase in the future for the following reasons:
- We expect the number of users to multiply
- We expect the number of nodes to increase
Both will increase the number of connections and the number of requests to the NIQS. To cope with this increase, a new command has been added to the NIQS to improve its performance. This command allows the client to send a single request instead of the multiple requests sent in the previous version. This decreases the connection latency, allowing more users to connect to the NIQS at a given time. The objective of this performance test is to assess the new capacity of the NIQS. The next section mentions some important technical details that you should know before running your tests and keep in mind during testing.
The NISS, on the other hand, acts as a status-gathering daemon for the various Freedom entities in the Freedom network. It accepts incoming UDP stat packets and updates the state database using the information it receives. Things are simpler for the NISS, since we only need to figure out how many statistic updates per second it can process.
Technical Details To Consider
NIQS is a pre-forking server. The parent process is responsible only for forking child processes; it does not serve any requests or service any network sockets. The child processes do the actual work of processing connections; each serves multiple connections (one at a time) before dying. The parent spawns maxConnections new children at startup and replaces children as they die. The maxConnections setting represents the limit of simultaneous connections the NIQS may handle. Varying the maxConnections setting should affect performance. Any loss of performance is caused by the overhead of forking processes, the overhead of context switches between processes, and the memory overhead of having multiple processes. The single biggest hardware issue affecting public server performance is RAM. A public server, such as NIQS, should never have to swap; swapping increases the latency of each request.
The optimal value for maxConnections should be determined by experimentation. A test application must be run with different values in order to find the maximum number of queries per second we can get.
If you get a lot of error messages about running out of file handles, you might want to raise the limit of file-max. The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate. The default value is 4096. To change it, just write the new number into the file:
# cat /proc/sys/fs/file-max
4096
# echo 8192 > /proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
8192
The three values in file-nr denote the number of allocated file handles, the number of used file handles, and the maximum number of file handles. When the number of allocated file handles comes close to the maximum but the number actually in use is far behind, you’ve encountered a peak in your usage of file handles and you don’t need to increase the maximum. Recording the number of used file handles during performance testing is a good idea: it will tell us how many file handles we need to run the NIQS properly in the production network.
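The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the field order follows the description given in these notes (allocated, used, maximum), and the 90% threshold and the function names are assumptions, not part of the NIQS tooling.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    """The three fields of file-nr, in the order described in these notes."""
    allocated: int
    used: int
    maximum: int


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated integers found in file-nr."""
    allocated, used, maximum = (int(field) for field in text.split())
    return FileNr(allocated, used, maximum)


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read the live counters (Linux only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


def should_raise_file_max(stats: FileNr, threshold: float = 0.9) -> bool:
    """Return True only when the handles actually in use approach the maximum.

    The 0.9 threshold is an arbitrary illustrative choice.
    A high 'allocated' count with a low 'used' count is just a past peak.
    """
    return stats.used >= threshold * stats.maximum
```

Calling `should_raise_file_max(read_file_nr())` at intervals during a test run would give a simple record of whether file-max is actually a constraint.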
However, there is still a per-process limit on open files, which unfortunately can’t be changed as easily. It is set to 1024 by default. To change it, you have to edit the files limits.h and fs.h in the directory /usr/src/linux/include/linux: change the definition of NR_OPEN and recompile the kernel.
Related to process creation is process death induced by the processLife setting. By default this is 0, which means that there is no limit to the number of requests handled per child. If your configuration currently has this set to some very low number, such as 50, you may want to raise it significantly. Because of memory leaks, do not leave it unlimited; limit it to 10000 or so. A very low value for this parameter can have drastic effects on the benchmark results: while the machine is busy spawning children, it can't service requests. This is an important factor to consider during testing.
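To get a feel for how much forking a low processLife causes, here is a rough back-of-the-envelope sketch. The function name and the figure of 100,000 requests are illustrative assumptions. The result is an upper bound, since requests are spread over several children and not every child reaches its limit.

```python
def max_replacement_forks(total_requests: int, process_life: int) -> int:
    """Upper bound on the number of children that die and must be replaced.

    A process_life of 0 means children never retire, so no replacement
    forks are needed (the initial children are not counted here).
    """
    if process_life == 0:
        return 0
    return total_requests // process_life


# Hypothetical load of 100,000 requests during a test run.
for life in (50, 10000, 0):
    print(f"processLife={life}: at most {max_replacement_forks(100_000, life)} replacement forks")
```

Under these assumptions, raising processLife from 50 to 10000 cuts the bound on replacement forks by a factor of 200.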
Another important factor to mention is the connection latency. The parameters that affect the connection latency are the following:
- Client access speed
- Internet and Freedom latencies
- The time the NIQS takes to generate the response when the cache has expired
This is illustrated in the figure below:
Performance Parameters
The NIQ and NIS servers have different performance parameters that must be measured to determine the server’s performance level. These parameters are described in the table below:
Parameter | Description |
“maxConnections” | The maxConnections setting defines the maximum number of simultaneous connections the NIQ server can process. This parameter is set to 20 for the production network. Since we are fairly sure that the NIQS is not CPU bound, this value should be increased significantly. |
“processLife” | The maximum number of requests handled per child. This setting should be set to a high constant value. Set the processLife setting to 10000 for testing. |
Database size | Here we would like to understand the influence of the number of nodes in the network on the compression factor. Does the response length, in bytes, increase considerably when we add nodes to the network? This should be tested by varying the number of nodes entered in the database: say 50, 100, 200, 400, and 800 nodes. Preferably the information for each node should consist of random values; otherwise you will get biased results. |
Time To Live (TTL) | The TTL value is the expiration time of the NIQS’s cache. The NIQ server uses a cache mechanism in order to increase the number of requests per second. This cache needs to be refreshed periodically; the refresh period is defined by the TTL setting. |
Client access speed | To ascertain the server’s performance level, all tests should be performed with different connection types. This is one of the most important parameters to take into account to obtain pertinent results. If the hardware is not available, delays can be inserted between requests by setting the latency parameters of the NIQSPERF. |
Internet and Freedom latencies | This represents the connection latency between the client and the NIQS. These parameters are the most difficult ones to evaluate. The latency should be treated as a variable during performance evaluation. |
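To make the TTL parameter concrete, the following is a minimal sketch of a cache whose entries expire after a fixed time. It is not the NIQS implementation: the class, its method names, and the injectable clock are assumptions made for illustration. Responses are regenerated only when the cached copy is older than the TTL.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache generated responses and regenerate them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested without sleeping
        self.entries: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, generate: Callable[[], bytes]) -> bytes:
        """Return the cached response for key, calling generate() on a miss or after expiry."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: pay the generation cost and store the fresh copy.
        self.misses += 1
        value = generate()
        self.entries[key] = (now, value)
        return value
```

With such a design, a longer TTL raises the hit ratio but lets clients see staler data, which is why the TTL is worth varying during the tests.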
Tests Performed
As you have seen, a lot of parameters influence the NIQS’s performance. This gives us over a hundred different test cases to be considered.
TO CONTINUE …
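As a rough illustration of how quickly the test cases multiply, the parameters can be enumerated as a Cartesian product. The database sizes and connection types come from these notes. The maxConnections and TTL values below are hypothetical placeholders, not an agreed test matrix.

```python
from itertools import product

database_sizes = [50, 100, 200, 400, 800]           # nodes, from the Database size row
connection_types = ["V.90 modem", "Ethernet LAN"]   # from the Prerequisites
max_connections = [20, 50, 100, 150, 200]           # hypothetical sweep; 20 is the production value
ttl_seconds = [30, 60, 120]                         # hypothetical TTL values

# One test case per combination of parameter values.
test_cases = list(product(database_sizes, connection_types, max_connections, ttl_seconds))

print(len(test_cases), "test cases")
print("first case:", test_cases[0])
```

Even this modest grid yields 150 combinations, before latency is varied at all.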
Assumptions
Here are the assumptions we made to simulate a 56K connection:
56 kbps | 20 ms / 100 bytes |
3 hops (min.) | 60 ms |
3 hops (max.) | 800 ms |
AIP's overhead | 30 ms |
Cryptography disabled |
NOTE: TCP and IP headers are not taken into account.
From these assumptions we obtained the following latencies for the NIQSPERF’s configuration file.
Connection latency, in milliseconds
minconnlat = 110
maxconnlat = 850
KQD latencies, in milliseconds
minkqdlistlat = 130
maxkqdlistlat = 870
minkqdquerylat = 190
maxkqdquerylat = 930
Cache Data Version latency, in milliseconds
mincdvlat = 300
maxcdvlat = 1780
Client Info latency, in milliseconds
mincilat = 310
maxcilat = 1050
Token Info latency, in milliseconds
mintilat = 310
maxtilat = 1050
MxQuery latency, in milliseconds
minmxqlat = 810
maxmxqlat = 1650
Old Full Update latencies, in milliseconds
minfullupdlistlat = 110
maxfullupdlistlat = 850
minfullupdquerylat = 130
maxfullupdquerylat = 870
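The notes do not spell out how these values were derived from the assumptions. The sketch below shows one reading that reproduces several of them: hop latency plus AIP overhead for each traversal, plus 20 ms for every 100 bytes transferred. The payload sizes (100, 200, and 600 bytes) and the count of two traversals for the Cache Data Version request are back-solved guesses that happen to match the numbers. They are not documented facts.

```python
import math

MS_PER_100_BYTES = 20   # 56 kbps assumption
HOPS_MIN_MS = 60        # 3 hops, minimum
HOPS_MAX_MS = 800       # 3 hops, maximum
AIP_OVERHEAD_MS = 30


def latency_ms(hops_ms: int, payload_bytes: int, traversals: int = 1) -> int:
    """Hop latency and AIP overhead per traversal, plus modem transfer time."""
    transfer_ms = math.ceil(payload_bytes / 100) * MS_PER_100_BYTES
    return traversals * (hops_ms + AIP_OVERHEAD_MS) + transfer_ms


# Assumed payload of 100 bytes reproduces minconnlat / maxconnlat.
print(latency_ms(HOPS_MIN_MS, 100), latency_ms(HOPS_MAX_MS, 100))
# Assumed payload of 200 bytes reproduces minkqdlistlat / maxkqdlistlat.
print(latency_ms(HOPS_MIN_MS, 200), latency_ms(HOPS_MAX_MS, 200))
# Assumed 600 bytes over two traversals reproduces mincdvlat / maxcdvlat.
print(latency_ms(HOPS_MIN_MS, 600, traversals=2), latency_ms(HOPS_MAX_MS, 600, traversals=2))
```

Most min/max pairs in the configuration differ by 740 ms, which is exactly the spread between the minimum and maximum hop latency. The Cache Data Version pair differs by twice that, which is what suggests two traversals.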
NIQS’s Results (with MxQuery Enabled)
NIQS statistics for the last 5 minutes | 100 new children | 150 new children |
Total connections | 13786 | 18559 |
New children | 100 | 150 |
CacheDataVersion request | 13672 | 18558 |
CacheDataVersion request [HIT] | 13666 | 18549 |
ClientInfo requests | 13654 | 18605 |
ClientInfo requests [HIT] | 13648 | 18597 |
List requests | 13683 | 18559 |
ListSince0 requests | 13683 | 18559 |
ListSince0 requests [HIT] | 13679 | 18558 |
MxQueryDescription requests | 13640 | 18584 |
MxQueryDescriptionSince0 requests | 13640 | 18584 |
MxQueryDescriptionSince0 requests [HIT] | 13492 | 18424 |
QueryDescription requests | 13678 | 18553 |
QueryDescription requests [HIT] | 13672 | 18547 |
TokenInfo requests | 13644 | 18597 |
TokenInfo requests [HIT] | 13638 | 18592 |
Simultaneous connections: Average | 96.6 | 149 |
Simultaneous connections: Minimum | 0 | 126 |
Simultaneous connections: Maximum | 100 | 150 |
Commands/connection: Average | 6 | 6 |
Commands/connection: Minimum | 0 | 6 |
Commands/connection: Maximum | 6 | 6 |
Time (sec)/connection: Average | 2.4 | 2.4 |
Time (sec)/connection: Minimum | 0 | 2 |
Time (sec)/connection: Maximum | 7 | 8 |
File Descriptors | 4467 | 7545 |
Memory used | 112073K | 117724K |
Network Interface (bps): Received | 72707 / 105508 / 83025 | 146349 / 135536 / 142179 |
Network Interface (bps): Sent | 394430 / 253479 / 338294 | 435657 / 633067 / 421241 |
NIQS statistics for the last 5 minutes | 175 new children | 200 new children |
Total connections | 21146 | 18996 |
New children | 175 | 200 |
CacheDataVersion request | 20765 | 18665 |
CacheDataVersion request [HIT] | 20760 | 18660 |
ClientInfo requests | 20734 | 18639 |
ClientInfo requests [HIT] | 20729 | 18632 |
List requests | 20767 | 18698 |
ListSince0 requests | 20767 | 18698 |
ListSince0 requests [HIT] | 20765 | 18697 |
MxQueryDescription requests | 20668 | 18592 |
MxQueryDescriptionSince0 requests | 20668 | 18592 |
MxQueryDescriptionSince0 requests [HIT] | 20424 | 18490 |
QueryDescription requests | 20754 | 18678 |
QueryDescription requests [HIT] | 20747 | 18672 |
TokenInfo requests | 20702 | 18610 |
TokenInfo requests [HIT] | 20695 | 18604 |
Simultaneous connections: Average | 169 | 181.3 |
Simultaneous connections: Minimum | 32 | 0 |
Simultaneous connections: Maximum | 175 | 200 |
Commands/connection: Average | 5.9 | 5.9 |
Commands/connection: Minimum | 0 | 0 |
Commands/connection: Maximum | 6 | 6 |
Time (sec)/connection: Average | 2.4 | 2.4 |
Time (sec)/connection: Minimum | 0 | 0 |
Time (sec)/connection: Maximum | 11 | 6 |
File Descriptors | 7604 | 8610 |
Memory used | 117933K | 123088K |
Network Interface (bps): Received | 139614 / 152246 / 156085 | 157293 / 169469 / 161543 |
Network Interface (bps): Sent | 539895 / 601189 / 619104 | 867212 / 754622 / 647632 |
NIQS statistics for the last 5 minutes | 250 new children |
Total connections | 26606 |
New children | 250 |
CacheDataVersion request | 26335 |
CacheDataVersion request [HIT] | 26328 |
ClientInfo requests | 26306 |
ClientInfo requests [HIT] | 26301 |
List requests | 26321 |
ListSince0 requests | 26321 |
ListSince0 requests [HIT] | 26319 |
MxQueryDescription requests | 26254 |
MxQueryDescriptionSince0 requests | 26254 |
MxQueryDescriptionSince0 requests [HIT] | 25826 |
QueryDescription requests | 26329 |
QueryDescription requests [HIT] | 26324 |
TokenInfo requests | 26308 |
TokenInfo requests [HIT] | 26301 |
Simultaneous connections: Average | 233.9 |
Simultaneous connections: Minimum | 0 |
Simultaneous connections: Maximum | 250 |
Commands/connection: Average | 5.9 |
Commands/connection: Minimum | 0 |
Commands/connection: Maximum | 6 |
Time (sec)/connection: Average | 2.7 |
Time (sec)/connection: Minimum | 0 |
Time (sec)/connection: Maximum | 19 |
File Descriptors | 10662 |
Memory used | 138404K |
Network Interface (bps): Received | 177187 / 182620 / 229673 |
Network Interface (bps): Sent | 433139 / 707452 / 928623 |
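For comparing the five runs, the raw counters can be reduced to a connection rate and a cache hit ratio. The sketch below assumes each total covers the full 5-minute (300-second) window and that the [HIT] rows count cache hits. The figures are copied from the statistics above; the variable and function names are illustrative.

```python
WINDOW_SECONDS = 5 * 60  # "statistics for the last 5 minutes"

# new children -> (total connections,
#                  MxQueryDescriptionSince0 requests,
#                  MxQueryDescriptionSince0 requests [HIT])
runs = {
    100: (13786, 13640, 13492),
    150: (18559, 18584, 18424),
    175: (21146, 20668, 20424),
    200: (18996, 18592, 18490),
    250: (26606, 26254, 25826),
}


def summarize(results):
    """Return (new children, connections per second, MxQuery hit ratio) per run."""
    rows = []
    for children, (connections, requests, hits) in sorted(results.items()):
        rows.append((children, connections / WINDOW_SECONDS, hits / requests))
    return rows


for children, conn_per_sec, hit_ratio in summarize(runs):
    print(f"{children:>3} new children: {conn_per_sec:5.1f} connections/s, "
          f"MxQuery hit ratio {hit_ratio:.3f}")
```

On these figures the connection rate is highest in the run with 250 new children, but the run with 200 falls below the run with 175, so a single 5-minute sample per setting should be read with caution.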
First draft: Stéphane Rhéaume