SSL Performance and Capacity Planning
Introduction
For a long time, SSL accelerators have had their performance measured with a
single number: RSA operations per second. This provided a tidy way to compare
one accelerator or offloader product to another, but did little to help users
assess the usefulness of a product. While RSA operations per second are
certainly an important performance metric, they are only one-third of the
total performance picture that should be considered when evaluating
accelerators. A complete performance evaluation should include RSA operations
per second, concurrent connections, and sustainable throughput.
Most accelerators, including first generation
offerings, perform a minimum of 200 RSA operations per second. By today’s
standards that does not seem capacious, but 200 operations per second equates
to over 17 million operations per day, so even that modest number might seem
excessive. The average accelerator today performs 800 or more RSA operations
per second, or over 69 million operations per day.
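The daily figures quoted above follow from multiplying the per-second rate by the number of seconds in a day. A minimal sketch of that arithmetic, assuming an accelerator that runs at its rated speed around the clock (the function name is illustrative, not taken from any product):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rsa_ops_per_day(ops_per_second: int) -> int:
    """Daily RSA capacity if the rated per-second speed were sustained all day."""
    return ops_per_second * SECONDS_PER_DAY


for rate in (200, 800):
    print(f"{rate} ops/s -> {rsa_ops_per_day(rate):,} ops/day")
```

Real traffic is never spread evenly over 24 hours, which is why the sections that follow size against peak-hour and burst figures instead of daily totals.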
While it is not inconceivable for a site to receive
tens of millions of hits a day, it is difficult to imagine that such a site
would not have a highly fault tolerant configuration. Implicit in fault
tolerance is redundancy, preferably at every level, including routers,
switches, load-balancers, and SSL Offloaders. Unlike offloading solutions
that are integrated directly into content-switches, dedicated
appliance-based Offloaders provide the ability to operate in an
active-active mode, offering both fault tolerance and aggregated levels of
performance.

Analyzing Your Site’s Traffic
The best way to recognize the three key performance
metrics of SSL acceleration is to look at the nature of traffic in a high
volume, secure web-site. Take a small to moderately sized hypothetical
e-business called Intensifynance.com.
Intensifynance.com hosts and provides online
brokerage services, as well as access to extensive financial reports.
Additionally, they host servers providing banner advertisements that appear
on all of their pages.
The three common measurements of web-site activity are:
- User-Sessions: the actual number of unique visitors
- Impressions: the number of HTML pages loaded per user session
- Hits: the number of requests made of the web-servers per impression
| Metric | Value |
| --- | --- |
| User Sessions Per Day | 150,000 |
| Impressions Per User-session | 9 |
| Hits (Elements) Per Impression | 7 |
| User Session Duration | 15 Minutes |
| Hours of Peak Engagement | 17 Hours |
| User Sessions/Hour (150,000/17) | 8,824 |
| User Sessions/Minute (8,824/60) | 147 |
| User Session Concurrency (147*15) | 2,206 |
| Burst User Session Concurrency (2,206*3) | 6,618 |
| Total Daily User Session Data Transfer | 108 Gigabytes |
| Session Megabits/Sec | 14.2 Megabits |
| Ads Per Day (150,000*9*4) | 5.4 Million |
| Ad Hits/Hour (5.4 Million/17) | 318,000 |
| Ad Hits/Minute (318,000/60) | 5,294 |
| Ad Hits/Second (5,294/60) | 88 |
| Ad Impressions/Minute (5,294/4) | 1,324 |
| Ad Impressions/Second (88/4) | 22 |
| # Daily Report Downloads | 15,000 |
| Average Report Size | 5 Megabytes |
| Total Report Data Transferred | 75 Gigabytes |
| Reports Megabits/Sec (1.23*8) | 10 Megabits |
| # of Application Web Servers | 30 |
| # of Ad Servers | 10 |
| Sustained Bandwidth Consumption | 25 Megabits |
| Burst Bandwidth Consumption (25*3) | 75 Megabits |
| Total Hits Per Day | 9.5 Million |
| Hits Per Minute (9.5M/17/60) | 9,265 |
| Hits Per Second | 154 |
| Burst Hits Per Second | 462 |

Table 1. Intensifynance.com’s Traffic Characteristics.
Intensifynance.com
hosts approximately 150,000 unique user-sessions per day on 30 x P4 1.6 GHz
servers with 512MB RAM as their application servers, and 10 P4 1.4 GHz
servers with 512MB RAM as their ad servers. Each user-session comprises an
average of 9 impressions or unique page loads, and each page contains an
average of 7 elements (including 4 ad images). This equates to approximately
9.5 million hits per day (150,000 x 9 x 7 = 9,450,000). Each of the 9 pages
contains an average of 20K of HTML, 40K of ad images, and 20K of other image
data for a total of 80K per page. Once a user has authenticated to their
services, all user-sessions are HTTPS. 95% of their traffic occurs during
their peak hours: the 17-hour stretch between 6am and 11pm PST. Burst traffic
loads can reach up to 3 times average traffic loads.
Characteristically, the banner
advertisements are very small, about 10K each, and an average of four ad
images load with every page. Approximately 5.4 million total banner ad
images are served daily. To maintain contextual consistency with their
secure pages, all banner ads are linked via HTTPS.
The online financial reports that they
offer are in PDF format, and can
range in size from 500K up to several megabytes. The average size is
estimated to be 5 megabytes. Approximately 15,000 such reports are
downloaded securely via HTTPS daily.
Finally,
Intensifynance.com’s core business, their online services, constitutes
the vast majority of the site’s 150,000 daily visits. The average
brokerage-service user-session (combined average of quotes and trades) lasts
approximately 15 minutes, and results in approximately 720K of transferred
data (9 pages x 80K per page).
Cookie-based session persistence is employed to keep users attached to
the same application server through the life of a user-session.
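The figures in Table 1 can be derived from a handful of inputs in the site description. The sketch below is one way to reproduce them, under stated assumptions: decimal units (1 GB = 1,000,000 KB), all traffic spread evenly over the 17 peak hours, and burst values obtained by rounding the average first and then multiplying by 3, as the table does. All names are illustrative.

```python
# Inputs taken from the site description above.
SESSIONS_PER_DAY = 150_000
PAGES_PER_SESSION = 9
HITS_PER_PAGE = 7
KB_PER_PAGE = 80
SESSION_MINUTES = 15
PEAK_HOURS = 17
BURST_FACTOR = 3
REPORTS_PER_DAY = 15_000
MB_PER_REPORT = 5


def traffic_profile() -> dict:
    """Derive average and burst load figures from the raw site statistics."""
    peak_seconds = PEAK_HOURS * 3600

    # Sessions arriving per minute, times session length, gives concurrency.
    sessions_per_minute = SESSIONS_PER_DAY / (PEAK_HOURS * 60)
    concurrent = round(sessions_per_minute * SESSION_MINUTES)

    hits_per_day = SESSIONS_PER_DAY * PAGES_PER_SESSION * HITS_PER_PAGE
    hits_per_second = round(hits_per_day / peak_seconds)

    # Data volumes in decimal gigabytes, converted to megabits per second.
    session_gb = SESSIONS_PER_DAY * PAGES_PER_SESSION * KB_PER_PAGE / 1_000_000
    report_gb = REPORTS_PER_DAY * MB_PER_REPORT / 1_000
    session_mbps = session_gb * 8_000 / peak_seconds
    report_mbps = report_gb * 8_000 / peak_seconds

    return {
        "concurrent_sessions": concurrent,
        "burst_concurrent_sessions": concurrent * BURST_FACTOR,
        "hits_per_day": hits_per_day,
        "hits_per_second": hits_per_second,
        "burst_hits_per_second": hits_per_second * BURST_FACTOR,
        "session_mbps": session_mbps,
        "report_mbps": report_mbps,
    }


for name, value in traffic_profile().items():
    print(f"{name}: {value:,.2f}" if isinstance(value, float) else f"{name}: {value:,}")
```

The bandwidth results land slightly below the table's rounded figures, because the table rounds intermediate values upward; for capacity planning that conservative rounding is the safer choice.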
User-Sessions and Flows
Considering that most traffic today is
HTTP/1.1 as opposed to HTTP/1.0, it is fair to assume that
HTTP keep-alives will typically be used. HTTP keep-alives allow for TCP
sessions (or flows) to be kept open so that multiple pages and elements can
be retrieved from a persistent server without having to establish a separate
TCP session for each retrieval. Prior to keep-alive support, HTTP required
that a separate flow be established for each element or hit. In an SSL
environment, this could have meant (assuming no session reuse) a separate
RSA operation for each hit. In the case of
Intensifynance.com, that could have equaled nearly 10 million RSA
operations per day. With keep-alives, the real number of flows can more
accurately be calculated first by multiplying the number of user-sessions by
the number of impressions or pages (150,000 x 9 = 1,350,000). Next, since
most web-browsers open multiple simultaneous connections (current versions
of Microsoft IE and Netscape open 2 and 4, respectively) we must multiply by
an (averaged) factor of 3 (3 x 1,350,000 = 4,050,000). With a combination of
HTTP keep-alives and SSL session reuse, we can estimate 4 million daily
flows and RSA operations.
Since
Intensifynance.com uses dedicated servers for their ads, separate TCP
sessions must be established to these ad servers, precluding the use of
keep-alives or persistent sessions from the main HTML/app server. This
essentially translates to a doubling of flows since each page visited within
each user-session will invoke an average of 3 browser connections to the ad
server. Our previous estimate of 4 million now doubles to 8 million.
Intensifynance.com
estimates that 10% of its clients do not use HTTP/1.1 compliant browsers or
proxy servers, and instead connect via HTTP/1.0, generally without
keep-alives. This led them to add an additional 500,000 short-lived
flows to the total estimate of flows and RSA operations they must support,
arriving at a grand total of 8.5 million flows per day.
| Metric | Value |
| --- | --- |
| Flows Per Day | 8.5 Million |
| Flows Per Hour (8.5M/17) | 500,000 |
| Flows Per Minute (500,000/60) | 8,333 |
| Flows Per Second (8,333/60) | 139 |

Table 2. Flows (TCP Sessions).
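The flow estimate and the rates in Table 2 can be reproduced with a few multiplications. The sketch below is illustrative only: the constant and function names are assumptions, and the calculation assumes flows are spread evenly over the 17 peak hours. Note that the text rounds the per-server-pool figure from 4.05 million down to 4 million before doubling, so the unrounded total comes out at 8.6 million where the text arrives at 8.5 million.

```python
SESSIONS_PER_DAY = 150_000
PAGES_PER_SESSION = 9
BROWSER_CONNECTIONS = 3        # averaged across browsers, per the text
SERVER_POOLS = 2               # application servers plus dedicated ad servers
HTTP10_EXTRA_FLOWS = 500_000   # the site's own allowance for HTTP/1.0 clients
PEAK_HOURS = 17


def estimate_daily_flows() -> int:
    """Unrounded daily flow estimate with keep-alives and SSL session reuse."""
    per_pool = SESSIONS_PER_DAY * PAGES_PER_SESSION * BROWSER_CONNECTIONS
    return per_pool * SERVER_POOLS + HTTP10_EXTRA_FLOWS


def flow_rates(flows_per_day: float) -> dict:
    """Break a daily flow count down into hourly, per-minute and per-second rates."""
    per_hour = flows_per_day / PEAK_HOURS
    per_minute = per_hour / 60
    per_second = per_minute / 60
    return {
        "per_hour": round(per_hour),
        "per_minute": round(per_minute),
        "per_second": round(per_second),
    }


print(f"unrounded estimate: {estimate_daily_flows():,} flows/day")
print(flow_rates(8_500_000))  # the rounded total used in the text
```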
The relationship between user-sessions and
flows can be seen as follows:
| | User Sessions | Flows |
| --- | --- | --- |
| Per Day | 150,000 | 8,500,000 |
| Per Hour | 8,824 | 500,000 |
| Per Minute | 147 | 8,333 |
| Per Second | 2 | 139 |
| Concurrent | 2,206 | 8,333 |
| Peak Hour | 26,471 | 1,500,000 |
| Peak Minute | 441 | 25,000 |
| Peak Setup/Second | 7 | 417 |
| Peak Concurrent | 6,618 | 25,000 |

Table 3. Relationship between User-sessions and Flows.

Relating Web Traffic to SSL Offloader Performance
Correlation
With the site’s raw statistical data
translated into usable flow patterns, we can correlate these numbers directly to
SSL offloaders and the applicability of their performance capabilities.
RSA operations occur when a new SSL session is
established. Although session reuse helps to minimize the number of operations
invoked, reuse is generally only employed by browsers within an existing TCP
session; once the TCP session or flow closes (typically after 15 seconds of
inactivity) a new session must be established, invoking another RSA operation.
Since we know that hits (table 1) do not always translate to unique flows, we
will not use hits to estimate our RSA requirements, but rather we will use Peak
Flow Setups per Second (table 3).
In Intensifynance.com’s
case, the number of Burst Hits per Second (462) and the number of Peak Flow
Setups per Second (417) deviate by only about 10%, but this similarity should not be
taken for granted. There are quite a few factors that affect the relationship
between these numbers, with the most profound being the number of distinct
servers to which a user can potentially connect during a session. In our case,
that number is 2 (the one persistent user-session to the application server
itself, and the one user-session to the image server per page); the greater the
number of distinct servers, and the greater the frequency of connection to these
servers, the greater the number of flows.
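The comparison between burst hits and peak flow setups can be checked directly. The following sketch applies the burst multiple of 3 to the average per-second rates from Tables 1 and 3 and reports the relative gap; the helper name is an assumption made for illustration.

```python
BURST_FACTOR = 3


def peak(rate_per_second: int) -> int:
    """Apply the site's burst multiple to an average per-second rate."""
    return rate_per_second * BURST_FACTOR


burst_hits = peak(154)        # average hits per second, Table 1
peak_flow_setups = peak(139)  # average flow setups per second, Table 3

# Gap expressed as a fraction of the burst hit rate.
gap = (burst_hits - peak_flow_setups) / burst_hits

print(f"burst hits/s: {burst_hits}, peak flow setups/s: {peak_flow_setups}")
print(f"relative gap: {gap:.1%}")
```

A site with more distinct server pools, or one where fewer clients use keep-alives, would see this gap change, so the calculation is worth repeating with local figures.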
Concurrent connections correlate directly to
Peak Concurrent Flows (table 3). The considerable difference we see between Peak
Concurrent User-Sessions (6,618) and Peak Concurrent Flows (25,000) is somewhat
exaggerated because a burst multiple of 3 is applied to the base (average)
numbers of 2,206 and 8,333. Nonetheless, the two figures diverge widely, and no
“rule of thumb” can be used to derive one from the other.
Concurrent connections on an offloader refer
to the bi-directional, proxied connection between the user and the offloader,
and the offloader and the origin server. The number of concurrent sessions that
an offloader can support is generally a factor of memory management and overall
networking efficiency of the underlying operating system, and of the SSL proxy
code itself. Dedicated real-time operating systems have a considerable advantage
over generic Unix-like operating systems in this area because they are
specialized rather than generalized in their design, and they provide
predictable process times, virtually eliminating unwanted process degradation.
Sustainable Throughput is the speed at which
an offloader can encrypt and decrypt or “move” data. Unlike other networking
equipment that can offer fixed-throughput measurements, SSL offloaders must
offer variable measurements of sustainable throughput based upon the various
cipher suites that can be negotiated by SSL. A simple 40-bit export cipher is
generally significantly less computationally intensive than 168-bit DES3, and
will result in higher throughputs. Intensifynance.com’s
Burst Bandwidth Consumption (table 1) is estimated to be 75 megabits, but they
have little control over cipher negotiation because for reasons of compatibility
they accept them all. They therefore decided to use the most demanding cipher,
DES3, for measuring sustainable throughput.
Here is where an appreciable difference is
discernible between offloaders. Most offloaders use dedicated acceleration
hardware for RSA operations, but perform bulk-cryptographic operations on their
general-purpose or host CPU. Modern CPUs (for example, a Pentium III 1GHz) can
sustain DES3 at approximately 45 megabits per second at 100% utilization. In
multi-purpose offloading appliances, such as caches or content-switches with
shared-CPU integrated SSL processing, this means that when faced with high
sustained rates of DES3, or even DES traffic, they will either have to markedly
degrade throughput, or they will have no CPU availability to perform their
“primary” task of switching or caching.
| | SonicWALL SSL-R | SonicWALL SSL-RX | Intel PIII 1GHz |
| --- | --- | --- | --- |
| DES3-SHA1 Throughput | 7 Mbps | 54 Mbps | 45 Mbps |
| RC4-128-MD5 Throughput | 30 Mbps | 100+ Mbps | 100+ Mbps |

Table 4. Relationship between DES3 and RC4 cipher throughputs.
Table 4 shows the throughput for DES3 with SHA1 hashing,
and a commonly negotiated domestic cipher, 128-bit RC4 with MD5 hashing.
Performance figures for the SonicWALL SSL-R and SSL-RX, and a 1GHz Pentium
III are shown. DES3-SHA1 is the strongest cipher currently available within
the SSL framework, and is a FIPS 140-2 (Federal Information Processing
Standards) requirement for strong encryption in Government applications. It is
also the recommended cipher for long-lived SSL sessions because of the
relative theoretical complexity of “cracking” DES3 compared to other ciphers
such as RC2 or RC4.
Selecting Hardware
Intensifynance.com
decided to employ an n+1 strategy in their site design, where n is the number of
units they need to sustain peak load. This way they can survive the failure of
one of their units without degrading site performance. To recap their
requirements:
- RSA Operations per Second: 462
- Concurrent Connections: 25,000
- Sustained Throughput (DES3): 75 Mbit
The
SonicWALL SSL-R Offloader performs up to 200 RSA operations per second,
supports up to 5,000 concurrent connections, and can sustain 7 Mbits of DES3
throughput. Plugged into their performance and n+1 design requirements, they
would need (12) SonicWALL SSL-R Offloaders. Substituting a 50/50 mix of RC4/DES3
in this calculation would drop their requirement to (9) SonicWALL SSL-R
Offloaders.
The
SonicWALL SSL-RX Offloader supports up to 4,400 RSA operations per second,
up to 30,000 concurrent connections, and sustains 54 Mbit DES3 throughput.
Plugged into their performance and n+1 design requirements, they would need (3)
SonicWALL SSL-RX Offloaders.
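The unit counts above follow from taking, for each of the three metrics, the number of units needed to meet the requirement, keeping the largest of those, and adding one spare. A minimal sketch of that sizing rule, using the all-DES3 requirement and the capacities quoted in the text (the class and function names are assumptions made for illustration):

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    rsa_ops_per_second: int
    concurrent_connections: int
    des3_mbps: float


def units_needed(required: Capacity, unit: Capacity, spares: int = 1) -> int:
    """Units required to cover the most demanding metric, plus n+1 spares."""
    n = max(
        math.ceil(required.rsa_ops_per_second / unit.rsa_ops_per_second),
        math.ceil(required.concurrent_connections / unit.concurrent_connections),
        math.ceil(required.des3_mbps / unit.des3_mbps),
    )
    return n + spares


REQUIRED = Capacity(462, 25_000, 75)
SSL_R = Capacity(200, 5_000, 7)
SSL_RX = Capacity(4_400, 30_000, 54)

print("SSL-R units:", units_needed(REQUIRED, SSL_R))
print("SSL-RX units:", units_needed(REQUIRED, SSL_RX))
```

In both cases DES3 throughput is the binding constraint, not RSA operations per second, which illustrates why a single-number comparison is misleading.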
In selecting (3) SonicWALL SSL-RX Offloaders,
Intensifynance.com satisfied all of their
performance and redundancy design goals without compromising any of their
requirements or evaluation criteria. Additionally, by introducing SSL offloading
and removing the burden of SSL from their servers, they were able to reduce the
number of servers in their application and image farms from 30 and 10 to 20 and
4, respectively.