TUNING THE SORT UTILITY TO ACHIEVE SIGNIFICANT GAINS IN TOTAL SYSTEM PERFORMANCE

By Patrick Dowling

The author is marketing manager for Phase Linear Systems, a Fairfax, Va.-based systems software company that develops and markets PLSORT, a high-performance sort/merge utility for IBM mainframes and compatible systems. This article examines PLSORT from Phase Linear Systems.

It is common knowledge that one of the most frequently run programs is the sort utility. According to IBM estimates, the typical MVS production CPU running at full capacity spends between 15 and 35 percent of its time sorting data. Large improvements in sort performance therefore translate into system performance gains capable of significantly enhancing CPU capacity and extending the longevity of currently installed hardware.

In light of this, it is surprising that many data centers fail to spend the little time necessary to analyze and optimize how sorts are performed in their environment. The fact is that through the relatively easy and straightforward tuning of common sort installation and usage parameters, virtually any IBM data center can achieve significant improvements in system performance that might otherwise require more expensive and time-consuming projects.

As the developers of a widely used sort utility, we can testify to the effectiveness of informed sort tuning. However, sort tuning is a classic example of how a little knowledge can be dangerous. Many well-meaning sort "studies" performed by data centers not armed with all the facts end with no performance gains, and sometimes even with system performance degradation. Taken to an extreme level of analysis, sorting can be a very complex subject. For most data centers' requirements, though, a small amount of background information and a few rules of thumb will go a long way.
The intent of this article, then, is to shed what may be, for many, a new light on the various factors that affect how sorts operate, and to provide suggestions for adjusting the sort utility to improve system performance and throughput. Our particular expertise comes from working with our own product, PLSORT, and we use it to illustrate tuning options that are usually available in any standard sort package.

Memory: The Crux of the Matter

The most important factor affecting sort performance is virtual memory: how much is available and how much is actually used by the sort. Memory usage has a direct bearing on the consumption of other system resources and is thus the focus of most tuning efforts. Based on available virtual memory, a sort will either be performed entirely within main memory or in phases, using auxiliary storage such as DASD or tape.

Virtual memory limitations are established by the user through the REGION and MAINSIZE parameters and are further defined through parameters supplied by the sort. In PLSORT, these latter parameters include maximum and minimum values for the sort to select from, established once at installation or overridden at execution for sort jobs with special requirements.

The sophisticated sorts that are commercially available today assess the amount of available memory from this array of upper and lower bounds, comparing it to the SORTIN file size. Based on the ratio of file size to memory, the sort forms a strategy for memory and auxiliary storage usage that has an immense impact on how fast the sort executes and on the system resources it expends. Given the large size of many data sets that are routinely sorted, and the number of sorts executing concurrently, the sort's decisions about how much memory to use also have profound effects on the rest of the system.
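The in-memory-versus-phases distinction can be sketched in miniature. When the records do not fit within the memory limit, an external sort writes sorted runs to work space and then merges them -- the same shape of strategy a mainframe sort applies with DASD sort work. The function below is a toy illustration only (the names and the record-count memory limit are our own, not PLSORT's logic):

```python
import heapq
import tempfile

def external_sort(records, memory_limit):
    """Sort a list of strings, spilling sorted runs to temporary files
    ("sort work") whenever more than memory_limit records would have to
    be held in memory at once."""
    if len(records) <= memory_limit:
        return sorted(records)              # "in-core" sort: no work space
    # Phase 1: write sorted runs of at most memory_limit records each.
    run_files = []
    for start in range(0, len(records), memory_limit):
        run = sorted(records[start:start + memory_limit])
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(r + "\n" for r in run)
        f.seek(0)
        run_files.append(f)
    # Phase 2: stream-merge the sorted runs back together.
    merged = [line.rstrip("\n") for line in heapq.merge(*run_files)]
    for f in run_files:
        f.close()
    return merged
```

The decision at the top of the function -- in-core when the data fits, spill to work space when it does not -- is the decision whose tuning the rest of this article is about.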
The programmer who is unfamiliar with the logic behind a sort's memory usage strategy often fails to recognize the potential benefits the sort brings to total system performance. For instance, the uninformed programmer typically focuses on those resources whose consumption by the sort job is easiest to measure, namely CPU time, elapsed time and EXCPs. As the programmer tunes the sort (often working on an empty system to make interpretation of elapsed time as simple as possible), the goal becomes minimizing these statistics.

When the focus is placed in this way on the individual job, the obvious conclusion is that the more virtual memory allocated to and used by the sort, the better the statistics and hence the performance. As a result, many data centers allocate enormous maximum memory limits (in excess of 10MB is not unheard of) and allow their sort to use this memory aggressively, so that all but the largest data sets can be sorted entirely within virtual memory (an "in-core" sort) or in a minimal number of phases. Since no DASD is required for sort work space in an in-core sort, few EXCPs are charged to the job. Elapsed time looks particularly healthy, since time-consuming access of DASD has been precluded. CPU time is also minimized, since there is no need to create and write substrings of the data to DASD-based sort work for later merging.

The problem, of course, is that this way of gauging optimal performance ignores the real crux of what is transpiring on a busy MVS machine running at full capacity: all active jobs are competing for real resources, primarily memory. When the programmer looks at more of the factors that affect and reflect total system performance, conservative use of virtual memory turns out to be much more efficient.
With EXCPs, for instance, use of maximal virtual memory appears to provide optimal I/O performance, since this easy-to-measure resource, directly attributable to the sort job, is minimized. However, the more virtual memory the sort uses, the less is available for other jobs on the system, and paging rates begin to soar. Imagine the paging rates when 10MB of virtual storage has been allocated! Minimizing EXCPs therefore does not result in proportionally fewer I/Os. In fact, the added paging I/Os often outstrip the saved sort job EXCPs, and in terms of I/O activity they serve no productive purpose. At least the EXCPs involved in using DASD for sort work are spent on the constructive purpose of completing the sort. In contrast, the higher paging rates resulting from extensive allocation of virtual memory serve only to preserve the virtual memory illusion and contribute nothing to the completion of the sort job at hand. Measuring total system I/Os, were it an easily attainable statistic, would be the key to gauging the correct trade-off between the sort's requirements and those of other jobs on the system. Unfortunately, total system I/O activity is not easily measurable.

Another non-obvious effect of allocating large amounts of virtual memory is the often overlooked fact that, beyond a certain threshold of memory usage, the sort phase is slower on a per-record basis, due to an inherent non-linearity of CPU time with respect to the number of records processed. Also contributing to this effect is that, beyond a certain threshold, the sort can no longer operate entirely within processor cache. Of course, some applications will genuinely require aggressive use of virtual memory, and sort memory allocation parameters should be set appropriately.
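The accounting argument above can be made concrete with invented numbers (purely illustrative -- these are not measurements from any system):

```python
def total_system_io(sort_excps, paging_ios):
    """I/O the system actually performs on behalf of a sort: the EXCPs
    charged to the job plus the paging I/O its memory footprint induces
    elsewhere on the system."""
    return sort_excps + paging_ios

# Hypothetical busy-system comparison: the "in-core" sort looks cheap on
# its own job log, but its large working set drives system-wide paging.
in_core   = total_system_io(sort_excps=200,   paging_ios=9_000)
with_dasd = total_system_io(sort_excps=3_500, paging_ios=400)
```

The job statistics favor the in-core run; the system-wide totals favor the DASD-backed run, which is exactly why per-job EXCP counts mislead.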
The moral is simply that as systems engineers analyze the effects of allocating large amounts of virtual memory, they should not use reduced EXCPs and time statistics as the only basis for gauging system performance. Having pointed out what not to do, we offer the following recommendations, drawn from experience with a wide range of sort tuning projects.

Unless the data center is running at less than full capacity, the data sets being sorted are very small, or a large sort job is literally the only job executing on the system, take a conservative approach toward the use of main memory. In the average MVS environments we have observed, small data sets (typically less than 1MB) are most efficiently sorted in one phase and entirely within virtual memory. Reasonable sort default memory limits would make a bit less than 2MB of virtual storage available for jobs of this size.

As the data sets to be sorted get larger, further attempts at in-core sorting will generally have undesirable effects on the system, even though measurements of EXCPs and CPU time for the sort job will appear to show efficient processing. PLSORT provides an installation- and run-time-adjustable parameter with which the data center defines an upper limit on the amount of virtual memory used for in-core sorting. For greatest system efficiency, we recommend a value for this parameter of between 1MB and 2MB.

When the file becomes large enough to surpass this threshold (medium-sized, at 1MB to 20MB), the sort is performed using DASD for sort work space. At this point, the amount of virtual memory beneficial to sort performance drops to roughly one-fifth of the file size, and it is only this amount that the sort will use from the available pool. As the file size grows, one-fifth of the file size will eventually exceed the maximum in-core sort memory value.
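The strategy so far can be collected into one small sketch. The constants mirror the article's rules of thumb; the function and constant names are our own, the 4MB soft limit is discussed below, and every installation should tune these values rather than take them literally:

```python
MB = 1024 * 1024

# Rules of thumb from the text -- tune per installation, do not hard-code.
IN_CORE_LIMIT = 2 * MB    # ceiling for a one-phase, in-memory sort
SOFT_LIMIT    = 4 * MB    # cap for DASD-backed sorts (cf. SOFTLIM, below)

def sort_memory_target(file_size):
    """Suggest the virtual memory a sort should use for a given SORTIN
    file size, following the conservative strategy described above."""
    if file_size <= 1 * MB:
        return IN_CORE_LIMIT              # small file: sort entirely in-core
    # Medium or large file: use DASD for sort work; memory worth having
    # is roughly one-fifth of the file size, capped at the soft limit.
    return min(file_size // 5, SOFT_LIMIT)
```

Note how the target drops when a file first crosses the in-core threshold: a 5MB file gets only about 1MB of memory plus DASD work space, which looks "worse" on the job log but is cheaper for the system as a whole.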
As the file grows further, more virtual memory is used until an upper limit is reached -- called SOFTLIM in PLSORT -- which we tend to set at 4MB. This limit is established at sort installation and can be overridden at execution. Above this threshold, a data center is typically working with large files (greater than 20MB). While more virtual memory is still accessible between the user-adjustable soft limit and the absolute limit set by MAXLIM, it is most efficient for overall system performance to stay within the soft limit and tolerate an apparently more expensive sort in exchange for reasonable system paging rates and CPU overhead.

The relationship between sort input file size, virtual memory and the optimal path of virtual memory usage incorporated into PLSORT is shown in Figure 1. The ranges for small-, medium- and large-sized files are estimates; every data center differs, and establishing the right values is precisely the goal of each data center's own tuning effort.

Other Areas to Study: DASD SORTWORK Allocation

Since we recommend reasonable reliance on DASD for sort work space with medium- and large-sized files, it is worth noting a few rules of thumb that can greatly improve the efficiency of DASD usage. For PLSORT, when dynamic SORTWORK allocation is bypassed and the space is instead allocated by the user, attempt to:

o allocate extents on cylinder boundaries;
o allocate a small number of large extents instead of many small extents;
o avoid mixing device types, and use the faster devices; and
o avoid using VIO for SORTWORK.

MAXSORT

For very large data sets, it is often preferable to use the MAXSORT option available in some sort packages in place of extensive use of DASD for sort work space. Under PLSORT, MAXSORT takes effect when specified by the user, or when the sort input file size would obviously require an amount of DASD in excess of a user-defined level to run efficiently.
MAXSORT limits extensive DASD consumption in large sorts by allowing tape drives to be used for sort work space. MAXSORT will often appear a strange candidate for certain sorts, since the use of tape drives results in much longer elapsed times. However, other benefits of the option often make it the method of choice for some of the most critical sorts a data center performs. Central among these is the restart option, which allows the user, in the event of a failure, to restart the sort at the previously completed step. For huge data sets, not having to re-run resource-consuming steps, and the ability to suspend large sorts without penalty, are major attractions.

Hiperspace Usage

For MVS/ESA users, hiperspace is often an efficient substitute for DASD. Like main memory and DASD, however, hiperspace is a limited resource, and sort jobs are not always the ideal candidates to use it from the perspective of overall system efficiency. Frequency of turnover is an important measure of effective hiperspace use: it is preferable to have frequent reuse of hiperspace rather than tying up large amounts for long periods with a single job like a large sort. PLSORT provides several installation- and execution-time parameters that give the data center control over the degree of usage of the free hiperspace that the system reports as safely available.

Use the Sort's SVC

Another significant performance issue is addressed through the special abilities provided by the SVC routine supplied with the sort. With the SVC, the sort can translate its own channel programs and fix pages in memory once, rather than once for each I/O. Without the SVC, default operating system routines perform these functions in a manner not optimized for sorting.
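Returning briefly to restartability: the benefit of MAXSORT's restart option can be illustrated with a toy checkpoint scheme, in which each completed run is recorded so that, after a failure, the job resumes at the last finished step instead of re-sorting everything. The names and mechanics below are ours alone, not how MAXSORT is implemented:

```python
import json
import os

def sort_runs_with_restart(chunks, checkpoint="sort.ckpt"):
    """Sort each chunk as a separate step, checkpointing completed runs
    so a failed or suspended job can resume at the last finished step."""
    done = []
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            done = json.load(f)            # steps completed before failure
    for i in range(len(done), len(chunks)):
        done.append(sorted(chunks[i]))     # one resource-consuming step
        with open(checkpoint, "w") as f:
            json.dump(done, f)             # record progress after each step
    return done
```

If the job dies after the first chunk, rerunning it skips straight to the second; for a sort whose individual steps take hours, that is the entire appeal.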
Resident/Re-entrant Options

Making the sort resident in virtual memory, particularly in XA above the 16MB line, reduces the time and I/Os consumed in loading the sort modules. The effect is particularly noticeable when concurrent sorts are being run. Re-entrance is essential for sorts that contain a user exit which in turn invokes a sort.

Trade-offs Between EXCPs and Elapsed Time

Often, when a single critical sort job must be processed within certain time constraints -- typically the confines of the batch window -- it is desirable to seek optimal elapsed time even at the expense of some EXCP efficiency. Elapsed time is minimized by organizing a sort so that overlapping I/Os can occur; this tactic expends more EXCPs and CPU time in return for lower elapsed time. In PLSORT, a tuning parameter, IOBAL, allows the individual sort job to maximize overlapping I/Os. This option is most profitable when the cumulative elapsed time of all sorts on the system is the target; in a normal job mix, however, such overlap occurs naturally across jobs. We therefore recommend installing the sort with defaults that favor lower EXCPs for the majority of jobs, relying on overriding parameters (particularly OPT) for critical individual jobs that require lower elapsed time.

Replace IEBGENER

While not run as often as the sort, IEBGENER is a frequently used program. Most sorts supply a transparent IEBGENER replacement that performs more efficiently. In the case of PLSORT, replacing IEBGENER with PLSGENER yields up to a 90 percent reduction in CPU time and EXCPs.

Other issues affecting sort performance, and other tuning options, are certainly worthy of consideration. The subjects examined here are those from which data centers can achieve improvements while expending relatively small amounts of time and energy.
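The overlapped-I/O trade-off discussed above can be sketched with a simple producer-consumer pattern: a reader keeps a small buffer queue full while the main thread processes blocks, so I/O wait and CPU work proceed in parallel. More bookkeeping (the queue, the extra thread) buys lower elapsed time. This is a generic illustration of the technique, not PLSORT's IOBAL mechanism:

```python
import queue
import threading

def overlapped_copy(read_block, process_block, n_blocks):
    """Overlap reading with processing via a bounded queue: the reader
    thread fetches block i+1 while the main thread works on block i."""
    q = queue.Queue(maxsize=2)          # double buffering
    def reader():
        for i in range(n_blocks):
            q.put(read_block(i))        # may block on device I/O
        q.put(None)                     # end-of-input marker
    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (block := q.get()) is not None:
        results.append(process_block(block))
    return results
```

On an otherwise busy system this overlap happens naturally across competing jobs, which is why the text recommends reserving aggressive overlap for the individual critical job rather than making it the installation default.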
For the most part, the potential for improvement lies not in the sort product itself, but in intelligent system resource allocation and in using the sort to maximize total system performance -- not simply the easy-to-measure statistics of the individual, isolated sort job.