TUNING THE SORT UTILITY TO ACHIEVE SIGNIFICANT GAINS IN TOTAL SYSTEM PERFORMANCE

By Patrick Dowling

The author is marketing manager for Phase Linear Systems, a Fairfax, Va.-based systems software company that develops and markets PLSORT, a high-performance sort/merge utility for IBM mainframes and compatible systems. This article examines PLSORT from Phase Linear Systems.

It is common knowledge that one of the most frequently run programs is the sort utility. According to IBM estimates, the typical MVS production CPU running at full capacity spends between 15 and 35 percent of its time sorting data. Large improvements in sort performance therefore translate into system performance gains capable of significantly enhancing CPU capacity and extending the longevity of currently installed hardware.

In light of this, it is surprising that many data centers fail to spend the little time necessary to analyze and optimize how sorts are performed in their environment. The fact is that through the relatively easy and straightforward tuning of common sort installation and usage parameters, virtually any IBM data center can achieve significant improvements in system performance that might otherwise require more expensive and time-consuming projects.

As the developers of a widely used sort utility, we can testify to the effectiveness of informed sort tuning. However, sort tuning is a classic example of how a little knowledge can be dangerous. Many well-meaning sort "studies" performed by data centers not armed with all the facts end with no performance gains, and sometimes even with system performance degradation. Taken to an extreme level of analysis, sorting can be a very complex subject. For most data centers' requirements, though, a small amount of background information and a few rules of thumb will go a long way.
The intent of this article, then, is to shed what may be, for many, a new light on the various factors that affect how sorts operate, and to provide suggestions for adjusting the sort utility to improve system performance and throughput. Our particular expertise comes from working with our own product, PLSORT, and we use it to illustrate tuning options that are usually available in any standard sort package.

Memory: The Crux of the Matter

The most important factor affecting sort performance is virtual memory: how much is available and how much is actually used by the sort. Memory usage has a direct bearing on the consumption of other system resources and is thus the focus of most tuning efforts. Based on available virtual memory, a sort will either be performed entirely within main memory or in phases, using auxiliary storage such as DASD or tape.

Virtual memory limitations are established by the user through the REGION and MAINSIZE parameters and are further defined through parameters supplied by the sort. In PLSORT, these latter parameters include maximum and minimum values for the sort to select from, established once at installation or overridden at execution for sort jobs with special requirements.

The sophisticated sorts that are commercially available today assess the amount of available memory from this array of upper and lower bounds, comparing it to the SORTIN file size. Based on the ratio of file size to memory, the sort forms a strategy for memory and auxiliary storage usage that has an immense impact on how fast the sort executes and on the system resources it expends. Given the large size of many data sets that are routinely sorted, and the number of sorts executing concurrently, the sort's decisions about how much memory to use also have profound effects on the rest of the system.
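The in-memory-versus-phases distinction can be sketched in miniature. When the records do not fit within the memory limit, an external sort writes sorted runs to work space and then merges them -- the same shape of strategy a mainframe sort applies with DASD sort work. The function below is a toy illustration only (the names and the record-count memory limit are our own, not PLSORT's logic):

```python
import heapq
import tempfile

def external_sort(records, memory_limit):
    """Sort a list of strings, spilling sorted runs to temporary files
    ("sort work") whenever more than memory_limit records would have to
    be held in memory at once."""
    if len(records) <= memory_limit:
        return sorted(records)              # "in-core" sort: no work space
    # Phase 1: write sorted runs of at most memory_limit records each.
    run_files = []
    for start in range(0, len(records), memory_limit):
        run = sorted(records[start:start + memory_limit])
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(r + "\n" for r in run)
        f.seek(0)
        run_files.append(f)
    # Phase 2: stream-merge the sorted runs back together.
    merged = [line.rstrip("\n") for line in heapq.merge(*run_files)]
    for f in run_files:
        f.close()
    return merged
```

The decision at the top of the function -- in-core when the data fits, spill to work space when it does not -- is the decision whose tuning the rest of this article is about.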
The programmer who is unfamiliar with the logic behind a sort's memory usage strategy often fails to recognize the potential benefits the sort brings to total system performance. For instance, the uninformed programmer typically focuses on those resources whose consumption by the sort job is easiest to measure, namely CPU time, elapsed time and EXCPs. As the programmer tunes the sort (often working on an empty system to make interpretation of elapsed time as simple as possible), the goal becomes minimizing these statistics.

When the focus is placed in this way on the individual job, the obvious conclusion is that the more virtual memory allocated to and used by the sort, the better the statistics and hence the performance. As a result, many data centers allocate enormous maximum memory limits (in excess of 10MB is not unheard of) and allow their sort to use this memory aggressively, so that all but the largest data sets can be sorted entirely within virtual memory (an "in-core" sort) or in a minimal number of phases. Since no DASD is required for sort work space in an in-core sort, few EXCPs are charged to the job. Elapsed time looks particularly healthy, since time-consuming access of DASD has been precluded. CPU time is also minimized, since there is no need to create and write substrings of the data to DASD-based sort work for later merging.

The problem, of course, is that this way of gauging optimal performance ignores the real crux of what is transpiring on a busy MVS machine running at full capacity: all active jobs are competing for real resources, primarily memory. When the programmer looks at more of the factors that affect and reflect total system performance, conservative use of virtual memory turns out to be much more efficient.
With EXCPs, for instance, use of maximal virtual memory appears to provide optimal I/O performance, since this easy-to-measure resource, directly attributable to the sort job, is minimized. However, the more virtual memory the sort uses, the less is available for other jobs on the system, and paging rates begin to soar. Imagine the paging rates when 10MB of virtual storage has been allocated! Minimizing EXCPs therefore does not result in proportionally fewer I/Os. In fact, the added paging I/Os often outstrip the saved sort job EXCPs, and in terms of I/O activity they serve no productive purpose. At least the EXCPs involved in using DASD for sort work are spent on the constructive purpose of completing the sort. In contrast, the higher paging rates resulting from extensive allocation of virtual memory serve only to preserve the virtual memory illusion and contribute nothing to the completion of the sort job at hand. Measuring total system I/Os, were it an easily attainable statistic, would be the key to gauging the correct trade-off between the sort's requirements and those of other jobs on the system. Unfortunately, total system I/O activity is not easily measurable.

Another non-obvious effect of allocating large amounts of virtual memory is the often overlooked fact that, beyond a certain threshold of memory usage, the sort phase is slower on a per-record basis, due to an inherent non-linearity of CPU time with respect to the number of records processed. Also contributing to this effect is that, beyond a certain threshold, the sort can no longer operate entirely within processor cache. Of course, some applications will genuinely require aggressive use of virtual memory, and sort memory allocation parameters should be set appropriately.
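The accounting argument above can be made concrete with invented numbers (purely illustrative -- these are not measurements from any system):

```python
def total_system_io(sort_excps, paging_ios):
    """I/O the system actually performs on behalf of a sort: the EXCPs
    charged to the job plus the paging I/O its memory footprint induces
    elsewhere on the system."""
    return sort_excps + paging_ios

# Hypothetical busy-system comparison: the "in-core" sort looks cheap on
# its own job log, but its large working set drives system-wide paging.
in_core   = total_system_io(sort_excps=200,   paging_ios=9_000)
with_dasd = total_system_io(sort_excps=3_500, paging_ios=400)
```

The job statistics favor the in-core run; the system-wide totals favor the DASD-backed run, which is exactly why per-job EXCP counts mislead.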
The moral is simply that as systems engineers analyze the effects of allocating large amounts of virtual memory, they should not use reduced EXCPs and time statistics as the only basis for gauging system performance. Having pointed out what not to do, we offer the following recommendations, drawn from experience with a wide range of sort tuning projects.

Unless the data center is running at less than full capacity, the data sets being sorted are very small, or a large sort job is literally the only job executing on the system, take a conservative approach toward the use of main memory. In the average MVS environments we have observed, small data sets (typically less than 1MB) are most efficiently sorted in one phase and entirely within virtual memory. Reasonable sort default memory limits would make a bit less than 2MB of virtual storage available for jobs of this size.

As the data sets to be sorted get larger, further attempts at in-core sorting will generally have undesirable effects on the system, even though measurements of EXCPs and CPU time for the sort job will appear to show efficient processing. PLSORT provides an installation- and run-time-adjustable parameter with which the data center defines an upper limit on the amount of virtual memory used for in-core sorting. For greatest system efficiency, we recommend a value for this parameter of between 1MB and 2MB.

When the file becomes large enough to surpass this threshold (medium-sized, at 1MB to 20MB), the sort is performed using DASD for sort work space. At this point, the amount of virtual memory beneficial to sort performance drops to roughly one-fifth of the file size, and it is only this amount that the sort will use from the available pool. As the file size grows, one-fifth of the file size will eventually exceed the maximum in-core sort memory value.
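The strategy so far can be collected into one small sketch. The constants mirror the article's rules of thumb; the function and constant names are our own, the 4MB soft limit is discussed below, and every installation should tune these values rather than take them literally:

```python
MB = 1024 * 1024

# Rules of thumb from the text -- tune per installation, do not hard-code.
IN_CORE_LIMIT = 2 * MB    # ceiling for a one-phase, in-memory sort
SOFT_LIMIT    = 4 * MB    # cap for DASD-backed sorts (cf. SOFTLIM, below)

def sort_memory_target(file_size):
    """Suggest the virtual memory a sort should use for a given SORTIN
    file size, following the conservative strategy described above."""
    if file_size <= 1 * MB:
        return IN_CORE_LIMIT              # small file: sort entirely in-core
    # Medium or large file: use DASD for sort work; memory worth having
    # is roughly one-fifth of the file size, capped at the soft limit.
    return min(file_size // 5, SOFT_LIMIT)
```

Note how the target drops when a file first crosses the in-core threshold: a 5MB file gets only about 1MB of memory plus DASD work space, which looks "worse" on the job log but is cheaper for the system as a whole.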
As the file grows further, more virtual memory is used until an upper limit is reached -- called SOFTLIM in PLSORT -- which we tend to set at 4MB. This limit is established at sort installation and can be overridden at execution. Above this threshold, a data center is typically working with large files (greater than 20MB). While more virtual memory is still accessible between the user-adjustable soft limit and the absolute limit set by MAXLIM, it is most efficient for overall system performance to stay within the soft limit and tolerate an apparently more expensive sort in exchange for reasonable system paging rates and CPU overhead.

The relationship between sort input file size, virtual memory and the optimal path of virtual memory usage incorporated into PLSORT is shown in Figure 1. The ranges for small-, medium- and large-sized files are estimates; every data center differs, and establishing the right values is precisely the goal of each data center's own tuning effort.

Other Areas to Study: DASD SORTWORK Allocation

Since we recommend reasonable reliance on DASD for sort work space with medium- and large-sized files, it is worth noting a few rules of thumb that can greatly improve the efficiency of DASD usage. For PLSORT, when dynamic SORTWORK allocation is bypassed and the space is instead allocated by the user, attempt to:

o allocate extents on cylinder boundaries;
o allocate a small number of large extents instead of many small extents;
o avoid mixing device types, and use the faster devices; and
o avoid using VIO for SORTWORK.

MAXSORT

For very large data sets, it is often preferable to use the MAXSORT option available in some sort packages in place of extensive use of DASD for sort work space. Under PLSORT, MAXSORT takes effect when specified by the user, or when the sort input file size would obviously require an amount of DASD in excess of a user-defined level to run efficiently.
MAXSORT limits extensive DASD consumption in large sorts by allowing tape drives to be used for sort work space. MAXSORT will often appear a strange candidate for certain sorts, since the use of tape drives results in much longer elapsed times. However, other benefits of the option often make it the method of choice for some of the most critical sorts a data center performs. Central among these is the restart option, which allows the user, in the event of a failure, to restart the sort at the previously completed step. For huge data sets, not having to re-run resource-consuming steps, and the ability to suspend large sorts without penalty, are major attractions.

Hiperspace Usage

For MVS/ESA users, hiperspace is often an efficient substitute for DASD. Like main memory and DASD, however, hiperspace is a limited resource, and sort jobs are not always the ideal candidates to use it from the perspective of overall system efficiency. Frequency of turnover is an important measure of effective hiperspace use: it is preferable to have frequent reuse of hiperspace rather than tying up large amounts for long periods with a single job like a large sort. PLSORT provides several installation- and execution-time parameters that give the data center control over the degree of usage of the free hiperspace that the system reports as safely available.

Use the Sort's SVC

Another significant performance issue is addressed through the special abilities provided by the SVC routine supplied with the sort. With the SVC, the sort can translate its own channel programs and fix pages in memory once, rather than once for each I/O. Without the SVC, default operating system routines perform these functions in a manner not optimized for sorting.
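Returning briefly to restartability: the benefit of MAXSORT's restart option can be illustrated with a toy checkpoint scheme, in which each completed run is recorded so that, after a failure, the job resumes at the last finished step instead of re-sorting everything. The names and mechanics below are ours alone, not how MAXSORT is implemented:

```python
import json
import os

def sort_runs_with_restart(chunks, checkpoint="sort.ckpt"):
    """Sort each chunk as a separate step, checkpointing completed runs
    so a failed or suspended job can resume at the last finished step."""
    done = []
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            done = json.load(f)            # steps completed before failure
    for i in range(len(done), len(chunks)):
        done.append(sorted(chunks[i]))     # one resource-consuming step
        with open(checkpoint, "w") as f:
            json.dump(done, f)             # record progress after each step
    return done
```

If the job dies after the first chunk, rerunning it skips straight to the second; for a sort whose individual steps take hours, that is the entire appeal.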
Resident/Re-entrant Options

Making the sort resident in virtual memory, particularly in XA above the 16MB line, reduces the time and I/Os consumed in loading the sort modules. The effect is particularly noticeable when concurrent sorts are being run. Re-entrance is essential for sorts that contain a user exit which in turn invokes a sort.

Trade-offs Between EXCPs and Elapsed Time

Often, when a single critical sort job must be processed within certain time constraints -- typically the confines of the batch window -- it is desirable to seek optimal elapsed time even at the expense of some EXCP efficiency. Elapsed time is minimized by organizing a sort so that overlapping I/Os can occur; this tactic expends more EXCPs and CPU time in return for lower elapsed time. In PLSORT, a tuning parameter, IOBAL, allows the individual sort job to maximize overlapping I/Os. This option is most profitable when the cumulative elapsed time of all sorts on the system is the target; in a normal job mix, however, such overlap occurs naturally across jobs. We therefore recommend installing the sort with defaults that favor lower EXCPs for the majority of jobs, relying on overriding parameters (particularly OPT) for critical individual jobs that require lower elapsed time.

Replace IEBGENER

While not run as often as the sort, IEBGENER is a frequently used program. Most sorts supply a transparent IEBGENER replacement that performs more efficiently. In the case of PLSORT, replacing IEBGENER with PLSGENER yields up to a 90 percent reduction in CPU time and EXCPs.

Other issues affecting sort performance, and other tuning options, are certainly worthy of consideration. The subjects examined here are those from which data centers can achieve improvements while expending relatively small amounts of time and energy.
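The overlapped-I/O trade-off discussed above can be sketched with a simple producer-consumer pattern: a reader keeps a small buffer queue full while the main thread processes blocks, so I/O wait and CPU work proceed in parallel. More bookkeeping (the queue, the extra thread) buys lower elapsed time. This is a generic illustration of the technique, not PLSORT's IOBAL mechanism:

```python
import queue
import threading

def overlapped_copy(read_block, process_block, n_blocks):
    """Overlap reading with processing via a bounded queue: the reader
    thread fetches block i+1 while the main thread works on block i."""
    q = queue.Queue(maxsize=2)          # double buffering
    def reader():
        for i in range(n_blocks):
            q.put(read_block(i))        # may block on device I/O
        q.put(None)                     # end-of-input marker
    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (block := q.get()) is not None:
        results.append(process_block(block))
    return results
```

On an otherwise busy system this overlap happens naturally across competing jobs, which is why the text recommends reserving aggressive overlap for the individual critical job rather than making it the installation default.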
For the most part, the potential for improvement lies not in the sort product itself, but in intelligent system resource allocation and in using the sort to maximize total system performance -- not simply the easy-to-measure statistics of the individual, isolated sort job.