"IMPROVED PERFORMANCE AND CONTROL FOR VM USERS" Resource Control Tools Can Reduce The Need For Hardware Upgrades And Improve System Response Times. By David Gibson David Gibson is the Vice President of Technical Operations for MACRO 4, Inc., of Mt. Freedom, New Jersey. Mr. Gibson has over 16 years experience in data processing. VM accounting packages were originally developed to provide a means for charging back system resources to divisions within a company, or to customers in a service bureau. Some products also offer a monitoring facility that provides the system's administrator with information required for tuning the system, thereby increasing throughput or improving TP responses. But the need to control users in VM environments, though critically important, is often overlooked. This can be a severe oversight since with effective resource control tools system usage can be increased without the need for increased hardware expenditure. These tools can also stabilize, and in many cases, reduce response times for all business-critical applications. An adequate degree of control is available in a native DOS/VSE system, using CICS. However, in the larger VM environment, control is not as straightforward because a large number of CMS machines, as well as CICS systems running in other virtual DOS machines, are all competing for the same finite resources. VM's inherent control facilities, such as PRIORITY, QDROP, and FAVOR, can help in managing resource allocation, but these facilities are not dynamic. Furthermore, without a VM resource control tool, even with PRIORITY specified, it is still possible for a low priority virtual machine to monopolize limited resources. This situation may occur, for example, if a machine in a CPU loop has a very small working set which, as far as the scheduler is concerned, counteracts the low priority restriction. 
If a CMS machine does go into a loop while the operator is not attending the machine, the resource load can become unbalanced, leaving the rest of the user base with longer response times. The likelihood of such an occurrence causing unpredictable and erratic response times increases with the number of virtual machines in any given installation.

At one actual installation, a large money center bank running an Amdahl host with 2,000 interactive users, the overloaded system could provide adequate response times at best. But at irregular, yet frequent, intervals throughout the day certain users would kick off number-crunching batch jobs that slowed response time for the rest of the users. The bank attempted to control the problem with the PRIORITY parameter, but it had no effect on the culprit machines causing the system overload.

In the past the bank had purchased increasingly large CPUs, but it had reached the point where no further upgrades were possible without an investment of hundreds of thousands of dollars. As a cost-effective and more efficient alternative, the bank opted for a VM performance and control system called VPAC from MACRO 4, costing less than $4,000 per year.

VPAC is different from other VM performance monitoring packages in its ability to provide automated resource control. It interfaces with VM's dispatching control program to slow down specified number-crunching development machines, thereby preserving resources for other business-critical interactive system users. For the bank, this meant it was able to maintain its current hardware configuration and install more users.

Poor response time is generally related to an overload on some key resource in the system--the CPU, storage management subsystem, I/O subsystem, or page management system. VPAC enables users to see how each of these resources is utilized during periods of both optimal and worst-case response times.
Since there are no yardstick figures applicable to all data centers, these guideline figures must be developed for each individual installation, based on what is acceptable to its users. VPAC continuously collects data to build an archive of usage figures so that threshold limits can be set, and it restricts specific users when resources are becoming overloaded in any given area.

During the initial data acquisition phase, when users call to report poor response times, the VPAC system administrator can examine usage of each of the four resources and determine which is peaking and which user(s) are causing the problem. The information is presented graphically online for any time interval, and is available for printout in summary report form.

If deemed necessary, various resources can be compared graphically to gain a better understanding of what is slowing access time. This is accomplished by overlaying graphs to determine, for example, any relationships between I/O load and paging, or I/O and CPU usage. In VPAC, any resource graph can be correlated with another to establish average and peak values for each.

Once the norms for each resource at the installation are established, the thresholds that precipitate response time breakdowns can be determined. When the thresholds are exceeded, exception messages appear on the system administrator's console and are logged in an audit trail file. At this time, the system administrator can identify the user causing the problem and take appropriate action to restrict that virtual machine's usage before response times for the remaining users degrade.

Alternatively, VPAC can be set up to automatically determine which users have exceeded the thresholds specified, and take pre-defined restrictive action--ranging from a simple message being routed to the virtual machines in question and/or the system administrator/operator, to limiting resource usage, or actually forcing the resource hog off the system.
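To make the graph comparison described above more concrete, here is a minimal sketch in Python. It is not VPAC code: the function names and the sample figures are hypothetical, chosen only to illustrate how the average, the peak, and the relationship between two resource series (here I/O load and paging) might be computed from interval samples.

```python
from math import sqrt


def summarize(samples):
    """Return (average, peak) for one resource's interval samples."""
    return sum(samples) / len(samples), max(samples)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sample series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


# Hypothetical per-interval samples (illustrative numbers, not measured data).
io_per_second = [12, 15, 30, 45, 44, 20]
pages_per_second = [5, 6, 14, 22, 21, 9]

io_avg, io_peak = summarize(io_per_second)
page_avg, page_peak = summarize(pages_per_second)
r = pearson(io_per_second, pages_per_second)

print(f"I/O: avg {io_avg:.1f}, peak {io_peak}")
print(f"Paging: avg {page_avg:.1f}, peak {page_peak}")
print(f"Correlation between I/O load and paging: {r:.2f}")
```

A coefficient close to 1 would suggest that the two resources peak together, which is the kind of relationship the overlaid graphs are meant to reveal.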
In addition to specifying limits for individuals or groups of virtual machines, the VPAC administrator can set trigger levels for the total system usage. If the total consumption of a resource is below the trigger level, no action is taken; however, if consumption of that resource rises above the trigger level, VPAC's resource control is implemented. The end result is that VPAC will allow controlled virtual machines to use as much resource as they require so long as the total, system-wide consumption remains below the threshold, or trigger level.

Thresholds can be specified as limits (e.g., three minutes of CPU time, or 2,000 I/Os), or as rates (e.g., 15 percent CPU utilization, or 20 I/Os per second). Different thresholds can be set for different users, different groups of users, or the same user for different times of the day or days of the week. Alternatively, certain virtual machines, such as production VSE machines, can be excluded from exception reporting/resource control on the basis that they may be allowed to monopolize the machine when necessary.

This type of resource control provides a range of benefits. For one, if the installation upgrades its processor, the impact on CPU consumption can be quickly and quantitatively determined. Similarly, if disks are changed, the real improvement in I/O throughput can be quantified. If memory is added, the effect on paging can be identified.

Perhaps more importantly from a cost benefit and system operational efficiency standpoint, if paging is excessive, the user responsible for the resource consumption can be identified and control mechanisms instituted rather than adding memory. VPAC will then let system administrators tailor memory allocation based on overall system usage, preventing any one user from hogging resources. Without exception monitoring and resource control tools, it may not be possible to identify resource offenders and take appropriate action.
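The interplay of system-wide trigger levels and per-machine thresholds described above can be sketched as follows. This is an illustration only, not VPAC's implementation: the class names, machine names, and figures are assumptions, and a single resource (CPU) is modeled for brevity.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    kind: str     # "limit" (cumulative total) or "rate" (current rate)
    value: float  # e.g. 180 CPU seconds, or 15 percent CPU utilization


@dataclass
class Usage:
    total: float  # cumulative consumption, e.g. CPU seconds used so far
    rate: float   # current rate, e.g. percent CPU utilization


def exceeded(threshold, usage):
    """Compare the total against a limit, or the current rate against a rate."""
    measured = usage.total if threshold.kind == "limit" else usage.rate
    return measured > threshold.value


def machines_to_restrict(system_usage, trigger_level, thresholds, usage_by_machine, exempt=()):
    """List machines over their own threshold, but only once total
    system usage has risen above the trigger level."""
    if system_usage <= trigger_level:
        return []  # below the trigger level: no machine is restricted
    return [
        name
        for name, usage in usage_by_machine.items()
        if name not in exempt and exceeded(thresholds[name], usage)
    ]


# Hypothetical machines and figures, for illustration only.
thresholds = {
    "DEVUSER1": Threshold("rate", 15.0),    # 15 percent CPU utilization
    "DEVUSER2": Threshold("limit", 180.0),  # three minutes of CPU time
    "PRODVSE": Threshold("rate", 15.0),
}
usage = {
    "DEVUSER1": Usage(total=60.0, rate=40.0),
    "DEVUSER2": Usage(total=90.0, rate=10.0),
    "PRODVSE": Usage(total=900.0, rate=30.0),
}

busy = machines_to_restrict(95.0, 80.0, thresholds, usage, exempt={"PRODVSE"})
quiet = machines_to_restrict(50.0, 80.0, thresholds, usage, exempt={"PRODVSE"})
print(busy, quiet)
```

In this sketch the exempt production machine is never restricted, and while the system as a whole stays below the trigger level, even a machine over its own threshold is left alone.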
With a tool like VPAC, programmers who take the "hit and miss" approach to debugging, resubmitting jobs and running them several times rather than diagnosing problems before submitting a job, can be easily identified and dealt with.

Another benefit of resource control is the elimination of the peaks and troughs in resource consumption. For example, a user doing an interactive compile may now run for four minutes using only moderate amounts of resources and without any noticeable impact on the TP system, whereas without VPAC the job might have run in two minutes. The shorter run may be more acceptable to the programmer, but it grabs large amounts of CPU and slows overall system response times for the rest of the users.

Not to be overlooked are the security implications of resource control. In VPAC a number of thresholds and actions can be grouped together to form resource control classes that are assigned start and end times, and an optional day of the week. If, for example, resource control classes are defined for weekends and holidays with a CPU limit of zero, virtual machines assigned these classes by VPAC during these times will be locked out.

VPAC, as a comprehensive monitoring and accounting tool, also provides charge-back capabilities, allowing users to tailor reports as needed. System usage can be charged, for example, on the basis of CPU usage or number of I/Os--with any dollar amount assigned to each.

Anyone considering the purchase of an accounting and monitoring tool for VM should look for all of these features. In addition, consider the ease of installation. Some packages require extensive modifications to the operating system each time the package is installed, de-installed, or the operating system updated. VPAC, by comparison, requires no such machinations, installing in about five minutes without modifications to the operating system.
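The resource control classes and the charge-back calculation described above can be sketched briefly. The Python below is a hypothetical illustration, not VPAC's actual configuration format or billing logic: the class fields, the weekend schedule, and the dollar rates are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class ControlClass:
    name: str
    start: time
    end: time
    cpu_limit_seconds: float
    weekday: Optional[int] = None  # 0 = Monday ... 6 = Sunday; None = every day

    def applies(self, when: datetime) -> bool:
        """True if this class is in force at the given moment."""
        if self.weekday is not None and when.weekday() != self.weekday:
            return False
        return self.start <= when.time() <= self.end


def is_locked_out(classes, when):
    """A CPU limit of zero in an active class locks the machine out."""
    return any(c.applies(when) and c.cpu_limit_seconds == 0 for c in classes)


def charge(cpu_seconds, io_count, rate_per_cpu_second, rate_per_io):
    """Charge-back as a dollar amount per CPU second plus an amount per I/O."""
    return cpu_seconds * rate_per_cpu_second + io_count * rate_per_io


# Hypothetical weekend lockout: all day Saturday and Sunday, CPU limit zero.
weekend_classes = [
    ControlClass("SAT-LOCK", time(0, 0), time(23, 59, 59), 0, weekday=5),
    ControlClass("SUN-LOCK", time(0, 0), time(23, 59, 59), 0, weekday=6),
]

saturday = datetime(2024, 1, 6, 10, 0)  # a Saturday
monday = datetime(2024, 1, 8, 10, 0)    # a Monday

print(is_locked_out(weekend_classes, saturday))
print(is_locked_out(weekend_classes, monday))

# Hypothetical rates: 5 cents per CPU second, a tenth of a cent per I/O.
bill = charge(cpu_seconds=120, io_count=2000, rate_per_cpu_second=0.05, rate_per_io=0.001)
print(f"${bill:.2f}")
```

Holiday lockouts would need a calendar of dates in addition to the weekday field; that is left out of this sketch.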
An accounting package with resource control provides invaluable management control over system resource usage, enabling system administrators, for the first time, to pinpoint resource hogs and deal with them appropriately. Such packages also help determine quantitatively whether hardware upgrades are really necessary, and allow for system usage charge-back. With these cost benefits, DP shops will find the packages pay for themselves each and every month. But of equal importance, dealing with degrading system response times will no longer be a problem. That spells end-user satisfaction, a priceless goal.