RMF Part XII: What's New in SP 4?

By Cheryl Watson

The author is president of Watson & Walker, Inc., an MVS training and consulting firm and publisher of Cheryl Watson's Tuning Letter, a monthly MVS performance newsletter. Cheryl has over 24 years of experience in IBM mainframe software. Questions about this or previous articles can be addressed to Cheryl Watson at (800)553-4562 or (813)949-3673 in Tampa, Fla.

RMF Version 4 Release 2.1 has added many new features supporting MVS/ESA SP 4.2. The new data can be used for reporting, capacity planning and tuning. This is the last article in this RMF series. It examines these changes and provides suggestions on how to use the new data.

Overview of Changes

One change in RMF is not really related to SP 4, but is a change to the PR/SM partition data report available with a PTF. Since the partition data report was described in Part III of this series (March 1990), I've included this update for users of PR/SM.

The majority of the RMF changes in SP 4 support Advanced Program-to-Program Communication/MVS (APPC/MVS), the Cross-system Coupling Facility (XCF) and the new changes in SRM. See the May 1991 issue of Technical Support for my article, "SRM Slims Down," which addresses the SRM changes.

APPC/MVS is the advanced program-to-program communication support for cooperative processing. This allows a program on the host to communicate with programs on a PC. These programs are called TPs, or transaction programs. Communication between the host and a PC is called a conversation. The host will have two address spaces, APPC/MVS and ASCH. ASCH is the transaction scheduler that schedules work to a TP. Each TP runs as its own address space. RMF treats ASCH and its TPs as a separate subsystem. Therefore, whenever there's a reference to batch, STC and TSO, you'll also see a reference to ASCH.

The XCF data reports on the activity between members of a sysplex.
It reports on messages sent and received between remote locations, both systemwide and by remote location and path. What's unique about the XCF data is that while it's provided as part of RMF postprocessor reporting, the data can only be collected in RMF Monitor III. All of the sites that have been ignoring Monitor III will now have to take a closer look. Not a bad idea in general!

While not shown in this article, RMF provides three separate XCF reports. One report provides information on the number of messages inbound to and outbound from the system. A second report shows the activity by application. Thus, you can determine the activity from GRS or MCS separately. A third report shows activity and contention on the paths between systems.

The SRM SP 4.2 changes resulted in modifications to RMF to remove reporting of workload levels (which don't exist any more). New reporting in RMF was added to support block paging activity. Additionally, the post processor has been changed to report much more information that has always been in the records but was never reported.

Additional goodies, such as the addition of the device model identifier in the Device Activity reports, make this a very desirable version of RMF to obtain. So, let's examine the new data and how you might use it.

Partition Data Report

Here's some good news! IBM announced a new APAR for PR/SM that reports on some of the PR/SM overhead. The APAR number is OY36668 and will be available this quarter. The APAR will be available for the following configurations: RMF 3.5.0 and later (for MVS/SP 2.2 and later), ES/9000 9021, ES/9000 9121, ES/3090-9000T, ES/3090 J (180, 200, 280 and above), SEC 227574 and SEC 227576.

The APAR allows RMF to provide separate measurements for the amount of PR/SM overhead that can be identified. As you'll see, the measurable overhead is fairly low. The additional multiprocessing effects represent the more important (and still unmeasured) overhead.
Essentially, PR/SM will keep separate CPU measurements for the time to manage each LPAR and a CPU measurement for general PR/SM time that can't be assigned to an LPAR. This general PR/SM management time is assigned to a dummy LPAR called PHYSICAL.

To understand these measurements, see the new RMF data in Figure 1. First look at the LPAR called PHYSICAL. This row shows the total amount of CPU time where PR/SM was executing and couldn't attribute the time to a specific LPAR. The far right column indicates the percent of the entire machine used for this management. In Figure 1, physical PR/SM management time used is 37 minutes, 4.22 seconds or 1.28 percent of the six CPUs during this eight-hour interval.

Let's look at the data on this report. You'll find additional columns here. A new column has been added to indicate whether capping was in effect for this LPAR. This replaced the column that had been used for WAIT COMPLETION, which is now listed at the top of the page.

The two columns titled DISPATCH TIME DATA display the EFFECTIVE and TOTAL CPU time in hh.mm.ss.ttt format. The TOTAL column is the sum of the effective (non-PR/SM) time and PR/SM management time for the LPAR. For LPAR PROD, there were 42.615 seconds (13.05.37.659 minus 13.04.55.044) of CPU due to PR/SM. The sum for the EFFECTIVE column represents the CPU time consumed for all LPARs. The sum for the TOTAL column represents the CPU time consumed for all LPARs and PR/SM management time.

The two columns titled LOGICAL PROCESSORS show the average percent of logical CPU busy based on the number of logical CPUs assigned, for the EFFECTIVE time and for the TOTAL time. The calculation is simply: the EFFECTIVE (or TOTAL) time in seconds divided by the NUMBER OF LOG PRCRS (logical processors) divided by the length of the interval in seconds (this all multiplied by 100 to obtain a percent). Looking at PROD, the TOTAL time of 13.05.37.659 is 47137.659 seconds and the interval in seconds is 28,800.
Using PROD's TOTAL time of 47137.659 seconds (13.05.37.659), the average CPU busy would then be: ((47137.659 / 4) / 28800) * 100 = 40.9181 percent. The same calculation with the EFFECTIVE time of 47095.044 seconds (13.04.55.044) gives 40.8811 percent. The EFFECTIVE percent, not the TOTAL percent, will be shown on the type 70 CPU Activity data (the RMF CPU Activity Report) for non-dedicated LPARs. This EFFECTIVE time is also the one used in RMF Monitor II data and RMF Monitor III displays. (If you're running another vendor's monitor, you'll need to check with the vendor to see which data is displayed.)

The rightmost columns titled PHYSICAL PROCESSORS show the percent of CPU time as derived from the total CPU time for all CPUs and give a picture of the entire machine. In this example, the six physical CPUs represent a total of 48 hours available for this eight-hour period. Each percent is a time divided by this available time: the EFFECTIVE time gives the EFFECTIVE percent, the TOTAL time gives the TOTAL percent, and the difference between TOTAL and EFFECTIVE time gives the LPAR MGMT percent. Looking at these columns, you can see that the entire machine was 85.30 percent busy and that 3.17 percent of the machine was directly attributable to PR/SM.

RMF still shows close to 100 percent for dedicated LPARs, so you'll still have to look at the CPU Activity data to find the "true" CPU busy for that LPAR. What I don't understand about some of the data that I've seen is that there appears to be the same amount of PR/SM management time for dedicated LPARs as there is for shared LPARs. This is contrary to what you might assume based on IBM documentation and users' benchmarks.

So how do you use this new data? I would use the PHYSICAL PROCESSORS EFFECTIVE percent to obtain each LPAR's use and the PHYSICAL PROCESSORS LPAR MGMT (TOTAL) to obtain the percent of the machine for PR/SM, graphing it as its own workload. Thus, this machine would appear to have three workloads (PROD taking 27.27 percent of the machine, DEVL with 54.85 percent and PR/SM management with 3.17 percent).
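The percentages discussed above can be reproduced from the DISPATCH TIME DATA values. What follows is only a minimal sketch in Python, not anything RMF provides: the function names are hypothetical, and the inputs are the PROD and PHYSICAL figures quoted in the text (four logical processors, six physical CPUs, an eight-hour interval).

```python
def to_seconds(hhmmssttt: str) -> float:
    """Convert a dispatch time in hh.mm.ss.ttt format to seconds."""
    hh, mm, ss, ttt = hhmmssttt.split(".")
    return int(hh) * 3600 + int(mm) * 60 + int(ss) + int(ttt) / 1000.0


def logical_busy_pct(dispatch_secs: float, logical_cpus: int,
                     interval_secs: float) -> float:
    """Average percent busy of the logical CPUs assigned to an LPAR."""
    return dispatch_secs / logical_cpus / interval_secs * 100.0


def physical_pct(dispatch_secs: float, physical_cpus: int,
                 interval_secs: float) -> float:
    """Percent of the whole machine represented by a dispatch time."""
    return dispatch_secs / (physical_cpus * interval_secs) * 100.0


INTERVAL = 8 * 3600                           # eight-hour interval, in seconds
prod_effective = to_seconds("13.04.55.044")   # PROD EFFECTIVE time from the text
prod_total = to_seconds("13.05.37.659")       # PROD TOTAL time from the text
physical_mgmt = to_seconds("00.37.04.220")    # PHYSICAL: 37 minutes, 4.22 seconds

# PR/SM management time charged to PROD (TOTAL minus EFFECTIVE), in seconds
print(round(prod_total - prod_effective, 3))

# Logical processor busy for PROD, TOTAL and EFFECTIVE
print(round(logical_busy_pct(prod_total, 4, INTERVAL), 4))
print(round(logical_busy_pct(prod_effective, 4, INTERVAL), 4))

# Share of the six-CPU machine used by unattributed PR/SM management
print(round(physical_pct(physical_mgmt, 6, INTERVAL), 4))
```

If you post-process the records yourself, keeping the conversion and the two percentage calculations separate makes it easy to graph PR/SM management as its own workload next to each LPAR.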
The PR/SM management time would include the PHYSICAL time and the overhead for each LPAR. You can use the DISPATCH TIME DATA EFFECTIVE time to obtain the actual number of seconds used by each LPAR if you prefer to report in seconds.

The LPAR MGMT time for each LPAR can be used to determine the effect of capping and/or changes in volume on PR/SM overhead. The low utilization effect (LUE) shows up easily with this reporting, since you'll see more PR/SM management time when the CPUs are lightly loaded.

Is this all of PR/SM overhead? I'm afraid not. This data now shows the overhead that is due to direct PR/SM management and includes overhead due to LUE and capping. It does not include overhead due to the multiprocessing effect, such as cache contention or TLB/ALB flush. I love the data that this new APAR provides, but I'd hate for anyone to assume that it's reporting all of the overhead due to the processing of LPARs.

Device Activity

Here's some other great news! The device type, model and control unit model are now available in the RMF records. Only the device model is reported, but the control unit model is located in the record and can be obtained. Figure 2 shows a sample RMF Direct Access Device Activity Report with the new field listed under the device type.

The data will be much more usable in analyzing device response data and data set placement. Without cache, you want to ensure that your faster devices such as the 3390s are providing better responses than the slower devices. You also want to make sure that the highest activity is generally located on the faster devices. You'll also find the device type and model on the Page/Swap Data Set Activity Report (not shown).

You can also see in Figure 2 the addition of another column. The column AVG DPB DLY represents the number of milliseconds of delay due to ESCON director port busy.
The PEND time is now composed of four elements: AVG CUB DLY (control unit busy), AVG DB DLY (device reserved or head of string busy delay), AVG DPB DLY (ESCON director port delay) and channel busy delay. If you have programs that calculate the channel busy delay by subtracting the other delays from PEND time, you'll need to subtract this new field as well. Additionally, you can use this column to determine if delays are being caused by the ESCON directors. It will help you determine if more directors are needed. A similar measurement, the percent DP BUSY (percent of time the director path was busy), is also provided in the I/O Queuing Activity Report that's associated with the logical control unit activity (not shown).

The Channel Path Activity Report has also added ESCON information. The channel TYPE field now contains additional values. The types can now be:

o BY - byte multiplexor;
o BL - block multiplexor;
o CV - ESCON converter attached to channel;
o CN - ESCON channel; and
o CND - ESCON channel with ESCON director.

This information can be used to verify the configuration of ESCON channels that you've assumed and can be used to determine the channel busy for each channel. You can use this to see if your work is fairly balanced, and you can track the amount of usage on each of the channels.

APPC/MVS

The APPC/MVS data shows up in two places in RMF. The CPU Activity Report provides a breakdown of scheduled TPs similar to batch, STC and TSO. Figure 3 shows an extract from the CPU Activity data. The Workload Activity Report will also show a reference to subsystem ASCH on the performance group periods.

Of primary interest in the workload report is the addition of the transaction queue time. In Figure 4, you'll see the new Workload Activity Report. The rightmost column contains the TRX (transaction response time), the SD (standard deviation), the new QUE (input wait queue) time and the TOTal time (sum of TRX and QUE). The input queue time represents the amount of time that a transaction was queued waiting to be scheduled to a TP.
You can use this to set service level objectives for TP conversations and to determine if excess delays are occurring.

Workload Activity Enhancements

The Workload Activity Report in Figure 4 shows the enhancements to the type 72 data as well as improvements on the report. You'll see several changes here. The outdated workload level and service objective have been removed.

You can now retire your calculator (at least for a couple of calculations). The post processor has added the calculated TCB and SRB seconds (see TCB and SRB)! Thank you, thank you, thank you, IBM! They've also added the percent of a single CPU used by each performance group period, performance group and domain. For example, a job that is using two full CPUs would show as 200 percent. Now you can see at a glance how much of the machine your CICS region is taking.

Block Paging

SRM in SP 4.2 added the capability of performing block paging. This is a technique of grouping contiguous virtual storage pages that are accessed at the same time as a logical group. When one is paged out, they all are; when one is paged in, they all are. This can be optionally chosen by SRM when managing certain address spaces or can be explicitly requested by the program.

Block paging is designed to prevent overhead due to programs with very large working set sizes. It should result in less paging, less CPU time and shorter response times. On the other hand, SRM only instigates block paging when the system is storage-constrained. Therefore, reviewing the amount of block paging can tell you when the system is getting overloaded and which performance groups have address spaces that are managed with this technique.

The RMF Workload Activity Report in Figure 4 shows block paging activity by performance group period, performance group, domain and total system. The PAGE-IN RATES are pages per second. SINGLE refers to normal page-ins (non-blocked). BLOCK refers to the number of pages paged in as part of a block.
You can also see block paging from expanded storage in the EXP BLK field. EXP SNGL refers to page movements from expanded to central.

Block paging is also seen in the Paging Activity Report in Figure 5. A new column has been added for page-ins using block movement. The column, NON-SWAP, BLOCK, shows the page-ins per second from auxiliary to central using block paging. An addition to this report can be seen in the lower left corner. RMF has added the following: AVERAGE NUMBER OF PAGES PER BLOCK, BLOCKS PER SECOND and PAGE_IN EVENTS (PAGE FAULT RATE). The page fault rate can be used to determine the appropriate SRM value for MPL control, RCCPTRT, in the IEAOPTxx member.

CSTOR Capping

The RMF Monitor II ASD (Address Space State Data) provides another SRM-related field. Figure 6 shows a sample ASD display. The column called CS TAR (Central Storage Target) contains a zero if the address space is being monitored by SRM due to its high storage activity or usage, a blank if it's not being monitored, and a target working set size for central storage if it's being managed by SRM. Address spaces will typically only be monitored or managed if they have very large storage requirements on a storage or CPU-constrained system.

New Workload Data

Additional Workload Activity data can be useful in determining expanded storage usage. As mentioned before, the report now contains page movements from expanded to central. Previously, the STORAGE field contained the total of central and expanded frames. In SP 4, these are now broken out with the number of CENTRAL, EXPANDed and TOTAL frames. The AVERAGE is the TOTAL divided by the AVG TRANSACTIONS. If there is an average of more than one address space during the period, the AVERAGE will be less than the TOTAL. If there is less than one address space, the AVERAGE will be greater than the TOTAL.

RMF has also added the ended transaction rate as transactions per second in the END/SEC field.
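The relationship between the AVERAGE and TOTAL storage fields described above can be illustrated with a short sketch. The function name and the frame counts below are hypothetical and are not taken from any RMF report; only the rule that AVERAGE is TOTAL divided by AVG TRANSACTIONS comes from the text.

```python
def average_frames(total_frames: float, avg_transactions: float) -> float:
    """AVERAGE storage: TOTAL frames divided by AVG TRANSACTIONS."""
    if avg_transactions <= 0:
        raise ValueError("AVG TRANSACTIONS must be greater than zero")
    return total_frames / avg_transactions


# Hypothetical period with an average of 2.5 address spaces:
# the AVERAGE comes out lower than the TOTAL.
print(average_frames(3000.0, 2.5))

# Hypothetical period with an average of 0.5 address spaces:
# the AVERAGE comes out higher than the TOTAL.
print(average_frames(3000.0, 0.5))
```

This is worth keeping in mind when comparing periods: a lightly used period can show a large AVERAGE even when its TOTAL frame count is small.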
As mentioned in the APPC discussion, there is a new field called QUE time. For JES, this QUE time is the time spent in the JES input queue waiting for an initiator. If you add this field to the TRX field (average response), you can obtain the turnaround time for batch jobs. This is reported as the TOT response time. Now, if you have a job class assigned to a specific performance group, you can determine service level objectives from the RMF data. This is certainly easier than processing the SMF data.

There are some occurrences, however, that will invalidate this. For example, if the operator resets a performance group, the transaction count for the job will be reset and the response time will appear to be better than actually experienced. Also, if the job changes performance groups by coding the PERFORM parameter on the step card, the transaction starts over and will appear as a shorter transaction than the entire job.

Summary

As can be seen from these changes, RMF has added some very desirable features. If you'll be converting to this release soon, you can start on a few items now. Consider how to report the PR/SM management time in your PR/SM environment. Identify your current performance groups and determine if you need more or fewer to take advantage of the new workload data such as expanded storage usage, hiperspace activity and block paging. Determine if any of your DASD reporting can benefit from knowledge of the device model and control unit model. Determine whether you'll be implementing APPC/MVS soon so that you can start considering APPC as a new workload and new subsystem.