RMF Part XII: What's New in SP 4?
By Cheryl Watson

The author is president of Watson & Walker, Inc., an MVS training
and consulting firm and publisher of Cheryl Watson's Tuning
Letter, a monthly MVS performance newsletter. Cheryl has more than
24 years of experience in IBM mainframe software. Questions about
this or previous articles can be addressed to Cheryl Watson 
at (800)553-4562 or (813)949-3673 in Tampa, Fla.

RMF Version 4 Release 2.1 has added many new features supporting 
MVS/ESA SP 4.2. The new data can be used for reporting, capacity 
planning and tuning. This is the last article in this RMF series. 
It examines these new changes and provides suggestions on how to 
use the new data.

Overview of Changes

One change in RMF is not really related to SP 4, but is a change
to the PR/SM partition data report available with a PTF. Since the
partition data report was described in Part III of this series 
(March 1990), I've included this update for users of PR/SM. The 
majority of the RMF changes in SP 4 support Advanced Program-
to-Program Communication/MVS (APPC/MVS), the Cross-system
Coupling Facility (XCF) and the new changes in SRM. See the May
1991 issue of Technical Support for my article, "SRM Slims Down," 
which addresses the SRM changes.

APPC/MVS is the advanced program-to-program communication support
for cooperative processing. This allows a program on the host to
communicate with programs on a PC. These programs are called TPs, 
or transaction programs. Communication between the host and a PC 
is called a conversation. The host will have two address spaces, 
APPC/MVS and ASCH. ASCH is the transaction scheduler that schedules 
work to a TP. Each TP runs as its own address space. RMF treats 
ASCH and its TPs as a separate subsystem. Therefore, whenever  
there's a reference to batch, STC and TSO, you'll also see a 
reference to ASCH.

The XCF data reports on the activity between members of a
sysplex. It reports on messages sent and received between remote
locations, both systemwide and by remote location and path. What's
unique about the XCF data is that while it's provided as part
of RMF postprocessor reporting, the data can only be collected
by RMF Monitor III. Sites that have been ignoring Monitor III
will now have to take a closer look. Not a bad idea in general!

While not shown in this article, RMF provides three separate XCF
reports. One report provides information on the number of messages
inbound to and outbound from the system. A second report shows the
activity by application. Thus, you can determine the activity from
GRS or MCS separately. A third report shows activity and contention
on the paths between systems.

The SRM SP 4.2 changes resulted in modifications to RMF to remove
reporting of workload levels (which no longer exist). New
reporting in RMF was added to support block paging activity.
Additionally, the postprocessor has been changed to report
much more information that has always been in the records but
was never reported.

Additional goodies, such as the addition of the device model
identifier in the Device Activity reports, make this a very 
desirable version of RMF to obtain. So, let's examine the new 
data and how you might use it.

Partition Data Report

Here's some good news! IBM announced a new APAR for PR/SM that
reports on some of the PR/SM overhead. The APAR number is OY36668 
and will be available this quarter.

The APAR will be available for the following configurations: RMF
3.5.0 and later (for MVS/SP 2.2 and later), ES/9000 9021, 
ES/9000 9121, ES/3090-9000T, ES/3090 J (180, 200, 280 and above), 
SEC 227574 and SEC 227576. The APAR allows RMF to provide separate 
measurements for the amount of PR/SM overhead that can be identified. 
As you'll see, the measurable overhead is fairly low. The additional
multiprocessing effects represent the more important (and still unmeasured)
overhead. Essentially, PR/SM will keep separate CPU measurements
for time to manage each LPAR and a CPU measurement for general
PR/SM time that can't be assigned to an LPAR. This general PR/SM 
management time is assigned to a dummy LPAR called PHYSICAL.

To understand these measurements, see the new RMF data in Figure 1.
First look at the LPAR called PHYSICAL. This row shows the total amount of
CPU time where PR/SM was executing and couldn't attribute the 
time to a specific LPAR. The far right column indicates the 
percent of the entire machine used for this management. 
In Figure 1, physical PR/SM management time used is 37 minutes, 
4.22 seconds or 1.28 percent of the six CPUs during this eight-
hour interval. Let's look at the data on this report. You'll 
find additional columns here. A new column has been added to 
indicate whether capping was in effect for this LPAR. This replaced 
the column that had been used for WAIT COMPLETION, which
is now listed at the top of the page. The two columns titled
DISPATCH TIME DATA display the EFFECTIVE and TOTAL CPU time in
hh.mm.ss.ttt format. 

The TOTAL column is the sum of the effective (non-PR/SM) time and
PR/SM management time for the LPAR. For LPAR PROD, there was 
42.615 (13.05.37.659 minus 13.04.55.044) seconds of CPU due to 
PR/SM. The sum for the EFFECTIVE column represents the CPU time 
consumed for all LPARs. The sum for the TOTAL column
represents the CPU time consumed for all LPARs and PR/SM
management time.
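
As a minimal sketch of the subtraction above, the following Python
converts the hh.mm.ss.ttt dispatch times to seconds and takes TOTAL
minus EFFECTIVE. The function names are my own for illustration and
are not part of RMF; the input values are the ones quoted for PROD.

```python
def dispatch_time_to_seconds(value: str) -> float:
    """Convert an hh.mm.ss.ttt dispatch time string to seconds."""
    hours, minutes, seconds, millis = value.split(".")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def lpar_management_seconds(total: str, effective: str) -> float:
    """PR/SM management time for one LPAR: TOTAL minus EFFECTIVE."""
    difference = dispatch_time_to_seconds(total) - dispatch_time_to_seconds(effective)
    return round(difference, 3)  # keep millisecond precision


# TOTAL and EFFECTIVE dispatch times quoted in the text for LPAR PROD
print(lpar_management_seconds("13.05.37.659", "13.04.55.044"))  # 42.615
```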

The two columns titled LOGICAL PROCESSORS show the average
percent of logical CPU busy based on the number of logical CPUs 
assigned for EFFECTIVE time and the TOTAL time. The calculation 
is simply: EFFECTIVE time in seconds divided by the NUMBER OF 
LOG PRCRS (logical processors) divided by the length of the
interval in seconds (this all multiplied by 100 to obtain a
percent).

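Looking at PROD, the TOTAL time in seconds is 47137.659
(13.05.37.659) and the interval in seconds is 28,800. The average
TOTAL CPU busy would then be: ((47137.659 / 4) / 28800) * 100 =
40.9181 percent. The same calculation with the EFFECTIVE time of
47095.044 seconds (13.04.55.044) gives 40.8811 percent. The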
EFFECTIVE percent, not the TOTAL percent, will be shown on the 
type 70 CPU Activity data (the RMF CPU Activity Report) for 
non-dedicated LPARs. This EFFECTIVE time is also the one used
in RMF Monitor II data and RMF Monitor III displays. 
(If you're running another vendor's monitor, you'll need to 
check with the vendor to see which data is displayed.)
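
The logical processor calculation can be sketched in a few lines of
Python. This is only an illustration of the formula described above;
the function name is hypothetical, and the inputs are the PROD figures
quoted in the text (47137.659 CPU seconds, four logical processors and
a 28,800-second interval).

```python
def logical_busy_percent(cpu_seconds: float, logical_cpus: int,
                         interval_seconds: float) -> float:
    """Average percent busy per logical CPU over the interval."""
    return cpu_seconds / logical_cpus / interval_seconds * 100


# CPU seconds, logical processors and interval quoted in the text for PROD
busy = logical_busy_percent(47137.659, 4, 28800)
print(f"{busy:.4f} percent")  # 40.9181 percent
```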

The rightmost columns titled PHYSICAL PROCESSORS show the percent
of CPU time as derived from the total CPU time for all CPUs and 
is a picture of the entire machine. In this example, these six 
physical CPUs represent a total of 48 hours available for this 
eight-hour period. This available time is divided into the 
EFFECTIVE time to get EFFECTIVE percent; divided into the TOTAL 
time to get TOTAL percent; and divided into the difference between
TOTAL and EFFECTIVE time to get the LPAR MGMT time.
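
Here is a rough Python sketch of the three PHYSICAL PROCESSORS
percentages as described above. The function name and the sample CPU
seconds are made up for illustration; only the six CPUs and the
eight-hour interval come from the example. RMF's own rounding may
differ slightly from what this produces.

```python
def physical_percentages(effective_s: float, total_s: float,
                         physical_cpus: int, interval_s: float):
    """Return (EFFECTIVE %, TOTAL %, LPAR MGMT %) of the whole machine."""
    available_s = physical_cpus * interval_s  # CPU seconds the machine offers
    effective_pct = effective_s / available_s * 100
    total_pct = total_s / available_s * 100
    lpar_mgmt_pct = (total_s - effective_s) / available_s * 100
    return effective_pct, total_pct, lpar_mgmt_pct


# Hypothetical LPAR: 43,200 effective and 43,632 total CPU seconds
# on six physical CPUs over an eight-hour (28,800-second) interval
eff, tot, mgmt = physical_percentages(43200.0, 43632.0, 6, 28800.0)
print(f"EFFECTIVE {eff:.2f}  TOTAL {tot:.2f}  LPAR MGMT {mgmt:.2f}")
```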

Looking at these columns, you can see that the entire machine was
85.30 percent busy, with 3.17 percent of the machine directly attributable
to PR/SM. RMF still shows close to 100 percent for dedicated LPARs,
so you'll still have to look at the CPU Activity data to find the
"true" CPU busy for that LPAR. What I don't understand about some
of the data that I've seen is that there appears to be the
same amount of PR/SM management time for dedicated LPARs as
there is for shared LPARs. This is contrary to what you might
assume based on IBM documentation and users' benchmarks.

So how do you use this new data? I would use the PHYSICAL
PROCESSORS EFFECTIVE percent to obtain each LPAR's use and the 
PHYSICAL PROCESSORS LPAR MGMT (TOTAL) to obtain the percent of 
the machine for PR/SM, graphing it as its own workload. Thus, 
this machine would appear to have three workloads (PROD taking
27.27 percent of the machine, DEVL with 54.85 percent and PR/SM
management with 3.17 percent). The PR/SM management time would 
include the PHYSICAL time and the overhead for each LPAR.
You can use the DISPATCH TIME DATA EFFECTIVE time to obtain the
actual number of seconds used by each LPAR if you prefer to 
report in seconds.

The LPAR MGMT time for each LPAR can be used to determine the 
effect of capping and/or changes in volume in PR/SM overhead. 
The low utilization effect (LUE) shows up easily with this reporting, 
since you'll see more PR/SM management time when the CPUs are lightly 
loaded. Is this all of PR/SM overhead? I'm afraid not. This data now
shows the overhead that is due to direct PR/SM management and includes
overhead due to LUE and capping. It does not include overhead due 
to the multiprocessing effect, such as cache contention or 
TLB/ALB flush. I love the data that this new APAR provides, but 
I'd hate for anyone to assume that it's reporting all of the 
overhead due to the processing of LPARs.

Device Activity

Here's some other great news! The device type, model and control
unit model are now available in the RMF records. Only the device 
model is reported, but the control unit model is located in the 
record and can be obtained. 

Figure 2 shows a sample RMF Direct Access Device Activity Report 
with the new field listed under the device type. The data will 
be much more usable in analyzing device response data and data 
set placement. Without cache, you want to ensure that your faster 
devices such as the 3390s are providing better responses
than the slower devices. You also want to make sure that the
highest activity is generally located on the faster devices. 
You'll also find the device type and model on the Page/Swap 
Data Set Activity Report (not shown).

You can also see in Figure 2 the addition of another column. The
column AVG DPB DLY represents the number of milliseconds of delay 
due to ESCON director port busy. The PEND time is now composed 
of four elements: AVG CUB DLY (control unit busy), AVG DB DLY 
(device reserved or head of string busy delay), AVG DPB DLY 
(ESCON director port delay) and channel busy delay. If you
have programs that calculate the channel busy delay, you'll need
to add this new field in. Additionally, you can use this column 
to determine if delays are being caused by the ESCON directors. 
It will help you determine if more directors are needed. A 
similar measurement, the percent DP BUSY (percent of time
the director port was busy), is also provided in the I/O Queuing
Activity Report that's associated with the logical control 
unit activity (not shown).
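
If you derive the channel busy delay yourself, the adjustment might
look like the Python sketch below. It assumes the channel busy delay
is whatever remains of PEND after the three reported delays are
subtracted; the function name and sample values are hypothetical.

```python
def channel_busy_delay(pend_ms: float, cub_dly_ms: float,
                       db_dly_ms: float, dpb_dly_ms: float) -> float:
    """Channel busy delay: PEND minus CUB, DB and DPB delays (milliseconds)."""
    remainder = pend_ms - (cub_dly_ms + db_dly_ms + dpb_dly_ms)
    # Reported averages are rounded, so guard against a small negative result
    return max(remainder, 0.0)


# Hypothetical device: 5.0 ms PEND, 1.0 ms CUB, 0.5 ms DB, 0.5 ms DPB
print(channel_busy_delay(5.0, 1.0, 0.5, 0.5))  # 3.0
```

A program written before the AVG DPB DLY column existed would have
subtracted only the first two delays and so would overstate the
channel busy delay by the director port delay.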

The Channel Path Activity Report has also added ESCON
information. The channel TYPE field now contains additional values. 
The types can now be:

o BY - byte multiplexor;
o BL - block multiplexor;
o CV - ESCON converter attached to channel;
o CN - ESCON channel; and
o CND - ESCON channel with ESCON director.

This information can be used to verify the configuration of ESCON
channels that you've assumed and can be used to determine the 
channel busy for each channel. You can use this to see if your 
work is fairly balanced, and you can track the amount of usage 
on each of the channels.
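
One way to check balance is to group channel busy by the TYPE codes
listed above. The Python below is only a sketch: the code table comes
from the list in this article, but the function names and the sample
busy figures are hypothetical.

```python
from collections import defaultdict

# TYPE codes from the Channel Path Activity Report, as listed above
CHANNEL_TYPES = {
    "BY": "byte multiplexor",
    "BL": "block multiplexor",
    "CV": "ESCON converter attached to channel",
    "CN": "ESCON channel",
    "CND": "ESCON channel with ESCON director",
}


def is_escon(channel_type: str) -> bool:
    """True for the three ESCON-related TYPE codes."""
    return channel_type in {"CV", "CN", "CND"}


def average_busy_by_type(channels):
    """Average percent busy per TYPE from (channel_type, percent_busy) pairs."""
    busy = defaultdict(list)
    for channel_type, percent_busy in channels:
        busy[channel_type].append(percent_busy)
    return {t: sum(values) / len(values) for t, values in busy.items()}


# Hypothetical channel busy percentages
sample = [("BL", 40.0), ("BL", 20.0), ("CND", 10.0)]
for code, avg in average_busy_by_type(sample).items():
    print(f"{code} ({CHANNEL_TYPES[code]}): {avg:.1f} percent busy")
```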

APPC/MVS

The APPC/MVS data shows up in two places in RMF. The CPU Activity
Report provides a breakdown of scheduled TPs similar to batch, 
STC and TSO. 

Figure 3 shows an extract from the CPU Activity data. The Workload
Activity Report will also show a reference to subsystem ASCH on 
the performance group periods. Of primary interest in the workload 
report is the addition of the transaction queue time. In Figure 4, 
you'll see the new Workload Activity Report. The rightmost column 
contains the TRX (transaction response time), the SD (standard 
deviation), the new QUE (input wait queue) time and the TOTal time
(sum of TRX and QUE). The input queue time represents the amount
of time that a transaction was queued to be scheduled to a TP. You
can use this to set service level objectives for TP conversations
and to determine if excessive delays are occurring.
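
A service level check against these fields could be sketched in
Python as follows. The TOT value is simply TRX plus QUE, as described
above; the function names, the sample times and the four-second
objective are all hypothetical.

```python
def total_response(trx_seconds: float, que_seconds: float) -> float:
    """TOT: transaction response time plus input queue time."""
    return trx_seconds + que_seconds


def meets_objective(trx_seconds: float, que_seconds: float,
                    objective_seconds: float) -> bool:
    """True when the total response time is within the objective."""
    return total_response(trx_seconds, que_seconds) <= objective_seconds


# Hypothetical period: 2.5 seconds TRX, 0.5 seconds QUE, 4-second objective
print(total_response(2.5, 0.5))        # 3.0
print(meets_objective(2.5, 0.5, 4.0))  # True
```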

Workload Activity Enhancements

The Workload Activity Report in Figure 4 shows the enhancements
to the type 72 data as well as improvements on the report. 
You'll see several changes here. The outdated workload level 
and service objective have been removed. You can now retire your 
calculator (at least for a couple of calculations). The
postprocessor has added the calculated TCB and SRB seconds (see TCB and
SRB)! Thank you, thank you, thank you, IBM! They've also added
the percent of a single CPU used by each performance group period, 
performance group and domain. For example, a job that is using 
two full CPUs would show as 200 percent. Now you can see at a 
glance how much of the machine your CICS region
is taking.
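
For those who still want to check the arithmetic, here is a small
Python sketch. It assumes the percent of a single CPU is the TCB plus
SRB seconds divided by the interval length; that is my reading of the
field, not a documented formula, and the function name and sample
values are hypothetical.

```python
def percent_of_single_cpu(tcb_seconds: float, srb_seconds: float,
                          interval_seconds: float) -> float:
    """CPU seconds used, expressed as a percent of one CPU for the interval."""
    return (tcb_seconds + srb_seconds) / interval_seconds * 100


# Hypothetical 15-minute interval: 1,500 TCB plus 300 SRB seconds,
# the equivalent of two CPUs kept fully busy
print(percent_of_single_cpu(1500.0, 300.0, 900.0))  # 200.0
```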

Block Paging

SRM in SP 4.2 added the capability of performing block paging.
This is a technique of grouping contiguous virtual storage 
pages that are accessed at the same time as a logical group. 
When one is paged out, they all are; when one is paged in, 
they all are. This can be optionally chosen by SRM when
managing certain address spaces or can be explicitly requested 
by the program. Block paging is designed to prevent overhead 
due to programs with very large working set sizes. It should 
result in less paging, less CPU time and shorter response times. 
On the other hand, SRM only instigates block paging when the
system is storage-constrained. Therefore, review of the amount of
block paging can tell when the system is getting overloaded and 
which performance groups have address spaces that are managed 
with this technique.

The RMF Workload Activity Report in Figure 4 shows block paging
activity by performance group period, performance group, domain 
and total system. The PAGE-IN RATES are pages per second. SINGLE 
refers to normal page-ins (non-blocked). BLOCK refers to the 
number of pages paged-in as part of a block. You can also see
block paging from expanded storage in the EXP BLK field. EXP 
SNGL refers to page movements from expanded to central.

Block paging is also seen in the Paging Activity Report in Figure
5. A new column has been added for page-ins using block movement. 
The column, NON-SWAP, BLOCK, shows the page-ins per second from 
auxiliary to central using block paging. An addition to this report 
can be seen in the lower left corner. RMF has added the following: 
AVERAGE NUMBER OF PAGES PER BLOCK, BLOCKS PER SECOND and PAGE-IN
EVENTS (PAGE FAULT RATE). The page fault rate can be used to
determine the appropriate SRM value for MPL control, RCCPTRT, in
the IEAOPTxx member.
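
The relationship between these block paging figures can be sketched
in Python. I'm assuming here that the average number of pages per
block is the block page-in rate divided by the blocks per second; the
function name and the sample rates are hypothetical.

```python
def pages_per_block(block_pages_per_second: float,
                    blocks_per_second: float) -> float:
    """Average pages per block, or 0.0 when no blocks were paged in."""
    if blocks_per_second == 0:
        return 0.0
    return block_pages_per_second / blocks_per_second


# Hypothetical interval: 24 pages per second arriving in 3 blocks per second
print(pages_per_block(24.0, 3.0))  # 8.0
```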

CSTOR Capping

The RMF Monitor II ASD (Address Space State Data) provides
another SRM-related field. Figure 6 shows a sample ASD display. 
The column called CS TAR (Central Storage Target) contains a 
zero if the address space is being monitored by SRM due to 
its high storage activity or usage, a blank if it's not being
monitored, and a target working set size for central storage if
it's being managed by SRM. Address spaces will typically only 
be monitored or managed if they have very large storage 
requirements on a storage or CPU-constrained system.
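
If you capture the ASD display for later analysis, the three meanings
of the CS TAR column could be decoded along these lines. This Python
sketch assumes the field has already been extracted as text; the
function name and the returned wording are my own.

```python
def cs_tar_state(field: str) -> str:
    """Interpret the CS TAR column for one address space."""
    text = field.strip()
    if not text:
        return "not monitored"   # blank: SRM is not monitoring
    target = int(text)
    if target == 0:
        return "monitored"       # zero: monitored, but not yet managed
    return f"managed, target {target}"  # nonzero: target working set size


# Hypothetical column values
for value in ["", "0", "1500"]:
    print(repr(value), "->", cs_tar_state(value))
```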

New Workload Data

Additional Workload Activity data can be useful in determining
expanded storage usage. As mentioned before, the report now 
contains page movements from expanded to central. Previously, 
the STORAGE field contained the total of central and expanded 
frames. In SP 4, these are now broken out with the number of 
CENTRAL, EXPANDed and TOTAL frames. The AVERAGE is the TOTAL 
divided by the AVG TRANSACTIONS. If there is an average of
more than one address space during the period, the AVERAGE will
be less than the TOTAL. If there is less than one address space,
the AVERAGE will be greater than the TOTAL. RMF has also added
the ended transaction rate as transactions per second in the 
END/SEC field.
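
The AVERAGE calculation is easy to check by hand or with a couple of
lines of Python. The division is the one described above; the
function name and the frame counts are hypothetical.

```python
def average_frames(total_frames: float, avg_transactions: float) -> float:
    """AVERAGE: TOTAL frames divided by AVG TRANSACTIONS."""
    return total_frames / avg_transactions


# Hypothetical period holding 1,200 frames in total
print(average_frames(1200, 3.0))  # 400.0 with three address spaces
print(average_frames(1200, 0.5))  # 2400.0 with half an address space on average
```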

As mentioned in the APPC discussion, there is a new field called
QUE time. For JES, this QUE time is the time spent in the JES 
input queue waiting for an initiator. If you add this field to 
the TRX field (average response), you can obtain the turnaround 
time for batch jobs. This is reported as the TOT response time. 
Now, if you have a job class assigned to a specific performance
group, you can determine service level objectives from the RMF
data. This is certainly easier than processing the SMF data.
There are some occurrences, however, that will invalidate this.
For example, if the operator resets a performance group, the 
transaction count for the job will be reset and the response time 
will appear to be better than actually experienced. Also, if 
the job changes performance groups by coding the PERFORM
parameter on the step card, the transaction starts over and will
appear as a shorter transaction than the entire job.

Summary

As can be seen from these changes, RMF has added some very
desirable features. If you'll be converting to this release soon, 
you can start on a few items now. Consider how to report the 
PR/SM management time in your PR/SM environment. Identify your 
current performance groups and determine if you need more or
fewer to take advantage of the new workload data such as expanded
storage usage, hiperspace activity and block paging. Determine if
any of your DASD reporting can benefit from knowledge of the device
model and control unit model. Determine whether you'll be implementing
APPC/MVS soon so you can start considering APPC as a new workload and new subsystem.
