VSAM TUNING MADE SIMPLE!  SORT OF....  
PART I 
 
by CRAIG R. McDONOUGH  
 
Mr. McDonough has been in data processing for 18 years, the 
last seven as a DOS/VSE systems programmer.  Prior to 
becoming a systems programmer, he gained experience as an 
applications programmer, systems analyst and independent 
consultant. 
 
Introduction  
 
Of all the DASD access methods that IBM supports, VSAM is 
perhaps the most responsive to (and dependent upon) 
"tuning."  Tuning is defined as the optimizing of DASD 
performance and/or utilization through the manipulation of 
the parameters available when defining or using a file. 
 
The main tuning options available can be analyzed and
optimized while an application is being planned or, to a
certain extent, as a "retrofit" to an existing application
during maintenance or a design change.  The examples
chosen here and the specific recommendations given will
primarily refer to VSE/VSAM files, but the techniques are,
in most cases, just as applicable to OS/VSAM.
 
Tuning Parameters  
 
The items to consider for VSAM tuning are:  
o Logical record size;  
o Access pattern;  
o Control interval size;  
o Control area size;  
o Imbedded free space; 
o Index options; and  
o Bufferspace. 
 
The first four will be covered in this article, Part I of a 
series to be continued in future months.   
 
Logical Record Length  
 
The logical record length of the application record is very 
application-design specific, and as such does not lend 
itself to external manipulations.  The exception is to 
ensure that enough extra space is allocated within the 
record so new information fields can be added to the record 
layout during a retrofit.   
 
Access Pattern  
 
The access pattern, the sequence in which successive records 
are retrieved or added to the dataset, is also application- 
specific, but is more amenable to manipulation for 
performance.  The main categories for access pattern are: 
 
o Truly Random - no relationship between successive requests 
at all.  A license-number database accessed by a law 
enforcement agency servicing requests from officers in the 
field is a good example; 
 
o Random Batch - The records are retrieved in random order 
within a certain range of keys.  An example would be a 
payroll dataset in sequence by employee number within 
department in which a clerk processes transactions by 
department and randomly by employee within that department; 
 
o Sorted Batch - The records are retrieved by key, but 
successive requests, within a batch, are always in key 
sequence.  In the payroll file above, the clerk processes by 
ascending employee within a department before processing 
requests for a new department; 
 
o Sequential - Access is always by entry sequence of the 
records in the dataset and no records are skipped. The 
desired order of access is by the order that the records 
were originally placed in the file.  Typically, the entire 
dataset is perused each time the file is opened; 
 
o Skip-Sequential - In this access mode the records are 
retrieved in physical key sequence, but the starting point 
within the file is variable.  Once again, in the mythical 
payroll file, the clerk reviews the records for all of the 
employees within a given department. 
 
Control Interval Size 
 
The control interval, referred to as a "CI", is the unit in
which VSAM transfers data between DASD and main storage.
It is analogous to the "block" concept in other access
methods.  However, unlike block size, CI-size is not limited
to an even multiple of the logical record length, and the
CI-size selected for a file is transparent to the
application's own processing logic.  Except in very rare
instances, VSAM presents discrete records to the application
program, which does not have to be concerned with the
control interval size.  A control interval is always made up
of an integral number of physical records, whose size
depends upon both the effective CI-size and the device type
being used.  (For FBA DASD, the physical record size will
always be 512 bytes, regardless of CI-size.)
 
Indeed, CI-size is probably the most sensitive single item 
that is available for optimization in a VSAM dataset.  It 
directly affects the processing efficiency, amount of main 
storage required to process the application (called the 
"working set"), and the utilization of DASD space for good 
or ill. 
 
A file's CI-size is selected under the following broad 
constraints, which are enforced by VSAM: 
 
o Allowable Range - it must be between 512 bytes and 32,768
bytes.  If less than 8K bytes it must be a multiple of 512
bytes, and if greater than 8K bytes it must be a multiple of
2K bytes.  If you define a CI-size that violates this rule,
VSAM will increase it to the next multiple -- for example, a
CI-size of 800 bytes will be increased to 1024 bytes, and a
CI-size of 9,216 will be increased to 10,240.  Any extra
space in a CI that is allocated this way may not be
available for storing records if the RECORDSIZE is larger
than the difference between the selected CI-size (given in
the IDCAMS "DEFINE" command) and the actual CI-size
(generated by VSAM);
 
o Recordsize versus CI-size - The largest RECORDSIZE in a
non-spanned file can be no larger than the CI-size less 7
bytes.  These 7 bytes hold control information required by
VSAM (however, see below about the actual space consumed in
a VSAM control interval);
 
o Spanned Record - The space available within a SPANNED
record's individual control intervals is the CI-size minus
ten (10) bytes -- 4 bytes for the CIDF, 3 bytes for the
record segment's RDF, and 3 bytes for the RDF that holds the
CI's level check.  A SPANNED record is one whose RECORDSIZE
may be larger than a single CI.  An RRDS cluster cannot be
defined as SPANNED;
 
o Record Count Within CI - As many logical records as 
possible will fit in a single control interval ("CI").  In a 
non-spanned VSAM file, if the remaining space in a CI is 
less than the maximum record length specified in the file 
definition, no record will be written to the CI, even if 
that record would fit.  In a spanned VSAM file (a file where 
a logical record may span control intervals), only space 
remaining in the last CI of a record is not used, even if 
the space would be sufficient for a smaller size record; 
 
o Default CI-Sizes - If you do not specify a CI-size when 
you define a cluster, VSAM will allocate a CI-size for you 
(usually not what you would want, though).  The rules are 
simple -- if you specified the "RECORDSIZE" parameter in the 
DEFINE, and the size of the record permits, VSAM will assign 
a CI-size of 2048 bytes [2K].  If no RECORDSIZE was 
specified, VSAM will assign a 4096 byte [4K] CI.  Also, if 
the file is a KSDS dataset, VSAM will default to a 512-byte 
CI-size for the index component.  If the default CI-size is 
too small for the record size specified, VSAM will allocate 
a CI-size at the next-allowable multiple (multiples of 512 
bytes if the data component RECORDSIZE is less than 8K, and 
multiples of 2K if the record size is greater than or equal 
to 8K); 
 
o Index CI-Size - The index CI for a KSDS dataset must be no 
larger than 8K, and must be a multiple of 512 bytes; 
 
o Forced Rounding - If you specify an improper multiple of 
512 or 2048 bytes, VSAM will round the CI-size used up to 
the nearest allowable multiple.  VSAM will not issue any 
message indicating that the CI-size you have selected has 
been overridden. 
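
The rounding and overhead rules in the list above can be
sketched in a few lines of Python.  This is only an
illustration of the arithmetic as stated in this article, not
VSAM's own logic; the function names are made up for the
example, and 8K itself is assumed to fall under the 512-byte
rule.

```python
# Illustrative sketch of the CI-size rules listed above.
# Function names are hypothetical; they are not VSAM or IDCAMS calls.

def round_ci_size(requested: int) -> int:
    """Round a requested CI-size up to the next allowable value."""
    if requested < 512 or requested > 32768:
        raise ValueError("CI-size must be between 512 and 32,768 bytes")
    # Up to 8K the step is 512 bytes; above 8K it is 2K (2048 bytes).
    # Assumption: a request of exactly 8K is treated under the 512 rule.
    step = 512 if requested <= 8192 else 2048
    return -(-requested // step) * step  # ceiling to a multiple of step


def max_nonspanned_recordsize(ci_size: int) -> int:
    """Largest RECORDSIZE in a non-spanned file: CI-size less 7 bytes."""
    return ci_size - 7


def spanned_segment_space(ci_size: int) -> int:
    """Space per CI for a SPANNED record: CI-size less 10 bytes
    (4-byte CIDF plus two 3-byte RDFs)."""
    return ci_size - 10


if __name__ == "__main__":
    for requested in (800, 4096, 9216):
        print(requested, "->", round_ci_size(requested))
```

Run as a script, this prints the rounded value for each
requested size, which makes it easy to see what VSAM would
silently substitute for a value such as 800 or 9,216.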
 
The primary access pattern of an application is very
important when selecting which CI-size to use, as this
parameter (CONTROLINTERVALSIZE) will have a very direct
impact on the performance of your application.  For
sequential or skip-sequential access, larger control 
intervals are beneficial.  Since a larger CI-size allows a 
greater number of logical records to fit into each CI, fewer 
control intervals will need to be transferred between DASD 
and main storage to process a set number of records, thus 
reducing I/O time. 
 
However, for a randomly searched file, unless the access
pattern results in a high "hit ratio" within very tight key
ranges, a smaller CI-size is preferable.  With a larger CI,
each access to auxiliary storage would retrieve many records
that will not be needed, as each CI would contain only a few
records (or just a single record) that will be referenced.
The larger CI would also be under VSAM exclusive control,
potentially tying up records that another user (in the
online environment) wants to access.  Also, consider the
potential time wasted reading and writing these control
intervals to and from storage, when the records actually
needed are such a small percentage.
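
The trade-off between the two access patterns can be put
into rough numbers.  The sketch below is a simplified model,
not a measurement: it assumes fixed-length records, 10 bytes
of control information per CI, no free space, and (for the
random case) the worst case in which every request reads a
different CI.  Buffering and index I/O are ignored, and all
names are invented for the example.

```python
import math

# Assumption: fixed-length records and 10 bytes of control
# information per CI (1 CIDF of 4 bytes, 2 RDFs of 3 bytes each).
CONTROL_BYTES = 10


def records_per_ci(ci_size: int, record_size: int) -> int:
    """Whole records that fit in one CI."""
    return (ci_size - CONTROL_BYTES) // record_size


def sequential_ci_transfers(record_count: int, ci_size: int,
                            record_size: int) -> int:
    """CIs moved to read every record once, in order."""
    return math.ceil(record_count / records_per_ci(ci_size, record_size))


def random_bytes_moved(requests: int, ci_size: int) -> int:
    """Bytes moved if each random request reads a different CI."""
    return requests * ci_size


if __name__ == "__main__":
    for ci_size in (2048, 4096, 8192):
        seq = sequential_ci_transfers(50_000, ci_size, 150)
        rnd = random_bytes_moved(1_000, ci_size)
        print(f"CI {ci_size}: {seq} CI transfers for a full pass, "
              f"{rnd} bytes moved for 1,000 random reads")
```

Under these assumptions, doubling the CI-size roughly halves
the transfers for a full sequential pass, but doubles the
bytes moved for the same number of truly random requests.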
 
For a KSDS file, a larger CI-size allows for more efficient
distribution of the free space: as records are added after
the file is loaded, more of them will fit into each CI, thus
requiring fewer CI-splits and fewer control area ["CA"]
splits.  Fewer index records are also required, as there
will be fewer control intervals to point to.  A larger
control interval size is also beneficial for a randomly
accessed file if the retrieval is by pre-sorted input keys,
or the access is randomly within a "tight" key range, as
VSAM will, if possible, reference any CI in its in-core
buffers before reading from DASD.
 
CI-size will affect main and auxiliary storage requirements 
also: 
 
o As RECORDSIZE increases, you may need larger control 
intervals to hold the records; 
 
o Poor choices for CI-size affect DASD utilization - for
example, only 13 records of 150 bytes each will fit into a
2K CI for a KSDS cluster, thus wasting 88 bytes: (2048 -
10 - 1950 = 88), where:

2048  --> control interval size;
  10  --> control information [1 CIDF and 2 RDFs];
1950  --> space required for record storage [13 * 150];
  88  --> space remaining within the CI.

This 88 bytes represents 4.3 percent of each CI.  Raising
the CI-size to 4096 allows 27 records of this same 150-byte
length to fit into the control interval, with an excess of
only 36 bytes (4096 - 10 - 4050 = 36), where the values are
calculated as above for the 2K CI.  These 36 bytes represent
less than a one percent waste within the CI.  For a 50,000
record file, a 2K CI-size (which requires four 512-byte FBA
blocks) will require 3847 CIs (50,000/13 = 3846.15), or
15,388 blocks.  These same 50,000 records in the 4K CI
(which uses 8 FBA blocks) will require 1852 CIs (50,000/27 =
1851.8), or 14,816 FBA blocks.  If the 15,388 block
allocation for the 2K CI is rounded up to 15,392 (an exact
multiple of 8 FBA blocks), this CI-size would hold 50,024
records.  Used for a 4K CI, these same 15,392 FBA blocks
hold 51,948 records.  This represents a 4 percent increase
(1,924 records) in the same space.  In practice, due to the
MAX-CA and MIN-CA rounding that VSAM will perform, these
allocations (15,388 and 15,392 blocks) will actually be
15,438 blocks -- an allocation that represents 249 MIN-CAs
for a 3370 FBA DASD;
 
o As control interval size increases, the allocated buffer 
space has to expand to hold the larger control interval; 
 
o For an indexed cluster, if the data component CI-size is
small, and there are many potential data CIs in a single
control area, the default (or selected) index component's
CI-size may not be large enough at the lowest level (the
sequence set) to address all the potential data CIs in the
control area.  This forces VSAM to leave some of the control
intervals in the control area empty, wasting the unused
space: the CIs that the sequence set index record cannot
address cannot be used for records, even though VSAM has
already allocated the space for the control area.  (See the
section "Index Options" in next month's installment of this
series.)
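
The space arithmetic for the 150-byte record example above
can be reproduced with a short script.  This is a sketch
under the same assumptions the example uses (fixed-length
records, 10 bytes of control information per CI, 512-byte FBA
blocks, no free space); the function names are invented for
illustration.

```python
import math

FBA_BLOCK = 512      # bytes per FBA block
CONTROL_BYTES = 10   # 1 CIDF (4 bytes) + 2 RDFs (3 bytes each)


def ci_capacity(ci_size: int, record_size: int) -> tuple[int, int]:
    """Return (records per CI, bytes left over in the CI)."""
    usable = ci_size - CONTROL_BYTES
    count = usable // record_size
    return count, usable - count * record_size


def fba_blocks_needed(record_count: int, ci_size: int,
                      record_size: int) -> tuple[int, int]:
    """Return (CIs required, FBA blocks required) for a file."""
    count, _ = ci_capacity(ci_size, record_size)
    cis = math.ceil(record_count / count)
    return cis, cis * (ci_size // FBA_BLOCK)


def records_in_blocks(blocks: int, ci_size: int, record_size: int) -> int:
    """Records that fit in a given number of FBA blocks."""
    count, _ = ci_capacity(ci_size, record_size)
    return (blocks // (ci_size // FBA_BLOCK)) * count


if __name__ == "__main__":
    for ci_size in (2048, 4096):
        count, waste = ci_capacity(ci_size, 150)
        cis, blocks = fba_blocks_needed(50_000, ci_size, 150)
        print(ci_size, count, waste, cis, blocks,
              records_in_blocks(15_392, ci_size, 150))
```

Changing the record size or CI-size in the loop is a quick
way to compare candidate values before coding the DEFINE.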
 
Control Area Size  
 
A Control Area ("CA") is the unit of DASD storage that VSAM 
will allocate and preformat when loading or expanding a 
file.  With the advent of the linear addressing scheme for 
the FBA DASD devices (3310/3370) the former practice of 
allocating space by tracks and cylinders to optimize 
performance is no longer as conceptually clear, even though 
the techniques and the end result are the same.  The terms
"MIN-CA" and "MAX-CA" are now used in place of the terms
"track" and "cylinder", respectively, when discussing VSAM
space allocations.  Even though FBA DASD is addressed in
terms of 512-byte blocks, these devices are still physically
configured as tracks and cylinders.
 
VSAM will always allocate space in multiples of MIN-CA (for 
FBA a MIN-CA = 62 blocks [31K bytes]) up to the MAX-CA for 
the device (for FBA MAX-CA = 744 blocks [372K bytes]).  A 
control area is always made up of an integral number of 
control intervals, but performance is enhanced if a whole 
number of control areas will fit into a single MAX-CA, due 
to the command chaining to read an entire cylinder at a 
time.  VSAM will always allocate storage on MIN-CA and MAX- 
CA boundaries (allocations may split cylinders but never 
tracks).   
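
The MIN-CA rounding described above is simple to model.  The
sketch below uses the FBA figures quoted in this article (62
blocks per MIN-CA, 744 per MAX-CA) and only rounds a request
up to whole MIN-CAs; any further MAX-CA rounding VSAM may
apply is not modeled, and the function names are invented for
the example.

```python
MIN_CA_BLOCKS = 62    # FBA MIN-CA (31K bytes), as quoted above
MAX_CA_BLOCKS = 744   # FBA MAX-CA (372K bytes), as quoted above


def round_to_min_ca(requested_blocks: int) -> tuple[int, int]:
    """Return (MIN-CAs allocated, blocks allocated) for a request."""
    min_cas = -(-requested_blocks // MIN_CA_BLOCKS)  # ceiling division
    return min_cas, min_cas * MIN_CA_BLOCKS


def min_cas_per_max_ca() -> int:
    """Number of MIN-CAs that make up one MAX-CA."""
    return MAX_CA_BLOCKS // MIN_CA_BLOCKS


if __name__ == "__main__":
    for blocks in (15_388, 15_392):
        print(blocks, "->", round_to_min_ca(blocks))
    print("MIN-CAs per MAX-CA:", min_cas_per_max_ca())
```

Both sample requests land on the same allocation, because
each falls between 248 and 249 MIN-CAs.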
 
The size of the control area is indirectly significant for 
sequentially organized files (RRDS/ESDS) due to this command 
chaining, but it has a direct impact on indexed files.  The 
CA-size selected represents one of the primary 
considerations for how VSAM will allocate its indices, and 
how free space (and thus the incidence of CI and/or CA 
splits) is used. 
 
For an indexed file, VSAM also views a control area as space 
occupied by the number of control intervals that can be 
addressed by a single SEQUENCE SET index record.  The 
sequence set record is that index record that holds the 
high-key marker and the physical location to address each CI 
in the CA. 
 
For an indexed file, the index record must be large enough
to address all the control intervals in a control area.
Thus, for sequential/skip-sequential access of an indexed
file (or an alternate index [AIX]), if there are more
control intervals in a control area, fewer index records
need to be read.  Larger control areas will also generally
result in better performance and space utilization.  Because
the VSAM catalog keeps track of its allocations, no control
information fields (such as the RDF and CIDF in the control
interval) are required in the control area itself.
 
Control area size cannot be directly specified, but the
quantity that VSAM uses can be influenced by the choice of
options that you can define when you describe the cluster
via IDCAMS "DEFINE."  VSAM will choose, for a given file's
control area, the smallest of the MAX-CA for the device, the
primary space allocation or the secondary space allocation
for the cluster.
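
The selection rule just described can be written out as a
small sketch.  It assumes FBA DASD with the 744-block MAX-CA
quoted earlier, and that the primary and secondary
allocations are given in blocks already rounded to whole
MIN-CAs.  It also ignores any index CIs that might share the
control area.  The function names are invented for the
example.

```python
FBA_BLOCK = 512       # bytes per FBA block
MAX_CA_BLOCKS = 744   # FBA MAX-CA, as quoted earlier in this article


def control_area_blocks(primary_blocks: int, secondary_blocks: int,
                        max_ca_blocks: int = MAX_CA_BLOCKS) -> int:
    """Smallest of MAX-CA, primary allocation and secondary allocation."""
    return min(max_ca_blocks, primary_blocks, secondary_blocks)


def data_cis_per_ca(ca_blocks: int, ci_size: int) -> int:
    """Whole data CIs of the given size that fit in one control area."""
    return ca_blocks // (ci_size // FBA_BLOCK)


if __name__ == "__main__":
    # Hypothetical allocations: a large primary with two different secondaries.
    for primary, secondary in ((15_438, 1_488), (15_438, 310)):
        ca = control_area_blocks(primary, secondary)
        print(primary, secondary, "-> CA of", ca, "blocks,",
              data_cis_per_ca(ca, 4096), "4K CIs per CA")
```

The second case shows how a small secondary allocation
shrinks the control area, even when the primary allocation is
generous.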
 
As mentioned above, VSAM will preformat space by control 
areas when loading or extending a file.  This process 
consists of VSAM writing end-of-file records across each 
control area it allocates before writing any data or index 
control intervals into the freshly-allocated control area. 
Thus, in theory, if the application should fail after 
starting to load or extend a dataset, a problem program 
could read to the end-of-file and resume loading/extending 
the dataset from that point.   
 
This preformatting is costly in terms of I/O time, as the 
control area is effectively being written twice -- once when 
preformatting and again on writing actual data records.  If 
the cluster is to be initially loaded by a utility (DITTO, 
SORT/MERGE, VSAM REPRO), it is simpler, in the event of an 
ABEND on loading, to delete and redefine the cluster's 
catalog entry and reload the dataset from the beginning.  In 
this case, the preformatting of the area on disk does not 
help.  Indeed, it can be a serious performance bottleneck.   
The process of preformatting can be bypassed by specifying 
the option "SPEED" in the DEFINE for the cluster.  This 
option applies only to the INITIAL LOAD, not when the file 
is being extended.  "SPEED" is not the default attribute, 
and must be specified if desired. 
 
