The Challenges of Implementing DFSMS

By Ernie Ishman

In April 1991, Geisinger System Services made a commitment to implement DFSMS. The months that followed presented many new challenges to the internal support staff, ranging from dealing with DFSMS-related problems to understanding some of the new functions DFSMS introduced to the system. As project leader, I kept a log of both expected and unexpected issues during the project. This article details some of our experiences. It assumes a basic understanding of DFSMS and is intended as a comparison view of one shop's successes and pitfalls.

Upgrade Maintenance

The initial task of the project was to upgrade maintenance for MVS/ESA, which had been installed in October 1990 via CBIPO. That task turned out to be ongoing. An initial CBPDO brought us up to the 9101 level, with two succeeding PDOs through 9104 and 9106. In each case, problems experienced firsthand were corrected. Since the software behind DFSMS is still maturing, this is an especially important point to keep in mind. For the most part, the stability of DFSMS throughout the project was acceptable, with most problems being more of a nuisance. An example came when an eight-character storage group name was tried for the first time. Much to our surprise, it did not work: VSAM defines going to storage group SGBASE90 were not functioning. APAR OY40764 corrected the problem, but a desire to stay current on maintenance was born.

Before discussing some of the technical details, let me mention some of the objectives that drove our desire for DFSMS. These included, but were not limited to: strategic positioning; defining and enforcing DASD standards; simplifying data allocation; automating control of storage resources; promoting device independence; exploiting features such as data set level caching and PDS/E; and promoting efficient utilization of DASD. For the project to be considered a success, these objectives had to be satisfied.

The Team

The project team included a DBA, an operations analyst, an applications analyst, a security analyst and two systems programmers. The bulk of the actual work was performed by the systems programmers, with the other members of the team serving more in an advisory capacity. To keep the project moving along, the Custom Migration Support (CMS) offering from IBM was brought in to assist in all phases of the project. This primarily provided us with an unbiased look at the environment, with detailed insight on how prepared we were for the conversion. In addition, the CMS team offered proven recommendations that helped streamline DFSMS configuration design, problem diagnosis and end-user education.

After initial maintenance was applied to the system, the next step was to bring up a minimal configuration. This proved to be a transparent but milestone event. With a minimal configuration in place, the stage was set to phase in additional changes as each prior one proved successful. One concern we dealt with was the need to understand how some of the third-party products were going to interact with DFSMS. Since DMS was our DASD management software, used in place of DFHSM, it was very important to take each step with caution. To insulate end users from initial problems, the internal support staff's data was the first to get managed. This provided us with an opportunity to understand what DFSMS meant to the environment, as well as to shake out problems. At this point, successes and pitfalls seemed to come on a daily basis.
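As a rough sketch of that first cut (the high-level qualifiers here are invented for illustration), the storage class ACS routine amounted to little more than a filter list that caught staff data and left everything else unmanaged:

    PROC STORCLAS
      /* Hypothetical high-level qualifiers for internal support data */
      FILTLIST STAFFDSN INCLUDE(TECH.**, SYSPGMR.**)
      SELECT
        WHEN (&DSN = &STAFFDSN)      /* staff data: SMS-managed       */
          SET &STORCLAS = 'SCBASE'
        OTHERWISE                    /* null class: stays unmanaged   */
          SET &STORCLAS = ''
      END
    END

Anything assigned the null storage class simply bypassed SMS, which kept the rest of the shop untouched while the configuration was proven out.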
A problem that appeared early was related to the small size of the first storage group. Initially, only two volumes were put under DFSMS control, supporting selected permanent data sets and limited temporary work space requests. A couple of newly managed non-VSAM allocations existed with UNIT=(SYSWK,3) coded, which related to three work volumes in the unmanaged pool. Although the necessary space was available in the managed pool to service the allocation, it failed because the request for three volumes was honored. That seemed a bit harsh to me and the others, and a fixing APAR, OY32669, was issued. That changed the process to issue a warning message indicating the number of volumes requested may not be satisfied if actually required.

Also somewhat of a surprise was the effect DFSMS had on space allocation requests in a mixed environment. We have both 3380 and 3390 devices and decided to use the 3390 track size as the base configuration default. Most SPACE keywords in our JCL had been coded to do either track or cylinder allocations, and most JCL also used the UNIT=esoteric allocation technique. Since DFSMS couldn't make an assumption about what device types made up an esoteric pool, it used the base track size to determine space requests. In other words, DFSMS calculated a request as though it would end up on a 3390. Thus, a request of 10 cylinders would end up as 12 on a 3380. The way around this was to change space requests to more generic values such as kilobytes. Since that's not a trivial task, we've accepted the oddity as long as there is mixed geometry.

Some Relief From Overallocation

Since overallocation was common on the 3380s, significant relief came when new space release attributes called "immediate" and "immediate-conditional" were introduced in the management class. These were available with PTF UY65041 and provided us with an opportunity to automate the release of space when JCL did not have the RLSE keyword coded. Since there is a potential need for space beyond the initial request, such as would be used in a DISP=MOD request, we decided to implement immediate-conditional. This meant space would only be released when a secondary allocation could be made available. This approach has turned out to be considerably more efficient than waiting until nightly space management cycles run to release space after the fact.

The only problem we saw was associated with an ISPF exclusive enqueue. For data sets defined with immediate space release, ISPF was locking out other users after a data set had been edited. This was especially noticeable with PDS data sets when a single member was changed: no other members could be accessed by other users as long as the first person remained at the ISPF primary edit screen. Since it's common to make a PDS member change and only back out to the edit screen, this proved to be quite a headache. Prior to the fix for this via APAR OY49132, a special management class to avoid space release was assigned to problem data sets.

A productivity gain offered by DFSMS is the elimination of EDT gens. However, this should not be confused with an attempt to eliminate esoteric units completely. JCL containing an esoteric request will receive a JCL error if the esoteric is removed from the system. To avoid this, no esoterics were removed from our MVSCP source. Instead, as all volumes for a particular esoteric were converted, the esoteric would be set to point to a low-use paging volume.
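To illustrate with invented names, the effect on JCL of draining an esoteric looked something like this:

    //* Before conversion: SYSWK resolves to a pool of unmanaged
    //* work volumes.
    //WORK     DD DSN=&&WORK,UNIT=(SYSWK,3),
    //            SPACE=(CYL,(50,10)),DISP=(NEW,DELETE)
    //* After conversion the DD is unchanged; SYSWK remains defined
    //* in the MVSCP but now points only at a low-use paging volume,
    //* so the JCL still interprets cleanly instead of failing with
    //* a JCL error.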
This volume was established early on as the catchall pack for problems arising from various scenarios. Now and then a data set would show up on this volume, indicating an allocation got through the system for a data set that would otherwise have caused a JCL error. One of our primary goals was to avoid unnecessary abends, and this went a long way toward accomplishing that. Eventually, all old esoterics pointed back to the paging pack, just in case.

Since DFSMS requires that managed data sets be cataloged, I was surprised to see sporadic occurrences of uncataloged data sets. These generally turned out to be temporary in nature; such things as a job abend or system crash allowed the situation to occur. Because of this, we periodically run a job that reports uncataloged data sets, which are then cleaned up. Unlike unmanaged data sets, option 3.4 of ISPF cannot be used to purge these uncataloged data sets. We found the ISMF DELETE line command will delete them, as it apparently issues DELETE NVR, which would be the other way of doing it. This also brought out another item of interest: DFSMS does not enter the management class ACS routine for temporary data, so the system will not assign a management class to DSTYPE=TEMP data. Thus, a space management job based on management class was not picking them up. This should not be an issue in a DFHSM environment, because its daily space management, by default, cleans them up.

VSAM Alternate Indexes

While discussing data sets that do not go through an ACS routine, it's important to note VSAM alternate indexes. Although it's clear in the manual, we found out the hard way that jobs trying to assign a management or storage class to an AIX will fail. For good reason, these data sets are automatically managed in the same manner as their related cluster. This keeps the entire sphere under control within the same storage group.

A welcome feature of DFSMS for the internal support staff was being able to eliminate involvement with large production data set placement requests. We basically opened up the entire DASD pool to all allocations with the expectation that DFSMS and SRM could do a better job than we could. One system-related problem appeared in this area. DFSMS assigns volumes for new allocations when a job is submitted, and in some cases this included very large requests for space. If the step actually needing the space did not execute for a period of time, there was a chance space was no longer available on the selected volume. When DFSMS switched the allocation to a different volume, an abend occurred. This was fixed via APAR OY47967.

We chose to exert control over who could allocate large test data sets, and two approaches were taken. One came in the form of a check in the storage class ACS routine for data sets requesting over 300MB primary. If the user was not privileged, the request was failed with a note to contact internal support for assistance. This too was a welcome feature, as there had been no enforcement of this unwritten standard in the past. The second approach dealt with allocations that could potentially grow above 300MB via secondary extents. An ACS storage class exit was written to detect these and issue a warning back to the joblog and syslog; the exit was necessary because it is not possible to write to the log via an ACS routine. We wanted to review these periodically to detect potential space abuse. The exit was based on some code found in member IGDACSSC of IPO1.SAMPLIB.
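A minimal sketch of the first approach follows. The privileged user IDs are invented, and the WRITE statement is shown on the assumption it was available at our DFSMS level; &SIZE carries the primary request in kilobytes, so 300MB works out to 307,200KB:

    /* Fragment of the storage class routine: fail primary requests */
    /* over 300MB (307200KB) unless the user is privileged.         */
    FILTLIST PRIVUSER INCLUDE('STGADM1','STGADM2')  /* invented IDs */
    IF &SIZE > 307200 && &USER NE &PRIVUSER THEN
      DO
        WRITE 'DATA SETS OVER 300MB - CONTACT INTERNAL SUPPORT'
        EXIT CODE(16)      /* a nonzero exit code fails the request */
      END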
The IGDACSSC code as shipped was implemented during early ACS code testing for debugging purposes. I recommend taking a look at it.

We expected installation of DFSMS would provide opportunities to use the new PDS/E data structure. It turned out that some ugly messages were issued when an attempt was made to define a PDS/E. That was corrected via installation of PTF UY50879, yet another indicator of the importance of current maintenance. I also wanted to start using the new DCOLLECT option in IDCAMS, but found it was not available until UY90555 was applied.

Periodically, programmers coded data set allocations using the EXPDT keyword. Since we did not use the value in any space management processing, I was glad to see a way to override it via DFSMS. By establishing a retention limit of zero in the management class, EXPDT values were ignored, and a message indicating the value was ignored went back to the joblog. This has eliminated the need to code PURGE on DELETE requests, which sometimes occurred at very inconvenient times.

Generic Entries in APF

A new feature for systems programmers was the ability to make generic entries in the APF list for managed data sets by leaving off the normal volser value. That informed the system to use normal catalog searching to find and authorize the data set at IPL time. This was an especially useful enhancement in relation to our disaster recovery plan: we never knew where an authorized data set would end up during a recovery scenario. Now, if it is managed, there's no need to know. The feature is available via APARs OY26695, OY27602, OY28919 and OY41408.

Since our environment had mixed DASD geometry in the storage group pools, there were a few isolated problems when files were converted to DFSMS. CICS complained, for good reason, when some journals got moved from unmanaged 3380s to managed 3390s. This was due to the files having been preformatted using certain characteristics of the geometry of the device they were on. The larger track size left allocated space that was not usable. Unfortunately, the problem didn't show itself until CICS actually tried to write to this area, and the regions had to be brought down to correct the situation. The reason for this type of problem is quite apparent, but it's an example of how little things can slip through if you're not careful. The problem is typical of others you may have where DSORG PS or DA files are preformatted by specialized utilities. There was one instance where the release of a software product did not support 3390s; it was corrected by an upgrade to the product. Checking with the vendor is a must before moving data associated with a product that utilizes specialized utilities.

After trying different approaches with management and storage class constructs, the list was stabilized as the project neared completion. Since management class was now controlling backup and archival of managed data, daily scrutiny was a must. Since DMS was our management tool, my experiences would not apply to everyone; suffice it to say that backup and archive management was not always what we expected. It took time to understand which management class parameters related to various DMS terminology. As our knowledge base expanded, it was pleasant to see that items such as data set level caching via STORCLAS were everything advertised. To illustrate our implementation of these two constructs, a chart summarizing the basic differences internal to each is shown in Figure 1. Attributes that are the same for respective classes are listed separately.
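As a hedged example of the kind of job that feeds this daily scrutiny, the DCOLLECT option mentioned earlier can sweep a storage group for space and data set records (the output data set name is invented):

    //DCOLJOB  EXEC PGM=IDCAMS
    //SYSPRINT DD  SYSOUT=*
    //DCOUT    DD  DSN=TECH.DCOLLECT.DATA,DISP=(NEW,CATLG),
    //             UNIT=SYSDA,SPACE=(CYL,(5,5)),
    //             DCB=(RECFM=VB,LRECL=644,BLKSIZE=0)
    //SYSIN    DD  *
      DCOLLECT OUTFILE(DCOUT) STORAGEGROUP(SGBASE90)
    /*

Note BLKSIZE=0 on the output DD, which lets system-determined blocksize pick a value; the records can then be post-processed by whatever reporting tool the shop prefers.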
I will not illustrate our data classes because little time was spent on them; we saw few significant benefits without a stricter naming convention. The storage groups were simply divided by 3380 and 3390 geometry, with one thrown in for the VIO pool. Although there is considerable use of DB2 and CICS, we've been able to avoid implementing guaranteed space. Initially, there was apprehension that poor data set placement could cause response time problems; that never materialized. It seems that as long as there are sufficient volumes, DFSMS does an adequate job on performance.

The only data not being managed at this point are the DB2 BSDS, DB2 logs, selected CICS journals, paging data sets, the master catalog volume, the IPL volume and a volume of third-party IPL-critical data sets. These data sets are kept unmanaged primarily for recovery purposes. Integration into DFSMS is expected as time and disaster recovery techniques permit.

Although the efficiency of VIO has been around for some time, the ability to manage the resource was limited. DFSMS takes a big step forward in getting a handle on VIO usage. I expected and received significant results when VIO was implemented for selected temporary data sets. Jobs using small, high-reuse data sets saw as much as a 42 percent decrease in execution time if the job was not set up to use VIO previously. Good examples were jobs doing a DB2 precompile, CICS translate, COBOL compile, linkedit and DB2 bind all in one TSO batch step. A couple of problems surfaced with products unable to handle data sets moved from DASD to VIO. Most were handled by a vendor PTF or by excluding the product from VIO via ACS. One curious problem came about because of the useful out-of-space protection STOP-X37 was providing. Since STOP-X37 did not get involved in VIO x37 abend processing, any job previously relying on its help for a data set on DASD started to abend if DFSMS moved the data set to VIO. Another item applicable to STOP-X37, which should apply to other comparable products, relates to the recatalog feature. This feature allows files to be recataloged to the new volume when a duplicate data set gets created. For good reason, this is not applicable under DFSMS; if there are jobs relying on this to go to normal completion, a JCL error will occur.

Most Difficult Problem

By far, the most difficult problem we had to debug was not even recognized as a DFSMS problem. To facilitate testing, we run a separate MVS system that communicates with production MVS via a virtual CTC adapter under VM. One day, after making various changes unrelated to DFSMS, the test system was taken down for an IPL. Much to our surprise, the VTAM link via the CTC was not able to reconnect after the system came back up. A thorough review of the changes revealed nothing that would cause such a problem, and since DFSMS was not involved in the immediate changes, it was not a suspect. To make a long story short, a minor change had been made to the storage class ACS routine about a week earlier. The change unknowingly allowed DFSMS to attempt management of all devices, including the CTC. It just so happened that when VTAM tried to activate the CTC, it was receiving a sense code indicating the device was already in use, due to the inadvertent involvement of DFSMS. The moral of the story is to make sure the storage class ACS routine has a filter list that is very selective about the devices DFSMS gets its hands on.
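In that spirit, a hypothetical guard at the top of the storage class routine (unit names invented) keeps DFSMS away from anything that is not known DASD:

    /* Only consider requests against known DASD unit names;       */
    /* everything else, CTCs included, gets no storage class.      */
    FILTLIST DASDUNIT INCLUDE('3380','3390','SYSDA','SYSWK')
    IF &UNIT NE &DASDUNIT THEN
      DO
        SET &STORCLAS = ''
        EXIT
      END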
An unexpected benefit of DFSMS came from a change in the way IDCAMS BLDINDEX processing works. For BLDINDEX to function in a DFSMS environment, PTF UY65152 was applied. This eliminated the need for IDCUT work files, which had unique requirements that did not fit the DFSMS environment. In place of its own internal sort logic, IDCAMS now called the standard sort product on the system. After applying this PTF, a job performing a BLDINDEX of 1.5 million records decreased in elapsed time from 64 minutes to 10 minutes, and EXCPs dropped from 189,000 to 9,000. Comparable decreases were noticed throughout the system for BLDINDEX jobs. Obviously, this caught a lot of attention.

Two unacceptable situations associated with VSAM file processing required JCL changes. An infrequent technique used to define VSAM files involved use of the FILE(xx) keyword. The xx DD in the JCL referred to by the DEFINE would contain only UNIT=xxxx, DISP=OLD and VOL=SER=yyyyyy. Under DFSMS, this caused problems since the referenced volume went away and there was no managed DSN= to allow use of a DUMMY storage group. Since there was no real need for the FILE(xx), these were removed when found. The other problem dealt with the combination of a DELETE, DEFINE and REPRO all within the same IDCAMS step. The REPRO used an OUTFILE(xx) to refer back to the JCL for the output data set that had just been deleted and redefined. Under DFSMS, the DELETE of a data set followed by a DEFINE usually moved it to another volume. In turn, when the REPRO tried to process OUTFILE(xx), which still pointed at the old location, an abend occurred. All occurrences of this technique had to be changed to use OUTDATASET in place of OUTFILE. That forced a search of the catalog for the current location of the data set when REPRO was invoked, instead of relying on where it was when the JCL was interpreted.

Something I had hoped to make greater use of was the ability to define VSAM files via JCL. Unfortunately, limitations allow its use only for very basic or temporary VSAM allocations, which is probably all it was intended for. Clusters requiring attributes such as REUSE, SPEED or ERASE must still be defined via IDCAMS DEFINE. If you've ever looked at the run-time difference between a large file loaded with SPEED as opposed to RECOVERY, you'll agree this is not trivial. Alternate index definitions are also not possible. Other options such as FREESPACE, CISIZE, IMBED and SHAREOPTIONS, not directly available in JCL, can be accessed via DATACLAS, but that would have required a commitment to DATACLAS. We decided it was best to continue recommending IDCAMS DEFINE for VSAM file allocations to avoid any confusion. We did inform everyone that a DISP=(OLD,DELETE) for a VSAM file in JCL would now delete the file; that was ignored in earlier DFP releases.

The enthusiasm to begin using system-determined blocksize (SDB), a new feature of DFP, uncovered a problem with model DSCBs. Some programmers were changing allocations to remove the BLKSIZE keyword so SDB would be invoked. Unfortunately for the GDGs, that was causing DFP to revert back to the blocksize of the model DSCB, which was not always equal to the original BLKSIZE in the JCL. There were only a few model DSCBs on the system, and they were there only to satisfy the GDG requirement, not to provide actual data set attributes. Since models are no longer required under DFSMS, the solution was to remove the model at the same time as the BLKSIZE. It took a few abends before everyone got the message.
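A hedged before-and-after sketch of the GDG change (names invented):

    //* Before: BLKSIZE removed for SDB, but the model DSCB's old
    //* blocksize is picked up instead of a system-determined one.
    //RPT      DD DSN=PROD.RPT.GDG(+1),DISP=(NEW,CATLG),UNIT=SYSDA,
    //            SPACE=(CYL,(5,1)),
    //            DCB=(PROD.GDG.MODEL,RECFM=FB,LRECL=133)
    //* After: the model reference is removed along with BLKSIZE, so
    //* SDB supplies the blocksize. No model DSCB is needed for an
    //* SMS-managed GDG.
    //RPT      DD DSN=PROD.RPT.GDG(+1),DISP=(NEW,CATLG),UNIT=SYSDA,
    //            SPACE=(CYL,(5,1)),RECFM=FB,LRECL=133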
VOL=REF

An interesting scenario involving VOL=REF was encountered. Anytime a file was defined using the VOL=REF technique, normal DFSMS processing was bypassed, with the file taking on the class constructs of the referred-to data set. This was causing some data sets to get managed even before they were included for selection in the ACS routines. The opposite was true when an otherwise managed data set referred to an unmanaged data set. Fortunately, the use of VOL=REF was very limited, and it was not an issue after everything was converted.

As with any project of this magnitude, there were considerably more challenges than I could possibly detail. Many were shop-specific, related to such items as homegrown software and overriding management concerns. You can be sure unique experiences are awaiting anyone involved in such an undertaking. I have tried to highlight the items of value to a general audience. In conclusion, I can say that DFSMS has had a very positive effect on our environment and has been worth every bit of the time we've taken to explore, understand and implement it.

Figure 1: Basic Differences of Implementing Two Constructs

MGMTCLAS

  Name      Primary days  Expire after x days  Partial release
  MCBASE    30            no limit             cond-immed
  MCBASEP   100           no limit             cond-immed
  MCPURGET  n/a           1                    cond-immed
  MCPURGEP  n/a           2                    cond-immed
  MCNOMIG   never         no limit             cond-immed
  MCNORLSE  100           no limit             no

  Attributes common to all management classes:
  Expire non-usage    - no limit (archived data sets are kept until?)
  Retention limit     - 0 (ignore EXPDT)
  Level 1 days        - same as Primary (Level 1 not implemented)
  CMD/auto migrate    - both
  # primary GDGs      - 2
  Rolled-off GDG      - expire
  Backup frequency    - 0
  # backups (exists)  - 7
  # backups (deleted) - 1
  Retain days (only)  - 50
  Retain days (extra) - 50
  Auto backup         - yes

STORCLAS

  Name      Dir resp  Dir bias  Seq resp  Seq bias  Guar. space  Comments
  SCBASE    900       -         900       -         no           default
  SCFASTR   10        R         10        R         no           force fast read
  SCFASTW   10        W         10        W         no           force fast write
  SCGUARAN  900       -         900       -         yes          not used
  SCNOSMS   999       -         999       -         no           bypass SMS via ACS
  SCNOVIO   900       -         900       -         no           bypass VIO
  SC80      900       -         900       -         no           force 3380 alloc
  SC90      900       -         900       -         no           force 3390 alloc

  Attributes common to all storage classes:
  Availability - standard
  Sync write   - no