Pointer Pointers
Part I: Finding and Repairing Broken DL/I Pointers
By David Balch

David Balch has 25 years of experience in programming,
19 of them working with CICS and DL/I. He is founder and
president of A Few Good People, Inc., a software company 
based in Crest Park, Calif. that develops and markets several
file editing products for the CICS environment.

This article examines DBDS from A Few Good People, Inc.
This is the first installment of two articles focusing on the
problem of IMS and DL/I data bases that contain broken pointers. 
In this first installment, we examine the problem in general; 
how it is discovered; what causes the situation to occur in
the first place; and some available options for correcting it. 
The second installment explores in detail the specific way in
which DBDS, a software product from A Few Good People, Inc., can
be used interactively online to find and repair data bases with
pointer problems.

Introduction

Pointer problems in DL/I or IMS/DB
data bases are a thorn in the side of DBAs everywhere. Unfortunately,
there are no simple answers to these complex problems because 
of the incredible variety of data bases and environments in
which they occur. This article is intended to familiarize you, 
in general terms, with the problem and offer some general 
approaches to correcting it. It is not intended to be the 
definitive discussion of the subject. It is assumed that you 
are already familiar with DL/I terms, concepts, coding and 
utilities. Please note that for purposes of this document the 
terms DL/I and IMS/DB will be considered interchangeable.

The First Sign of Trouble

If you are unfortunate enough to experience a DL/I or IMS 
pointer problem, you will not need to be Sherlock Holmes to 
discover it. The problem will rear its ugly head in grand 
fashion complete with lights and sirens, and usually at the
worst possible moment. If it is discovered online, you may be
the first to notice it while testing, but this isn't likely,
since you might then actually be able to fix it without anyone
noticing or being inconvenienced. Or, it may be first
noticed by an operator due to a console message from DL/I, but
this is even less likely since such messages are often lost in 
the mire of messages that bombard the console regularly. More 
than likely, it will come as a phone call from a user who 
just got a weird message and wants to know what it means.
A typical error message may be in the 700 or 800 range, with text
to the effect of: invalid segment code or buffer handler error. 

For CICS, look for an abend code of ADLA in MVS systems, and 
then look on the CICS message log for a more specific error 
message number. In DOS/VSE systems, CICS abends to watch
for will be Dnnn (where nnn is the message number in the 700 or
800 range mentioned above).

If a particular problem is destined to be discovered in a batch
run, you may, if you're lucky, get a phone call in the middle of
the night about an abend that occurred. Unpleasant as that sounds,
it would be ideal, since you could then intervene and stop further
processing before other data bases get involved and the problem
multiplies. The most likely scenario is that you will get a call
from a user whose nightly or weekly or whatever job didn't finish.
If you're really unlucky, it will happen during a regularly
scheduled backup and no one will notice for several weeks, if ever.
The unlucky scenario continues with the users not hitting the
affected part of the file, and the error goes unnoticed for several
weeks or even months.

If you have IBM's Space Management Utilities (SMU)
and run it on a regular basis, it will find any pointer errors
that may be lurking in your data bases.
It will even give you the RBA of the segment that contains the
error, which will be useful later when you go to fix it.
Note that a pointer error will not be found during regular
backups if you use image-copy processing, since the pointer chains
are not followed at that time. Forward pointer chains are, however,
followed during unload-type processing. If your data bases are
small enough, you might want
to consider using unloads for your backups instead of image-
copies for that very reason. The good news is that a bad forward 
pointer will cause the unload to abend and, hopefully, be 
noticed; the bad news is that if an abend does occur, you 
won't have a good backup. Worse, if an abend occurs and it isn't
noticed, you will not have a good backup until it is noticed and
fixed. Definitely not good. 
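
The difference between the two kinds of backup can be illustrated
with a toy model. The sketch below is not DL/I code; it assumes a
"data base" that is simply a table of segments keyed by RBA, each
with one forward pointer, and the function names image_copy and
unload are hypothetical stand-ins for the two styles of processing.

```python
# Toy model only: a "data base" is a dict mapping an RBA to a segment.
# Each segment holds some data and one forward pointer (the RBA of the
# next segment, or 0 for end of chain). All names here are hypothetical.

def image_copy(db):
    """Copy every segment as-is; pointers are never followed."""
    return {rba: dict(seg) for rba, seg in db.items()}

def unload(db, first_rba):
    """Follow the forward pointer chain, as unload-type processing does."""
    out = []
    seen = set()
    rba = first_rba
    while rba != 0:
        if rba not in db or rba in seen:
            raise RuntimeError(f"bad forward pointer to RBA {rba:#x}")
        seen.add(rba)
        out.append(db[rba]["data"])
        rba = db[rba]["next"]
    return out

good = {0x100: {"data": "A", "next": 0x200},
        0x200: {"data": "B", "next": 0x300},
        0x300: {"data": "C", "next": 0}}

broken = image_copy(good)
broken[0x200]["next"] = 0x999      # simulate a broken pointer

backup = image_copy(broken)        # succeeds, quietly preserving the damage
try:
    unload(broken, 0x100)          # trips over the bad pointer
    unload_failed = False
except RuntimeError:
    unload_failed = True
```

In this model the image copy of the damaged file completes without
complaint, while the unload stops at the segment whose pointer leads
nowhere, which is the trade-off described above.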

Causes and General Approaches

Pointer problems occur for a variety of reasons, some known and
some not known. There is, unfortunately, no cut-and-dried 
answer to the question. In some cases, the problem will appear 
after a dramatic event such as a system crash or a batch 
application abend. In a situation like that, it is reasonable
to associate the problem with the event. If you have designed
your backup/recovery procedures properly, you can probably 
avoid the entire problem by recovering the data base in a 
manner consistent with the problem that occurred.
For example, running CICS emergency restart after a CICS crash
will, theoretically, remove partial updates to the data base 
that may contain bad pointers. If, however, the problem that 
caused the crash also corrupted your buffers in such a way 
that they were physically written to the file before the
crash, and the corruption occurred unknown to DL/I, emergency
restart will not save you because the software does not know 
to back out those changes.

If a batch application abends, you can usually recover using 
the appropriate DL/I-supplied utilities and remove all updates 
made during the run, then fix the problem and rerun the job. 
This is assuming, of course, that you were logging your updates 
or there was a backup of the file(s) made before the run.
A big problem with batch abends is that they can go unnoticed or,
even if noticed, no backout is run to recover the files involved.
Subsequent jobs run against the affected file(s) can create a 
pointer problem that can make adults cry.

Sometimes a problem can be created in the process of fixing one.
There are many war stories of operators who accidentally use 
an incorrect version of backups or change accumulation files 
during recovery operations. If several data bases are linked, 
either logically or physically, and they are restored
as of different dates, havoc can result.

Finally, the problem
may just appear, as if by magic, for no apparent reason.
This can happen when a problem occurs and goes unnoticed or
unremedied for a long period of time. It is also possible that 
a system or application bug caused a problem but did not abend 
and was then fixed either accidentally or on purpose. The pointer 
problem has thus been created and is just waiting to be discovered. 
It may not appear for months, making it almost impossible to
trace. Why wouldn't it appear immediately? Because application
peculiarities may cause the bad pointer to go unused.

For example, the broken pointer may be located in the record of
an inactive customer whose data is only accessed annually in 
January. If the problem occurs in February, that bomb will be 
ticking away for 11 months before it goes off! Or how about a 
broken pointer from a root to a history segment? There may be 
no need to access that history for years, in which case you'll
probably discover it during a reorganization before the
application has a problem.

Options for Correcting the Problem

There are a number of options available for correcting a broken
pointer. We will discuss several options here, including:

o use standard recovery utilities;
o reorganize the file(s) using unload/reload utilities;
o write special application programs;
o ZAP the file by calculating physical disk addresses of affected
  pointers; and
o use DBDS to find and zap bad pointers online.

Regardless of the option that you choose, it's always a good 
idea to back up the damaged file before you do anything. Why 
back up a damaged file? Because it's possible that your recovery 
attempt may create a worse problem than you started with -- data 
checks, head crashes, power failures -- you just never know. 
If the damaged file is backed up, you can at least assure yourself
that the current situation is the worst one that you'll have to
deal with because if you make the problem worse, you can always 
restore the backup and start over.

If you are aware of a situation
that may have caused a problem and you catch it soon enough,
you have a pretty good shot at recovering with the standard
recovery utilities, provided you did
your homework and have the appropriate backups and copies of
interim changes. The various utilities are too complex to go
into here, but suffice it to say that this may be a viable
option. The bad news is that the affected files are not
available to the user during recovery and that, for large
data bases especially, it may take a long, long time to run.
The good news is that your recovery will be absolute.

You can also completely reorganize the damaged file(s) using
unload/reload utilities. Unfortunately, the pointer problem 
itself may preclude you from unloading the file, thereby 
eliminating this option completely. If it doesn't, it will 
resolve your problem without question, provided you do everything
right (including reloading indexes and logical relationships).
A variation of the batch utility option is the one-time quick 
and dirty program that unloads or "copies" the data base and 
intentionally skips the record(s)/segment(s) containing the 
problem. As we all know, quick-and-dirty programs are never
quick and always dirty, and there are other considerations as well.
Writing such a program is very time-consuming, and it must be
tested and retested.

If you have a very large data base, we could be talking days to
accomplish the job. Depending on how you design the solution, 
you will have to copy control blocks with slightly different 
names, or determine DL/I's utility unload format so you can use 
your output directly into DL/I utilities to rebuild the data 
base. And if, heaven forbid, you get almost all the way through
the data base with your unload and another pointer problem
crops up -- then what?
Recompile your program with an additional fix for that part of
the data base and start over. This cycle will continue until
all of the bad pointers are found: run the program until it
abends, determine where it abended, change the program, recompile
and run the unload again.

Another option is to manually find
and ZAP a bad pointer using utilities such as DITTO or an
equivalent. (If you have ever done this, reading this section
may be too painful a reminder and you will probably want to skip
it.) Manually fixing a broken pointer consists, generally, of
the following steps:

1) You must first determine the RBA of the segment containing the
   bad pointer. There are several ways to do this, all of which 
   are extremely aggravating and tedious:

o You can usually find an RBA by wading through a dump.
o If you can't determine the RBA of the exact segment you want,
  you have to use whatever RBA you can find and then follow a 
  chain to get to the one you want. In this case, you have to 
  do steps two and three below for each segment in the chain in
  order to find the next segment in the chain.
o If you know the key of your record, you might try a different
  approach.

For HIDAM files, print the index record using IDCAMS (or
equivalent); the RBA that points to your root is contained in 
it as data. For HDAM files, you can run the key you want through 
the randomizer and get a root anchor point (RAP). When you finally 
get to the block containing the RAP, you get to follow the
synonym chain until you get your root. You are now, of course,
just at the root. You will have to follow pointer chains to 
get the exact segment you want and remember, for each pointer 
you follow in the chain, you must do steps two and three below. 
Fun. Fun.
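
The chain-following chore described above can be sketched in a few
lines. This is only an illustration under loud assumptions: the
prefix layout used here (a one-byte segment code, a one-byte delete
byte and a single four-byte forward pointer) is a simplification,
since real prefixes depend on how the DBD defines the segment, and
the names read_prefix and walk_chain are hypothetical.

```python
import struct

# Assumed, simplified prefix: segment code (1 byte), delete byte (1 byte),
# one 4-byte forward pointer. Real prefixes depend on the DBD.
PREFIX = struct.Struct(">BBI")

def read_prefix(image, rba):
    """Decode the assumed prefix of the segment at the given RBA."""
    if rba < 0 or rba + PREFIX.size > len(image):
        raise ValueError(f"RBA {rba:#x} lies outside the data set")
    return PREFIX.unpack_from(image, rba)

def walk_chain(image, start_rba, valid_codes):
    """Follow forward pointers; return (visited RBAs, problem or None)."""
    visited = []
    rba = start_rba
    while rba != 0:                      # a zero pointer ends the chain
        if rba in visited:
            return visited, f"pointer loops back to RBA {rba:#x}"
        try:
            code, _delete, nxt = read_prefix(image, rba)
        except ValueError as err:
            return visited, str(err)
        if code not in valid_codes:
            return visited, f"invalid segment code {code:#04x} at RBA {rba:#x}"
        visited.append(rba)
        rba = nxt
    return visited, None

# Build a tiny fake data set: three segments chained 8 -> 24 -> 40.
image = bytearray(64)
PREFIX.pack_into(image, 8, 1, 0, 24)
PREFIX.pack_into(image, 24, 1, 0, 40)
PREFIX.pack_into(image, 40, 1, 0, 0)
clean = walk_chain(image, 8, valid_codes={1})

# Break the pointer in the segment at RBA 24 and walk again.
PREFIX.pack_into(image, 24, 1, 0, 50)
damaged = walk_chain(image, 8, valid_codes={1})
```

When a problem is reported, the last RBA in the visited list is the
segment holding the suspect pointer, which is the RBA step 1 is
trying to establish.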

2) The RBA must then be converted to a physical disk address
   (i.e., the old CCHHRR trick), which can be done, maybe, using 
   a hex calculator and a LISTCAT of the file. Don't forget to 
   determine the relative position within the physical block; 
   you'll need it later.
3) Now use DITTO (or an equivalent) to retrieve the physical
   block and, if you're lucky and you did everything right so
   far, you can find the segment containing the bad pointer.
4) Now the fun really begins! Determine which of the pointers in
   the segment is the one that's bad. Then you can set it to zero 
   or, if you're feeling particularly brave, point to another 
   segment. Next, hold your breath and replace the block. 
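
The arithmetic behind step 2 can be sketched as follows. This is a
minimal illustration, not a substitute for reading your own LISTCAT:
it assumes a single extent in which bytes map linearly onto
fixed-size physical records with no unused space, and every
parameter name (extent_low_rba, physrec_size and so on) is
hypothetical. The geometry in the example is likewise invented.

```python
def rba_to_cchhr(rba, *, extent_low_rba, extent_start_cc, extent_start_hh,
                 physrec_size, physrecs_per_track, tracks_per_cyl):
    """Convert an RBA to (cylinder, head, record, offset within record).

    Assumes one extent and a linear mapping of bytes to physical records.
    """
    if rba < extent_low_rba:
        raise ValueError("RBA precedes the start of this extent")
    rel = rba - extent_low_rba
    # Which physical record in the extent, and how far into that record.
    rel_record, offset = divmod(rel, physrec_size)
    # Which track in the extent, and which record on that track.
    rel_track, rec_index = divmod(rel_record, physrecs_per_track)
    # Convert the extent-relative track to an absolute cylinder and head.
    abs_track = extent_start_cc * tracks_per_cyl + extent_start_hh + rel_track
    cc, hh = divmod(abs_track, tracks_per_cyl)
    return cc, hh, rec_index + 1, offset   # records on a track count from 1

# Invented geometry, for illustration only.
cc, hh, r, offset = rba_to_cchhr(
    0x12345,
    extent_low_rba=0, extent_start_cc=100, extent_start_hh=0,
    physrec_size=4096, physrecs_per_track=10, tracks_per_cyl=15)
cchhr = f"{cc:04X}{hh:04X}{r:02X}"
```

The offset returned alongside the address is the relative position
within the physical block that step 2 warns you not to forget.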

Alternatively, if you're working at the root level, you may want to
change the root that points to the root containing the bad
pointer to point to the root after it. This may work, but
will cause you grief with your index records, if any. Or,
you may have to follow a chain or two to determine the
exact RBA of the segment you're after.

None of these options is ideal. Using utilities takes too long for
large files and is ineffective if the problem is not caught
immediately or if you don't have the proper backups and logs.
Manual methods and one-time programs are time-consuming, prone to
error and will cause you medical problems: ulcers, stress and
possible contusions and lacerations (from hitting your head and/or
fists against inanimate objects). And, in all cases, your data
bases are unavailable to the users while being fixed.

The last method is an online method, whereby you can find a bad
pointer and zap it manually in a few minutes while the rest of 
the data base is available to the user community. Clean. Neat. 
Safe. Effective. DBDS provides you with the ability to do this.

This has been the first installment of an article dealing with
the problem of IMS and/or DL/I data bases that contain broken 
pointers. The second installment will discuss the specific way 
in which DBDS can be used interactively online to find and repair 
data bases with pointer problems.
