Pointer Pointers Part I: Finding and Repairing Broken DL/I Pointers

By David Balch

David Balch has 25 years of experience in programming, 19 of which have been spent working with CICS and DL/I. He is founder and president of A Few Good People, Inc., a software company based in Crest Park, Calif., that develops and markets several file editing products for the CICS environment. This article examines DBDS from A Few Good People, Inc.

This is the first of two installments focusing on the problem of IMS and DL/I data bases that contain broken pointers. In this first installment, we examine the problem in general: how it is discovered, what causes the situation to occur in the first place, and some available options for correcting it. The second installment explores in detail how DBDS, a software product from A Few Good People, Inc., can be used interactively online to find and repair data bases with pointer problems.

Introduction

Pointer problems in DL/I or IMS/DB data bases are a thorn in the side of DBAs everywhere. Unfortunately, there are no simple answers to these complex problems because of the incredible variety of data bases and environments in which they occur. This article is intended to familiarize you, in general terms, with the problem and to offer some general approaches to correcting it; it is not intended to be the definitive discussion of the subject. It is assumed that you are already familiar with DL/I terms, concepts, coding and utilities. Note that in this article the terms DL/I and IMS/DB are used interchangeably.

The First Sign of Trouble

If you are unfortunate enough to experience a DL/I or IMS pointer problem, you will not need to be Sherlock Holmes to discover it. The problem will rear its ugly head in grand fashion, complete with lights and sirens, and usually at the worst possible moment.
If the problem is discovered online, you may be the first to notice it while testing, but that is unlikely; it would be too convenient, since you might then be able to fix it without anyone noticing or being inconvenienced. An operator may notice it first because of a console message from DL/I, but this is even less likely, since such messages are often lost in the mire of messages that bombard the console regularly. More than likely, it will come as a phone call from a user who just got a weird message and wants to know what it means.

A typical error message may be in the 700 or 800 range, with text to the effect of "invalid segment code" or "buffer handler error." For CICS on MVS systems, look for an abend code of ADLA, and then look in the CICS message log for a more specific error message number. On DOS/VSE systems, the CICS abends to watch for are Dnnn, where nnn is the message number in the 700 or 800 range mentioned above.

If the problem is destined to be discovered in a batch run and you're lucky, you may get a phone call in the middle of the night about an abend. That would actually be ideal, since you could then intervene and stop further processing before other data bases get involved and the problem multiplies. The more likely scenario is a call from a user whose nightly or weekly (or whatever) job didn't finish. If you're really unlucky, the failure will happen during a regularly scheduled backup and no one will notice; users may not touch the affected part of the file, and the error can go undetected for weeks or even months, if it is noticed at all.

If you have IBM's Space Management Utilities (SMU) and run it on a regular basis, it will find any pointer errors that may be lurking in your data bases. It will even give you the RBA of the segment that contains the error, which will be useful later when you go to fix it.
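To make the idea of a pointer check concrete, here is a minimal Python sketch of the kind of validation such a utility performs. It is not SMU's actual logic. It assumes a hypothetical toy model in which the data base is just a mapping of RBA to segment, each segment carries a segment code plus two forward pointers (physical child first and physical twin forward), and 0 marks the end of a chain. A pointer counts as "bad" if it does not land on the start of a segment with a valid code. The names `Segment` and `find_bad_pointers` are illustrative only. Note that it reports the RBA of the segment *containing* the bad pointer, since that is the one you would have to fix.

```python
from dataclasses import dataclass


# Hypothetical toy model: an HD data base reduced to a mapping of RBA -> segment.
# Real segments sit inside blocks/CIs with a binary prefix; this ignores that.
@dataclass
class Segment:
    code: int        # segment code from the prefix
    child: int = 0   # RBA of first physical child (0 = none)
    twin: int = 0    # RBA of next physical twin (0 = end of chain)


def find_bad_pointers(segments: dict[int, Segment],
                      valid_codes: set[int]) -> list[tuple[int, str, int]]:
    """Return (rba_of_segment, pointer_name, target) for every nonzero pointer
    that does not land on the start of a segment with a valid segment code."""
    errors = []
    for rba in sorted(segments):
        seg = segments[rba]
        for name in ("child", "twin"):
            target = getattr(seg, name)
            if target == 0:
                continue  # zero marks the end of a chain
            dest = segments.get(target)
            if dest is None or dest.code not in valid_codes:
                # Report the segment that contains the bad pointer.
                errors.append((rba, name, target))
    return errors


# Example: the second root's child pointer leads nowhere.
db = {
    0x100: Segment(code=1, child=0x140),
    0x140: Segment(code=2, twin=0x180),
    0x180: Segment(code=2),
    0x200: Segment(code=1, child=0x2A0),  # 0x2A0 is not the start of a segment
}
for rba, name, target in find_bad_pointers(db, valid_codes={1, 2}):
    print(f"segment at RBA {rba:#x}: {name} pointer {target:#x} is bad")
```

Scanning every segment, rather than walking chains from the roots, is a deliberate choice here: it finds bad pointers even in parts of the file no application has touched in months.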
Note that regular backups will not find a pointer error if you use image-copy processing, since the pointer chains are not followed at that time; pointer chains (specifically, the forward pointer chains) are followed during unload-type processing. If your data bases are small enough, you might want to consider using unloads instead of image copies for your backups for that very reason. The good news is that a bad forward pointer will cause the unload to abend and, hopefully, be noticed; the bad news is that if an abend does occur, you won't have a good backup. Worse, if an abend occurs and isn't noticed, you will not have a good backup until it is noticed and fixed. Definitely not good.

Causes and General Approaches

Pointer problems occur for a variety of reasons, some known and some not. Unfortunately, there is no cut-and-dried answer to the question of what causes them. In some cases, the problem will appear after a dramatic event such as a system crash or a batch application abend. In a situation like that, it is reasonable to associate the problem with the event. If you have designed your backup/recovery procedures properly, you can probably avoid the entire problem by recovering the data base in a manner consistent with the failure that occurred. For example, running CICS emergency restart after a CICS crash will, theoretically, remove partial updates to the data base that may contain bad pointers. If, however, the problem that caused the crash also corrupted your buffers, the corrupted buffers were physically written to the file before the crash, and DL/I was unaware of the corruption, emergency restart will not save you, because the software does not know to back out those changes.

If a batch application abends, you can usually recover by using the appropriate DL/I-supplied utilities to remove all updates made during the run, then fix the problem and rerun the job.
This assumes, of course, that you were logging your updates or that a backup of the file(s) was made before the run. A big problem with batch abends is that they can go unnoticed or, even if noticed, no backout is run to recover the files involved. Subsequent jobs run against the affected file(s) can then create a pointer problem that can make adults cry.

Sometimes a problem is created in the process of fixing one. There are many war stories of operators who accidentally used the wrong version of backups or change accumulation files during recovery operations. If several data bases are linked, either logically or physically, and they are restored as of different dates, havoc can result.

Finally, the problem may just appear, as if by magic, for no apparent reason. This can happen when a problem occurs and goes unnoticed or unremedied for a long period of time. It is also possible that a system or application bug caused a problem but did not abend and was then fixed, either accidentally or on purpose. The pointer problem has thus been created and is just waiting to be discovered. It may not appear for months, making it almost impossible to trace.

Why wouldn't it appear immediately? Because application peculiarities may cause the bad pointer to go unused. For example, the broken pointer may be located in the record of an inactive customer whose data is only accessed annually in January. If the problem occurs in February, that bomb will be ticking away for 11 months before it goes off! Or how about a broken pointer from a root to a history segment? There may be no need to access that history for years, in which case you'll probably discover the problem during a reorganization before the application ever runs into it.

Options for Correcting the Problem

There are a number of options available for correcting a broken pointer.
We will discuss several options here, including:

o use standard recovery utilities;
o reorganize the file(s) using unload/reload utilities;
o write special application programs;
o ZAP the file by calculating physical disk addresses of affected pointers; and
o use DBDS to find and zap bad pointers online.

Regardless of the option you choose, it's always a good idea to back up the damaged file before you do anything. Why back up a damaged file? Because your recovery attempt may create a worse problem than the one you started with: data checks, head crashes, power failures; you just never know. If the damaged file is backed up, you can at least be assured that the current situation is the worst one you'll have to deal with, because if you make the problem worse, you can always restore the backup and start over.

If you are aware of a situation that may have caused a problem and you catch it soon enough, you have a pretty good shot at recovering with the standard recovery utilities, provided you did your homework and have the appropriate backups and copies of interim changes. The various utilities are too complex to go into here, but suffice it to say that this may be a viable option. The bad news is that the affected files are not available to users during recovery and that, especially for large data bases, recovery may take a long, long time to run. The good news is that your recovery will be absolute.

You can also completely reorganize the damaged file(s) using unload/reload utilities. Unfortunately, the pointer problem itself may prevent you from unloading the file, thereby eliminating this option completely. If it doesn't, reorganization will resolve your problem without question, provided you do everything right (including reloading indexes and logical relationships).

A variation of the batch utility option is the one-time, quick-and-dirty program that unloads or "copies" the data base and intentionally skips the record(s)/segment(s) containing the problem.
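The skip-and-continue idea can be sketched in Python against a hypothetical toy model in which RBAs map to segments with a segment code, a physical child pointer and a physical twin pointer (0 meaning none). Rather than hard-coding the known bad record, this sketch walks each record in hierarchical order and, if a chain leads to something that is not a segment or loops back on itself, abandons that record, logs its root RBA and moves on instead of abending. This is only an illustration of the design, not a DL/I program: it does not issue DL/I calls, does not produce DL/I's utility unload format, and the skipped records still have to be recovered some other way. The names `unload_record`, `unload_skipping_bad` and `BrokenPointer` are illustrative only.

```python
from dataclasses import dataclass


# Hypothetical toy model: RBA -> segment, 0 = no pointer.
@dataclass
class Segment:
    code: int
    child: int = 0   # RBA of first physical child
    twin: int = 0    # RBA of next physical twin


class BrokenPointer(Exception):
    """Raised when a chain leads to a non-segment or loops back on itself."""


def unload_record(segments: dict[int, Segment], root_rba: int) -> list[tuple[int, int]]:
    """Return (rba, segment_code) for one record in hierarchical order."""
    out, seen, stack = [], set(), [root_rba]
    while stack:
        rba = stack.pop()
        seg = segments.get(rba)
        if seg is None or rba in seen:
            raise BrokenPointer(f"record at {root_rba:#x}: bad pointer to {rba:#x}")
        seen.add(rba)
        out.append((rba, seg.code))
        # Push the twin before the child so the child's subtree comes out first.
        # A root's twin belongs to the next record, so it is not followed here.
        if rba != root_rba and seg.twin:
            stack.append(seg.twin)
        if seg.child:
            stack.append(seg.child)
    return out


def unload_skipping_bad(segments: dict[int, Segment], root_rbas: list[int]):
    """Unload every record; skip (and remember) any record with a broken chain."""
    unloaded, skipped = [], []
    for root in root_rbas:
        try:
            unloaded.extend(unload_record(segments, root))
        except BrokenPointer as err:
            skipped.append(root)
            print(f"skipped: {err}")  # log it instead of abending
    return unloaded, skipped


db = {
    0x100: Segment(code=1, child=0x140),
    0x140: Segment(code=2, twin=0x180),
    0x180: Segment(code=2),
    0x200: Segment(code=1, child=0x2A0),  # 0x2A0 is not the start of a segment
}
records, bad_roots = unload_skipping_bad(db, [0x100, 0x200])
```

Because a record is only added to the output after its whole walk succeeds, a partially read record never leaks into the unload.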
As we all know, quick-and-dirty programs are never quick and always dirty, but there are other considerations. This approach is very time-consuming, and the program must be tested and retested. If you have a very large data base, we could be talking days to accomplish the job. Depending on how you design the solution, you will have to copy control blocks under slightly different names, or determine DL/I's utility unload format so that your output can be fed directly into the DL/I utilities to rebuild the data base. And, heaven forbid, you get almost all the way through the data base with your unload and another pointer problem crops up: then what? Recompile your program with an additional fix for that part of the data base and start over. This cycle continues until all of the bad pointers are found: run the program until it abends, determine where it abended, change the program, recompile and run the unload again.

Another option is to manually find and ZAP a bad pointer using DITTO or an equivalent utility. (If you have ever done this, reading this section may be too painful a reminder, and you will probably want to skip it.) Manually fixing a broken pointer generally consists of the following steps:

1) First, determine the RBA of the segment containing the bad pointer. There are several ways to do this, all of which are extremely aggravating and tedious:

   o You can usually find an RBA by wading through a dump.

   o If you can't determine the RBA of the exact segment you want, you have to use whatever RBA you can find and then follow a chain to get to the one you want. In this case, you have to do steps 2 and 3 below for each segment in the chain in order to find the next segment in the chain.

   o If you know the key of your record, you might try a different approach. For HIDAM files, print the index record using IDCAMS (or equivalent); the RBA that points to your root is contained in it as data.
   For HDAM files, you can run the key you want through the randomizer to get a root anchor point (RAP). When you finally get to the block containing the RAP, you get to follow the synonym chain until you reach your root. You are now, of course, only at the root. You will have to follow pointer chains to get to the exact segment you want and, remember, for each pointer you follow in the chain, you must do steps 2 and 3 below. Fun. Fun.

2) Convert the RBA to a physical disk address (i.e., the old CCHHR trick), which can be done, maybe, using a hex calculator and a LISTCAT of the file. Don't forget to determine the relative position within the physical block; you'll need it later.

3) Now use DITTO (or an equivalent) to retrieve the physical block and, if you're lucky and you did everything right so far, you can find the segment containing the bad pointer.

4) Now the fun really begins! Determine which of the pointers in the segment is the bad one. Then you can set it to zero or, if you're feeling particularly brave, point it to another segment. Next, hold your breath and replace the block. Or, if you're working at the root level, you may want to change the root that points to the root containing the bad pointer so that it points to the root after it. This may work, but will cause you grief with your index records, if any. Or, you may have to follow a chain or two to determine the exact RBA of the segment you're after.

None of these options is ideal. Using utilities takes too long for large files and is ineffective if the problem is not caught immediately or if you don't have the proper backups and logs. Manual methods and one-time programs are time-consuming, prone to error and will cause you medical problems: ulcers, stress and possible contusions and lacerations (from hitting your head and/or fists against inanimate objects). And, in all cases, your data bases are unavailable to the users while being fixed.
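For readers who have never done steps 2 and 4 of the manual procedure by hand, the arithmetic and the patch can be sketched in Python. This is a simplified model, not a reproduction of any IBM utility, and every layout detail is an assumption: the data set occupies a single contiguous extent starting at a known cylinder and head; each block is exactly one physical record (VSAM CIs that span several physical records, and control-area layout, are not modeled); the number of blocks per track is fixed; and records on a track are numbered from 1. For the patch, the segment prefix is assumed to be a 1-byte segment code and a 1-byte delete byte followed by 4-byte big-endian pointers with no counter fields; which pointer index is which depends on your DBD. The block is an in-memory `bytearray`; reading and rewriting the real block with DITTO is not shown. The function names are illustrative only.

```python
import struct


def rba_to_cchhr(rba: int, block_size: int, blocks_per_track: int,
                 tracks_per_cyl: int, start_cyl: int, start_head: int):
    """Map an RBA to (cylinder, head, record, offset_in_block).

    Assumes one contiguous extent beginning at (start_cyl, start_head),
    one block per physical record, and a fixed number of blocks per track.
    """
    block_no, offset = divmod(rba, block_size)
    track_index, rec_index = divmod(block_no, blocks_per_track)
    record = rec_index + 1                      # records on a track start at 1
    abs_track = start_cyl * tracks_per_cyl + start_head + track_index
    cyl, head = divmod(abs_track, tracks_per_cyl)
    return cyl, head, record, offset


PREFIX_LEN = 2  # assumed: 1-byte segment code + 1-byte delete byte, no counters


def zap_pointer(block: bytearray, seg_offset: int, pointer_index: int,
                new_value: int = 0) -> int:
    """Overwrite one 4-byte big-endian pointer in a segment prefix.

    Returns the old value so it can be written down before the block
    is replaced. new_value=0 "cuts" the chain at this pointer.
    """
    pos = seg_offset + PREFIX_LEN + 4 * pointer_index
    (old,) = struct.unpack_from(">I", block, pos)
    struct.pack_into(">I", block, pos, new_value)
    return old


# Illustrative values only; take the real ones from LISTCAT and the device type.
cyl, head, rec, off = rba_to_cchhr(0x1A0C8, block_size=4096, blocks_per_track=12,
                                   tracks_per_cyl=15, start_cyl=100, start_head=0)
print(f"CC={cyl} HH={head} R={rec}, offset {off} into the block")
# CC=100 HH=2 R=3, offset 200 into the block
```

Returning the old pointer value from `zap_pointer` is a small safety habit: if the zap makes things worse, you know exactly what to put back.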
The last method is an online method, whereby you can find a bad pointer and zap it manually in a few minutes while the rest of the data base is available to the user community. Clean. Neat. Safe. Effective. DBDS provides you with the ability to do this.

This has been the first installment of an article dealing with the problem of IMS and/or DL/I data bases that contain broken pointers. The second installment will discuss the specific way in which DBDS can be used interactively online to find and repair data bases with pointer problems.