Guidelines for Creating Correct and Lasting Element Names

by Judith R. Reeder

Copyright 1991. Judith R. Reeder. All rights reserved.

The name of a data element is the first, and often the only, information that users of that data element possess. This holds whether the user is working in decision support or application development. Translating a specification or request accurately into a report or program depends on identifying the right data elements from the information available, and that information is usually the name. Data element names are among data administration's most important deliverables. Elements may be named centrally or within the application development teams. Either way, a data element cannot be named properly without an understanding of the role that element plays in the business environment. Without careful consideration from a global perspective, the element may not be given the most appropriate name.

Parts of a Data Element Name

Some companies have adopted procedures for naming data elements, and other approaches are described in the current literature. Most approaches expect a data element name to include a prime word, a class word and, optionally, one or more modifying words. But these guidelines define only the parts of the name, not how to determine sound and lasting values for the parts. Prospective parents all know that their child will have a first name, a last name and maybe a middle name, yet they may spend months picking just the right combination. Naming a data element is a similar task and can take an inordinate amount of time.

There is a growing emphasis on sharing corporate data among multiple applications. At the same time, decision support and executive information software need a resource that helps identify what data is required to satisfy requests. Naming a data element correctly the first time becomes more important as the probability grows that the element will be used by multiple applications and/or end-user software (1).
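Because each part of a name comes from a controlled word list, assembling a name can be checked mechanically. The Python sketch below makes two assumptions. First, it uses the word order prime word, then modifying words, then class word, which is the order in examples such as CUSTOMER KEY ACCOUNT CODE later in this article. Second, the small word lists are hypothetical. `compose_name` is an illustrative helper, not part of any published standard.

```python
from __future__ import annotations

# Hypothetical word lists drawn from examples in this article; a real naming
# standard would maintain them centrally and much more completely.
PRIME_WORDS = {"CUSTOMER", "ORGANIZATION", "EMPLOYEE", "PRODUCT"}
MODIFYING_WORDS = {"KEY", "ACCOUNT", "PHONE", "PAYMENT"}
CLASS_WORDS = {"IDENTIFIER", "NAME", "NUMBER", "AMOUNT",
               "QUANTITY", "CODE", "TEXT", "DATE"}


def compose_name(prime: str, class_word: str,
                 modifiers: tuple[str, ...] = ()) -> str:
    """Assemble PRIME [MODIFIER ...] CLASS, rejecting words not on the lists."""
    if prime not in PRIME_WORDS:
        raise ValueError(f"{prime!r} is not an approved prime word")
    for word in modifiers:
        if word not in MODIFYING_WORDS:
            raise ValueError(f"{word!r} is not an approved modifying word")
    if class_word not in CLASS_WORDS:
        raise ValueError(f"{class_word!r} is not an approved class word")
    # Join the parts in the assumed order: prime, modifiers, class word.
    return " ".join([prime, *modifiers, class_word])
```

For example, `compose_name("CUSTOMER", "CODE", ("KEY", "ACCOUNT"))` returns `"CUSTOMER KEY ACCOUNT CODE"`. A request for the class word FLAG is rejected because FLAG is not on the list.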
Prime Words

The prime word represents a subject area or high-level grouping of data elements. Common examples of prime words include CUSTOMER, ORGANIZATION, EMPLOYEE and PRODUCT. The number of prime words in a naming standard can easily be kept in the 50 to 100 range.

Some guidelines define the prime word as representing an entity, but a normalized modeling effort produces many more entities than prime words. Admittedly, a completed data model will in fact have entities for all the prime words mentioned above. However, it will also have entities such as CUSTOMER ADDRESS, EMPLOYEE JOB HISTORY and PRODUCT INVENTORY. Assigning a different prime word to each entity produces many prime words without adding to the information contained in the name. Worse, one prime word per entity forces a name change whenever an element migrates to another entity. Such a migration can follow from changes in the business environment, in the scope of the analysis effort or in the analyst's understanding.

Class Words

Choosing class words requires that the analyst understand the business definition of the element. Data elements naturally fall into one of several subsets. The highest-level grouping sorts elements by data type: numeric, alpha and date. Date elements are not grouped further, but numeric and alpha elements are. Because the element name is also a communication device, assigning the correct class word and enforcing the appropriate data type for that class word are essential.

Many experienced IS professionals remember when only elements used in a calculation were given numeric data types. That practice produces data elements that carry the appropriate class word but are implemented in the wrong data type.
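Once each class word implies a data type, a naming review can flag elements whose implementation contradicts their class word. The Python sketch below rests on one assumption: the mapping from class words to the three top-level types is illustrative. The article does not say which type every generic class word implies, so each standard would decide this for itself. `DataType`, `class_word_of` and `type_mismatch` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class DataType(Enum):
    NUMERIC = "numeric"
    ALPHA = "alpha"
    DATE = "date"


# Assumed class-word-to-type mapping, for illustration only; each naming
# standard decides this for its own class words.
CLASS_WORD_TYPES = {
    "AMOUNT": DataType.NUMERIC,
    "QUANTITY": DataType.NUMERIC,
    "COUNT": DataType.NUMERIC,
    "NAME": DataType.ALPHA,
    "TEXT": DataType.ALPHA,
    "CODE": DataType.ALPHA,
    "DATE": DataType.DATE,
}


def class_word_of(name: str) -> str | None:
    """Return the class word that ends the element name, or None."""
    words = name.split()
    # Try the longest suffix first so that a multi-word class word
    # (for example EXTERNAL KEY, if added to the mapping) matches whole.
    for i in range(len(words)):
        candidate = " ".join(words[i:])
        if candidate in CLASS_WORD_TYPES:
            return candidate
    return None


def type_mismatch(name: str, implemented: DataType) -> bool:
    """True if the implemented data type differs from the class word's type."""
    class_word = class_word_of(name)
    if class_word is None:
        raise ValueError(f"{name!r} does not end in a known class word")
    return CLASS_WORD_TYPES[class_word] is not implemented
```

Under this mapping, `type_mismatch("CUSTOMER PAYMENT AMOUNT", DataType.ALPHA)` returns `True`. That flags the legacy pattern of a numeric class word stored as a character string.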
For the decision support or application development user, some elements with numeric class words must be treated as character strings, and some elements with alpha class words are implemented with numeric data types. Having to remember which ones reduces productivity and increases frustration. In addition, most data base management systems can enforce some data types, which relieves edit routines of checking dates and many numeric data elements.

Generic class words that apply to most corporate data element lists include IDENTIFIER, NAME, NUMBER, AMOUNT, QUANTITY, CODE, TEXT and DATE. Class words should be documented clearly to help both the naming effort and the users who deduce information from an element's name. The definitions of these class words include the implied data type and whether a domain definition is needed. For example, CODE elements need documentation of values and meanings, and NAME elements are alphanumeric and free-form. Some naming standards use the IDENTIFIER class word for any element that is a primary key.

Additional class words depend on the business and/or the application. An application dealing with warehousing may need HEIGHT, WEIGHT, LENGTH and WIDTH as class words. Other class words not found in most naming standards include TIME, PHONE, COUNT and EXTERNAL KEY.

FLAG as a class word is becoming less popular with data administrators. A FLAG presumes only two possible values for the data element, usually "yes" and "no". Under the usual definition, a CUSTOMER KEY ACCOUNT FLAG could have only the values "Y" and "N". The default for a new customer would be "N", when in reality the value may not be known when a new customer is identified. With only two possible values, there is no way to distinguish "N" meaning "No" from "N" meaning "Don't know". The immediate solution is for the FLAG to take on three values: Y, N and NULL. Now unknown values can be distinguished from known "No" values. However, that solves only part of the problem.
The element is still limited in the business meaning it can take on over time. A business may well come to identify different types of key accounts among its customers. If the element is named CUSTOMER KEY ACCOUNT CODE, the set of valid values can be expanded as the business changes.

Modifying Words

Once the prime word and the class word for the data element have been chosen, the analyst must decide whether any vital business information is still missing from the name. The business information conveyed by the names CUSTOMER NUMBER and CUSTOMER PHONE NUMBER is significantly different. Modifying words are used to further describe the business element represented.

Maintaining the modifying word list requires organizational skills, familiarity with the corporate pool of data elements and a good thesaurus. One of the most common shortcomings of modifying word lists is the number of synonyms they contain. Suppose a list contains both PRODUCT and PART, and the business uses the two words interchangeably. One analyst might then name a data element PRODUCT NAME and another PART NAME. When the data models are integrated, the same business concept has two different names, and consolidation becomes unnecessarily complex.

The remedy is to review each new word before it is added to the list. When a request is made to add PART as a modifying word, the analyst checks whether the list already contains a modifying word with the same business meaning. If PRODUCT is already on the list, the request for PART is denied. Two data analysts are then much more likely to give the same business concept the same name, in this case PRODUCT NAME.

Getting the Name Right: Scope

Ideally, the analyst has the opportunity to become familiar with the business needs of the entire corporation before embarking on the task of naming data elements. Realistically, the initial scope is the first application worked on. Because the scope is limited, the analyst's view of subject areas will also be limited.
As an example, assume a marketing application for a pharmaceutical company. One defensible prime word is PHYSICIAN. The entities in the PHYSICIAN subject area include PHYSICIAN, PHYSICIAN CALL, PHYSICIAN ADDRESS and PHYSICIAN SPECIALTY. The names of data elements describing physicians include the prime word PHYSICIAN.

Later, another marketing application targets many types of professionals, including pharmacists, researchers and nurses as well as physicians. The PHYSICIAN prime word is now too restrictive for many of the data elements. The analyst adds the prime word PROFESSIONAL. The PHYSICIAN data elements that describe all professionals are renamed from PHYSICIAN_% to PROFESSIONAL_%. Only those data elements that are specific to physicians, as opposed to other medical professionals, retain the prime word PHYSICIAN.

When a logical data element is renamed, the ideal physical column or field name also changes to reflect the element's current name. The earliest applications pay the penalty of being first, because they are built while data administration's understanding of the corporation is still narrow. Over time, they find that many of the elements named and documented during their development had an unavoidable application bias. As a result, their column names no longer match the current element names. Future applications, and ad hoc access through views, should use the more correct names. However, it is impractical for existing applications to rename the columns in their data bases and change the programs that read those data bases. Therefore, the documentation for each logical data element includes a cross-reference to every physical representation of the element.

Data Values

One of the most common mistakes in data element naming is to include data values in the data name. The goal of business element naming is a stable and flexible data model.
Concepts must be considered separately from the values they may take. Failing to do so produces data names and data structures that are inflexible and therefore less likely to withstand changes to the business. Consider a group of data elements such as CUSTOMER JANUARY PAYMENT AMOUNT, CUSTOMER FEBRUARY PAYMENT AMOUNT, ... CUSTOMER DECEMBER PAYMENT AMOUNT. They all represent the same business concept: the amount of money a customer paid. The modifying words JANUARY ... DECEMBER refer to the timing of the payments, but the business probably does not treat a January payment any differently from a December payment. The two data elements CUSTOMER PAYMENT AMOUNT and CUSTOMER PAYMENT DATE map the names to the business concepts. They also leave room to change how payments are recorded (bimonthly, for example) or how long they are kept (two years of information, for example).

Other examples of data values hiding in element names are MAIL in %_MAIL_CITY_NAME, PREVIOUS in %_PREVIOUS_BALANCE, ONHAND in %_ONHAND_QUANTITY and PRIMARY in %_PRIMARY_SPECIALTY. In each case, taking the data value out of the data name means adding another data element (often a date or a type) to hold the value. This gives an address type equal to "mail" and the city associated with that type, or a balance type equal to "previous" and the balance amount associated with it. (Better yet, a balance date and a balance amount.) Because of this flexibility, a new type is added by expanding the allowable values of one data element. No new data element has to be added to the data model and then to the data base design.

Atomicity

Sometimes a data element name represents more than one business concept even though it does not include data values. For example, a user has a business concept of the daily dosage of a patient's medication.
The analyst quickly separates the concept of frequency from the dosage and begins with two data elements: PATIENT DOSAGE FREQUENCY and PATIENT DOSAGE. Values for frequency probably include hourly, with meals and weekly as well as daily. The analyst may not discover that two data elements are still not sufficient until she or he begins to document the allowable values for PATIENT DOSAGE. Values for PATIENT DOSAGE include "250 mg", "3 ml 10 percent I.V." and "2 100 mg tablets". Possible data elements to replace PATIENT DOSAGE are PATIENT DOSAGE QUANTITY, PATIENT DOSAGE MEDIUM, PATIENT DOSAGE STRENGTH and PATIENT DOSAGE UNIT OF MEASURE. The pieces of each PATIENT DOSAGE value are then assigned to the appropriate new data elements. See Figure 1.

The original application might have been satisfied with the PATIENT DOSAGE element. However, subsequent applications, or other users of the original application, may need access to the more specific pieces of information. Matching each business concept to its own data element makes the pieces more usable. It also increases the information conveyed by the values of the several elements taken together.

Occasionally, a data element may be defined at too low a level. A date may appear as three elements: day, month and year. With current data base functionality, it is easier to enforce a valid date as a whole than as three separate pieces, and the individual parts can be extracted from the single element as needed. Similar capabilities are available for times.

Code Elements

Once an element has been named, if it has been assigned the CODE class word, defining all of its values and meanings is important. Code elements tend to be overused to the point where their values are no longer mutually exclusive. An element named EMPLOYEE TYPE CODE initially has these values and meanings: A = Active, R = Retired, D = Deceased, L = On temporary leave.
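Documenting the value/meaning set gives every application a single list to check values against. The following is a minimal Python sketch that assumes the domain is kept as a simple lookup table. The names `EMPLOYEE_TYPE_CODE_DOMAIN` and `validate_code` are hypothetical. In practice the table would live in the central data dictionary rather than in program code.

```python
from __future__ import annotations

# Hypothetical documented domain for EMPLOYEE TYPE CODE, using the
# value/meaning set given in the text.
EMPLOYEE_TYPE_CODE_DOMAIN = {
    "A": "Active",
    "R": "Retired",
    "D": "Deceased",
    "L": "On temporary leave",
}


def validate_code(domain: dict[str, str], value: str) -> str:
    """Return the documented meaning of value, or raise if it is undocumented."""
    try:
        return domain[value]
    except KeyError:
        raise ValueError(f"undocumented code value {value!r}") from None
```

An application that checks values against the documented domain cannot quietly give the element a new meaning. A proposed new value has to be added to the documentation first, which is where a conflicting meaning would surface.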
Eventually, a second application wishes to use the element with a different value/meaning set: F = Full time, P = Part time, S = Summer student. If the value/meaning sets are not documented and referenced by the new application, the element becomes overloaded. More than one value may then apply to any individual employee. An employee can be both Active and Full time, for example.

Another problem with codes, especially status codes, is the need to document the rules for value transitions. An element named ORDER STATUS CODE has the values P = Potential, O = Open, A = Approved, S = Shipped, I = Invoiced and C = Canceled. Valid values for a new order are probably only P and O. A value of P can be changed only to C or O, a value of O only to A or C, a value of A only to S or C, and a value of S only to I. Drawing a state transition diagram for these codes may be helpful in uncovering the allowable changes to the element's value.

Foreign Key Names

Once an element has been named, it should keep the same name in all entities and applications. This means that every foreign key element retains the name of the primary key it refers to. The element refers to the same business concept whether it is a primary key or a foreign key, so it makes business sense to keep the same name. Keeping the name also helps users identify the foreign keys (and therefore the relationships) in the data base.

The exception occurs when there are multiple relationships between two entities, resulting in multiple foreign keys. Each attribute name within an entity must be unique. Therefore, each foreign key name is expanded to include a modifying word, or suffix, that describes the role of the relationship.

Conclusion

Naming data elements should not be attempted in a vacuum. Adopting and disseminating naming standards and word lists (class words, prime words and modifying words) should precede the earliest attempts to name data elements. The word lists should be recirculated frequently so that analysts have the most current information at hand.
In addition, element names and definitions should be held centrally in a CASE tool or data dictionary. Reports on the growing pool of elements should be published to all analysts on a regular basis. Knowing all existing element names and definitions saves analysts from agonizing over an element that another application has already named and defined.

(1) Reeder, Judith R. "Don't Take Names in Vain." Database Programming and Design, July 1991, pp. 17-20.

Figure 1: Decomposition of PATIENT DOSAGE values

Patient Dosage      Quantity  Medium      Strength  Unit of Measure
250 mg              1         tablets     250       mg
3 ml                3         injectable  10%       ml
2 100 mg tablets    2         tablets     100       mg
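Figure 1's decomposition maps directly onto a record type with one field per atomic element. The Python sketch below uses a hypothetical `PatientDosage` record and `by_medium` helper; neither comes from the article. Once MEDIUM is its own element, a question such as "which dosages are tablets?" becomes a simple comparison instead of a search through free-form dosage text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PatientDosage:
    """One PATIENT DOSAGE value split into the four elements of Figure 1."""
    quantity: int
    medium: str
    strength: str  # kept as text because Figure 1 mixes "250" and "10%"
    unit_of_measure: str


# The three rows of Figure 1.
FIGURE_1 = [
    PatientDosage(quantity=1, medium="tablets", strength="250", unit_of_measure="mg"),
    PatientDosage(quantity=3, medium="injectable", strength="10%", unit_of_measure="ml"),
    PatientDosage(quantity=2, medium="tablets", strength="100", unit_of_measure="mg"),
]


def by_medium(dosages: list[PatientDosage], medium: str) -> list[PatientDosage]:
    """Select dosages by one atomic element, with no parsing of free-form text."""
    return [d for d in dosages if d.medium == medium]
```

`by_medium(FIGURE_1, "tablets")` returns the first and third rows. A single free-form PATIENT DOSAGE element could answer the same question only by parsing each value.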