Guidelines for Creating Correct and Lasting Element Names
by Judith R. Reeder

Copyright 1991. Judith R. Reeder. All rights reserved.

The name of a data element is the first, and often the only, information that
users of that data element possess. Whether the user is working in decision
support or application development, the accuracy of translating an immediate
specification or request to a report or program depends on addressing the data
elements based on available information.

Data element names are one of data administration's most important deliverables.
Whether naming of elements occurs centrally or in the application development
teams, a data element cannot be properly named without an understanding of the
role that element plays in the business environment. Without careful
consideration from a global perspective, the element may not be given the most
appropriate name.

Parts of a Data Element Name

Some companies have adopted procedures for naming data elements. Other sources
are available in current literature. Most approaches expect that a data element
name includes a prime word, a class word and optionally one or more modifying
words. But these guidelines only define the parts of the name, not how to
determine sound and lasting values for the parts. Prospective parents all know
that their child will have a first name, a last name and maybe a middle name;
they may spend months picking just the right combination. The task of naming a
data element is similar and can take an inordinate amount of time.

There is a growing emphasis on sharing corporate data among multiple
applications. At the same time, decision support and executive information
software require a resource to aid in the identification of what data is needed
to satisfy requests. The importance of correctly naming a data element the first
time escalates as the probability that the data element will be used by multiple
applications and/or end-user software increases (1).

Prime Words

The prime word represents a subject area or high-level grouping of data elements.
Common examples of prime words include CUSTOMER, ORGANIZATION, EMPLOYEE and
PRODUCT. The number of prime words in the naming standard can be kept in the 50
to 100 range quite easily. Although some guidelines define the prime word as
representing an entity, in a normalized modeling effort, there will be many more
entities than prime words. Admittedly, a completed data model will in fact have
entities for all the prime words mentioned previously. However, there will also
be entities such as CUSTOMER ADDRESS, EMPLOYEE JOB HISTORY, PRODUCT INVENTORY,
etc.

Attempting to assign a different prime word to each entity results in many prime
words without adding to the value of the information contained in the name. On
the contrary, one prime word per entity will cause name changes when an element
migrates to another entity based on changes in the business environment, the
scope of the analysis effort or increased understanding by the analyst.

Class Words

Class words require that the analyst understand the business definition of the
element. Data elements naturally fall into one of several subsets of elements.
The highest level grouping sorts elements by data type: numeric, alpha and date.
Date elements are not further grouped; numeric and alpha elements are.

Because the element name is also a communication device, assigning the correct
class word and enforcing the appropriate data type for the class word are
essential. Many experienced IS professionals remember when only elements used in
a calculation were given numeric data types. This technique leads to data
elements that are given the appropriate class word but are implemented in the
wrong data type.

For the decision support or application development user, remembering which
elements with numeric class words need to be treated as character strings, or
which elements with alpha class words are implemented with numeric data types
negatively impacts productivity and escalates frustration. In addition, most data
base managers can enforce some data types, thus relieving edit routines from
checking dates and many numeric data elements.

Generic class words that apply to most corporate data element lists include
IDENTIFIER, NAME, NUMBER, AMOUNT, QUANTITY, CODE, TEXT and DATE. Class words
should be documented clearly to help the naming effort and the users as they
deduce information from the element's name. Definitions for these class words
include implied data type and the need for domain definitions (CODE elements need
documentation of values and meanings, NAME elements are alphanumeric free-form).
Some naming standards use the IDENTIFIER class word for any element that is a
primary key.
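
The class word definitions described above can be kept in a small lookup table. The sketch below is illustrative only: the CLASS_WORDS table, the helper names and the data type assigned to each class word are assumptions, not part of any published standard. It also assumes the class word is always the last word of the element name.

```python
# Hypothetical registry: class word -> (implied data type, needs documented domain?)
# The data types shown here are illustrative assumptions.
CLASS_WORDS = {
    "IDENTIFIER": ("alpha", False),
    "NAME": ("alpha", False),      # free-form alphanumeric
    "NUMBER": ("numeric", False),
    "AMOUNT": ("numeric", False),
    "QUANTITY": ("numeric", False),
    "CODE": ("alpha", True),       # values and meanings must be documented
    "TEXT": ("alpha", False),
    "DATE": ("date", False),
}


def class_word_of(element_name: str) -> str:
    """Return the class word, assumed to be the last word of the name."""
    last_word = element_name.split()[-1].upper()
    if last_word not in CLASS_WORDS:
        raise ValueError(f"{element_name!r} does not end in a known class word")
    return last_word


def implied_type(element_name: str) -> str:
    """Return the data type implied by the element's class word."""
    return CLASS_WORDS[class_word_of(element_name)][0]


print(implied_type("CUSTOMER PAYMENT AMOUNT"))  # numeric
```

A table like this lets a review step reject a name whose physical data type disagrees with its class word.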

Additional class words depend on the business and/or the application. An
application dealing with warehousing may need HEIGHT, WEIGHT, LENGTH and WIDTH
for class words. Other class words that are not found in most naming standards
include TIME, PHONE, COUNT and EXTERNAL KEY.

FLAG as a class word is becoming less popular with data administrators. The use
of a FLAG presumes only two possible values for the data element, usually a "yes"
and "no". Based on the usual definition, a CUSTOMER KEY ACCOUNT FLAG could only
have values of "Y" and "N". The default for a new customer would be N, when in
reality, the value may not be known when a new customer is identified. With only
two possible values, there is no way to distinguish "N" meaning "No" from "N"
meaning "Don't Know".

The immediate solution is for the FLAG to take on three values: Y, N and NULL.
Now the distinction can be made between the unknowns and the known "No" values.
However, that only solves part of the problem. The element is still limited in
the business meaning it can assume over time. It is quite possible that any given
business will identify different types of key accounts for its customers. If the
element is named CUSTOMER KEY ACCOUNT CODE, the number of valid values can be
expanded over time as the business changes.
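
The difference between the two approaches can be sketched briefly. The code values and meanings below are hypothetical, as is the add_code helper; the point is only that a CODE domain is documented data that can grow, while a FLAG is fixed at yes, no or unknown.

```python
from typing import Optional

# A three-valued FLAG: True, False, or None when the value is not yet known.
customer_key_account_flag: Optional[bool] = None

# A CODE element: the domain is documented and can be extended over time.
# These values and meanings are hypothetical.
KEY_ACCOUNT_CODES = {
    "N": "Not a key account",
    "U": "Unknown",
    "K": "Key account",
}


def add_code(domain: dict, value: str, meaning: str) -> None:
    """Extend a code domain, refusing to redefine an existing value."""
    if value in domain:
        raise ValueError(f"value {value!r} is already defined")
    domain[value] = meaning


# The business later distinguishes a new type of key account.
add_code(KEY_ACCOUNT_CODES, "R", "Regional key account")
print(sorted(KEY_ACCOUNT_CODES))  # ['K', 'N', 'R', 'U']
```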

Modifying Words

Once the prime word and the class word for the data element have been chosen, the
analyst must decide whether any vital business information is still missing from
the name. The business information imparted from data element names CUSTOMER
NUMBER and CUSTOMER PHONE NUMBER is significantly different. Modifying words are
used to further describe the business element represented.

Maintaining the modifying word list entails organization skills, familiarity with
the corporate pool of data elements and a good thesaurus. One of the most common
shortcomings of modifying word lists is the number of synonyms they contain. If
a modifying word list contains the words PRODUCT and PART, and the two words are
used interchangeably by the business, one analyst might name a data element
PRODUCT NAME and another PART NAME.

When the data models are integrated, the consolidation process is unnecessarily
complex because the same business concept has two different names. A controlled
word list avoids this. When a request is made for PART as a modifying word, the
analyst checks whether a modifying word with the same business meaning already
exists. If PRODUCT is already on the list, the request for PART is denied. This
results in a much higher probability that two data analysts will give the same
business concept the same name, in this case, PRODUCT NAME.
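
The review of a modifying word request can be sketched as a lookup against the approved list and a synonym table. Both lists and the review_request function are hypothetical; in practice the synonym decision is a business judgment, not a mechanical test.

```python
# Approved modifying words, plus known synonyms mapped to the approved word.
# Both collections are hypothetical examples.
APPROVED = {"PRODUCT", "PHONE", "PAYMENT"}
SYNONYMS = {"PART": "PRODUCT", "ITEM": "PRODUCT", "TELEPHONE": "PHONE"}


def review_request(word: str) -> str:
    """Approve a new modifying word unless it duplicates an existing one."""
    word = word.upper()
    if word in APPROVED:
        return f"{word} is already approved"
    if word in SYNONYMS:
        return f"{word} denied: use {SYNONYMS[word]}"
    APPROVED.add(word)
    return f"{word} added"


print(review_request("PART"))       # PART denied: use PRODUCT
print(review_request("SPECIALTY"))  # SPECIALTY added
```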

Getting the Name Right: Scope 

Ideally, the analyst has the opportunity to become familiar with the entire
corporation's business needs before embarking on the task of naming data
elements. Realistically, the initial scope is the first application worked on.
Because the scope is limited, the concept of subject areas will be limited.

As an example, assume a marketing application for a pharmaceutical company. One
defensible prime word is PHYSICIAN. The entities in the PHYSICIAN subject area
include PHYSICIAN, PHYSICIAN CALL, PHYSICIAN ADDRESS and PHYSICIAN SPECIALTY.
Data element names describing the physicians include the prime word PHYSICIAN.

Subsequently, another marketing application targets many types of professionals,
including pharmacists, researchers, nurses, as well as physicians. Use of the
PHYSICIAN prime word for many of the data elements is now too restrictive. The
analyst adds the prime word PROFESSIONAL. The PHYSICIAN data elements that
describe all professionals are changed from PHYSICIAN_% to PROFESSIONAL_%. Only
those data elements that are specific to physicians, as opposed to other medical
professionals, retain the prime word PHYSICIAN.

When a logical data element receives a name change, the ideal physical
column/field name also changes to reflect the current name of the logical
element. The earlier applications pay the penalty of being first when the scope
of corporate understanding in the data administration area is still small. Over
time, they find that many of the elements named and documented during their
development effort had an unavoidable application bias, which results in a
mismatch between their column names and the current element name.

Future applications and ad hoc access through views should use the more correct
name. However, it is impractical for the existing applications to change the
names of columns in the data bases and the programs that read the data bases.
Therefore, part of the documentation for the logical data element includes a
cross-reference to each physical representation of the element.
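
One way to picture that cross-reference is a mapping from each logical element name to the physical columns that implement it. The table names, column names and helper functions below are hypothetical and stand in for whatever a data dictionary or CASE tool would hold.

```python
# Logical element name -> physical (table, column) representations.
# All table and column names are hypothetical.
CROSS_REFERENCE = {
    "PROFESSIONAL NAME": [
        ("PHYSICIAN", "PHYSICIAN_NAME"),        # first application, older name
        ("PROFESSIONAL", "PROFESSIONAL_NAME"),  # later application
    ],
}


def physical_columns(logical_name):
    """List every physical column that implements a logical element."""
    return CROSS_REFERENCE.get(logical_name, [])


def logical_name_for(table, column):
    """Find the current logical name for an existing physical column."""
    for logical, columns in CROSS_REFERENCE.items():
        if (table, column) in columns:
            return logical
    return None


print(logical_name_for("PHYSICIAN", "PHYSICIAN_NAME"))  # PROFESSIONAL NAME
```

With such a mapping, a user who meets the older column name can still find the current logical element and its definition.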

Data Values

One of the most common mistakes in data element naming is to include data values
as part of the data name. The goal of business element naming is to achieve a
stable and flexible data model. Failing to focus on the business concepts
separately from the values that the concepts may assume results in data names and
data structures that are not flexible, and therefore are less likely to withstand
changes to the business.

A group of data elements such as CUSTOMER JANUARY PAYMENT AMOUNT, CUSTOMER
FEBRUARY PAYMENT AMOUNT, ... CUSTOMER DECEMBER PAYMENT AMOUNT, all represent the
same business concept of the amount of money a customer paid. The modifying words
JANUARY ... DECEMBER refer to the timing of the payments, but the business
probably does not treat a January payment any differently from a December payment.
With two data elements, CUSTOMER PAYMENT AMOUNT and CUSTOMER PAYMENT DATE,
the element names map to the business concepts and there is flexibility to
change the way payments are recorded (to bimonthly) or maintained (two years of
information).
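
The restructuring can be illustrated by converting a record in the month-per-element style into rows that use only the two elements. The sample amounts, the CUSTOMER NUMBER value and the to_payments function are made up for illustration, and the first day of the month is assumed as the payment date because the old structure records no day.

```python
from datetime import date

# A record in the month-per-element style (values are made up).
old_record = {
    "CUSTOMER NUMBER": 1001,
    "CUSTOMER JANUARY PAYMENT AMOUNT": 250.00,
    "CUSTOMER FEBRUARY PAYMENT AMOUNT": 0.00,
    "CUSTOMER MARCH PAYMENT AMOUNT": 125.50,
}
MONTHS = {"JANUARY": 1, "FEBRUARY": 2, "MARCH": 3}


def to_payments(record, year):
    """Turn month-named amounts into (date, amount) payment rows."""
    rows = []
    for month_name, month_number in MONTHS.items():
        amount = record.get(f"CUSTOMER {month_name} PAYMENT AMOUNT")
        if amount:  # skip months with no payment recorded
            rows.append({
                "CUSTOMER NUMBER": record["CUSTOMER NUMBER"],
                # Assumption: the day is unknown, so the first of the month is used.
                "CUSTOMER PAYMENT DATE": date(year, month_number, 1),
                "CUSTOMER PAYMENT AMOUNT": amount,
            })
    return rows


for row in to_payments(old_record, 1991):
    print(row["CUSTOMER PAYMENT DATE"], row["CUSTOMER PAYMENT AMOUNT"])
```

Recording a second year of payments, or bimonthly payments, then means adding rows rather than adding elements.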

Other examples of data values hiding in element names are MAIL in
%_MAIL_CITY_NAME, PREVIOUS in %_PREVIOUS_BALANCE, ONHAND in %_ONHAND_QUANTITY and
PRIMARY in %_PRIMARY_SPECIALTY. In each of these examples, taking the data value
out of the data name means that we add another data element (often a date or a
type) to hold the value. In this way, we get an address type equal to mail and
the city associated with that type, or a balance type equal to previous and the
balance amount associated with it. (Or, better yet, a balance date and balance
amount.)

The flexibility of this approach means new types can be added by increasing the
allowable values of one data element instead of having to add a new data element
to the data model and then to the data base design.

Atomicity

There are times when the data element name represents more than one business
concept but has not included data values as part of the name. For example, a user
has a business concept of the daily dosage for medication of a patient. The
analyst quickly separates the concept of frequency from the dosage and begins
with two data elements: PATIENT DOSAGE FREQUENCY and PATIENT DOSAGE. Values for
frequency probably include hourly, with meals and weekly as well as daily.

Until the analyst begins to document allowable values for PATIENT DOSAGE, she/he
may not discover that two data elements are still not sufficient. Values for the
PATIENT DOSAGE include 250mg, 3ml 10 percent I.V. and 2 100mg tablets. Possible
data elements to replace PATIENT DOSAGE are PATIENT DOSAGE QUANTITY, PATIENT
DOSAGE MEDIUM, PATIENT DOSAGE STRENGTH and PATIENT DOSAGE UNIT OF MEASURE. Now
the pieces of the values of PATIENT DOSAGE are assigned to the appropriate new
data elements. See Figure 1.
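
The decomposition in Figure 1 can also be written as a small record structure. The PatientDosage class and its field names are an illustrative assumption; the values are those shown in the figure, with strength kept as text because one value is a percentage.

```python
from dataclasses import dataclass


@dataclass
class PatientDosage:
    """One field per business concept (field names are hypothetical)."""
    quantity: int
    medium: str
    strength: str          # kept as text: Figure 1 includes "10%"
    unit_of_measure: str


# Original PATIENT DOSAGE value -> its atomic parts, as listed in Figure 1.
FIGURE_1 = {
    "250 mg": PatientDosage(1, "tablets", "250", "mg"),
    "3 ml": PatientDosage(3, "injectable", "10%", "ml"),
    "2 100 mg tablets": PatientDosage(2, "tablets", "100", "mg"),
}

# A question the single PATIENT DOSAGE element could not answer directly.
tablet_doses = [value for value, d in FIGURE_1.items() if d.medium == "tablets"]
print(tablet_doses)  # ['250 mg', '2 100 mg tablets']
```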

Although the original application might have been satisfied with the PATIENT
DOSAGE element, subsequent applications or users of the original application may
need to access the more specific pieces of information. Matching each business
concept to its own data element increases the usability of the pieces as well as
the information imparted by values of several elements.

Occasionally, a data element may be defined at too low a level. A date element
may appear as three: day, month and year. With current data base functionality,
it is easier to enforce the entire date than the three separate pieces. The
individual parts can be extracted from the one element as needed. Similar
capabilities are available for times.
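
As a brief illustration of that point, a single date value rejects impossible combinations and still yields its parts on request. The example uses Python's standard date type as a stand-in for a data base date column; the is_valid_date helper and the sample date are hypothetical.

```python
from datetime import date


def is_valid_date(year: int, month: int, day: int) -> bool:
    """True if the three pieces form a real calendar date."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


shipment_date = date(1991, 7, 15)  # hypothetical value

# The individual parts are extracted from the one element as needed.
parts = (shipment_date.day, shipment_date.month, shipment_date.year)
print(parts)                       # (15, 7, 1991)
print(is_valid_date(1991, 2, 30))  # False
```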

Code Elements

Once the element has finally been named, if it has been assigned a code class
word, the definition of all values and meanings is important. Code elements have
a habit of being overused to the point where the values may not be mutually
exclusive.

An element named EMPLOYEE TYPE CODE initially has values/meanings of: A = Active,
R = Retired, D = Deceased, L = On temporary leave. Eventually, a second
application wishes to use the element with a different value/meaning set: F =
Full time, P = Part time, S = Summer student. If the value/meaning sets are not
documented and referenced by the new application, the element is overloaded and
more than one value may apply for any individual employee.

Another problem with codes, especially status codes, is the need to document the
rules for value transitions. An element named ORDER STATUS CODE has values of P
= Potential, O = Open, A = Approved, S = Shipped, I = Invoiced, C = Canceled.
Valid values for a new order occurrence are probably only P and O. A value of P
can only be updated to C or O, a value of O to A or C, a value of A to S or C,
a value of S only to I. Drawing a state transition diagram for these codes may
be helpful in uncovering the appropriate changes for the element's value.
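
The transition rules just listed can be recorded as a table and checked mechanically. This is a minimal sketch: the dictionary layout and function name are assumptions, and because the text gives no transitions out of I or C, both are treated here as final states.

```python
# ORDER STATUS CODE: current value -> values it may be updated to.
ORDER_STATUS_TRANSITIONS = {
    "P": {"O", "C"},  # Potential -> Open or Canceled
    "O": {"A", "C"},  # Open -> Approved or Canceled
    "A": {"S", "C"},  # Approved -> Shipped or Canceled
    "S": {"I"},       # Shipped -> Invoiced only
    "I": set(),       # assumption: Invoiced is final
    "C": set(),       # assumption: Canceled is final
}
INITIAL_VALUES = {"P", "O"}  # valid values for a new order occurrence


def can_change(current: str, new: str) -> bool:
    """True if the status may move from current to new."""
    return new in ORDER_STATUS_TRANSITIONS[current]


print(can_change("P", "O"))  # True
print(can_change("S", "C"))  # False
```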

Foreign Key Names

Once an element has been named, it should keep the same name in all entities and
applications. This means that all foreign key elements retain the primary key
name. Since the element refers to the same business concept whether it is a
primary key or foreign key, it makes business sense to keep the same name. In
addition, it aids the user in identifying the foreign keys (and therefore the
relationships) in the data base. The exception occurs when there are multiple
relationships between two entities, resulting in multiple foreign keys. Because
each attribute in an entity must be unique, the foreign key name is expanded to
include a modifying word, or suffix, that describes the relationship role.

Conclusion

Naming data elements should not be attempted in a vacuum. Adopting and
disseminating naming standards and word lists (class words, prime words and
modifying words) should precede the earliest attempts to name data elements. The
word lists should be re-circulated frequently so the analysts have the most
current information at hand.

In addition, the element names and definitions should be held centrally in a
CASE tool or data dictionary. Reports from the growing pool
of elements should be published to all analysts on a regular basis. Being aware
of all existing element names and definitions saves the analysts from agonizing
over the name of an element that has already been named and defined by another
application.

(1) Reeder, Judith R., Don't Take Names in Vain, Database Programming and Design,
July, 1991, pp. 17-20. Guidelines for creating correct and lasting element names.

Figure 1:

PATIENT DOSAGE      Quantity   Medium       Strength   Unit of Measure

250 mg              1          tablets      250        mg
3 ml                3          injectable   10%        ml
2 100 mg tablets    2          tablets      100        mg
