Q-Built Solutions Makes Quick, Quality-Built Software For Your Business Needs!
 

RDB Design Boot Camp

Place your ad here! This domain gets 400 visitors per day and has Google Page Rank of 3 or 4.  Get great ad rates!

Place your ad here! This domain gets 400 visitors per day and has Google Page Rank of 3 or 4.  Get great ad rates!


 
Please help us write more free software and tips for this site by donating.
Please donate today!


Q-Built Solutions Web Statistics
.

 
Statistics As Of:
26 Mar. '07

Number of unique visitors since 20 March '04:
204,793

Number of Web pages served since 20 March '04:
429,683

Our Most Popular Web Pages:

 1.  How-To Tips
 2. 
Technical Articles
 3. 
Gem Tips
 4. 
VBA
 5. 
Free Stuff
 6. 
Links
 7. 
Free Downloads
 8. 
Our Custom Microsoft Access Products
 9.
Forms
10. 
FAQ's
 

 

 

 

 

 

 

Relational Database Design

Boot Camp

Training Day 2 (TD2):  First Normal Form

Welcome to Training Day 2 of Relational Database Design Boot Camp. I am the Gunny. Gunny is short for Gunnery Sergeant, United States Marine Corps. That means you'll always get straight, no-nonsense information, without wasting your time or mine.  The Gunny will always give you The Word.  If you have not completed Training Day 1, then put on your black cadillacs most ricky tick and get back to Training Day 1.

This lesson will be quick and painless, like all Marine Corps training.  First, I'll start with your objectives for this lesson.
 

Lesson Objectives

Read lesson objectives here.


Don't Throw Everything Into The Same Barrel (or "Why You Shouldn't Store Multiple Values Or Multiple Types In The Same Column")

Gunnery sergeants in the U.S. Marine Corps are responsible for many things and many people. I had two Marines who managed our S-4 shop. We'll call them Private Justin Case and Private Ebe Schmuckatelli. (Not their real names, but then this isn't a real example, either. I was reminded that people fond of gun control might read this article and be offended by the mention of weapons. I argued that M-16's are "defensive weapons," and are used to defend the U.S., U.S. interests, and the lives of U.S. military personnel.  However, by the looks I received, I believe it's time to abide by one of the Marine Corps' unofficial mottos:  improvise, adapt and overcome.  So I'll improvise.  I'll use pogie bait, instead of weapons, ammo and things that go "boom!" It just won't be as exciting as my first draft.  Marines love to watch things explode.)

Privates Case and Schmuckatelli received new shipments of pogie bait:  peanut butter cookies and jelly beans -- the gourmet kind, with numerous flavors and colors.  (Hey, improvising doesn't mean my example has to be purchased from the lowest bidder.)  The only place to store the cookies and the jelly beans in airtight containers was in several barrels, stacked in the back of the supply room, next to a partly full barrel of cookies and a partly full barrel of jelly beans.

The boxes of cookies and the boxes of jelly beans wouldn't quite fit into the barrels, so Pvt Schmuckatelli lined two of the barrels with plastic bags and dumped cookies and jelly beans from the boxes into the barrels until those two barrels were full and the original, partly full, barrels of cookies and jelly beans were nearly full.  In the meantime, Pvt Case poured jelly beans of each flavor into separate five pound plastic bags, tagged each bag to identify what was inside and threw each bag into a barrel until two more barrels were full.  It took Pvt Case almost ten more minutes to fill his two barrels than it took Pvt Schmuckatelli to fill the other four.

I walked in and said "We need to deliver 30 pounds of cinnamon-flavored jelly beans to the main base.  How many pounds do we have?"  Pvt Case quickly dug four sacks of jelly beans from one of the barrels and weighed them on the scale. "18 pounds, 4 ounces -- plus whatever is in the other barrels."

Oops.

Pvt Schmuckatelli then told me what he had done.

"We'll have to fix that ASAP.  For now, just take the barrel of cookies and put it on the truck.  Double time!  The truck is leaving for the chow hall right now," I directed them.

The way he looked at me, I could see that Pvt Schmuckatelli hadn't quite told me everything.

"There's only one flavor of cookies. They're all peanut butter! You couldn't possibly have mixed . . . don't tell me that they're not all in one barrel!" I exclaimed.

"Gunny, they wouldn't all fit into one barrel. Not after I poured all those jelly beans in the bottom," Pvt Schmuckatelli told me.

In the military, there's never enough time to do it right the first time, but there's always enough time to do it over.  Pvt Schmuckatelli and Pvt Case spent the next two hours manually sorting and bagging the cookies and jelly beans in the other four barrels.  When they were finished, we could easily separate the cookie barrels from the jelly bean barrels and identify how many pounds of cinnamon-flavored jelly beans we had, as well as the amounts of any other individual flavors of jelly beans or peanut butter cookies, whenever we needed to.

Pvt Schmuckatelli had followed Gunny's Rule #4 for storing data (he saved almost ten minutes and did almost twice as much work), but at the expense of Gunny's Rule #2 for retrieving data.  It was difficult to identify how many pounds of each flavor we had, because each flavor was not stored in its own bag (or column in a table if each amount had been written as a record in a database). The cookies were mixed in with the jelly beans, so when it came time to grab just the cookies, we couldn't. We had to separate the different types first, because both had been stored in the same barrel (or column in a table if each amount of each type of food had been written as a record).

It took a lot of time to retrieve the items and the information that we needed, because of the inefficiency in the way that some of it was stored.  This wasted the Gunny's time and violates Gunny's Rules.

Don't violate Gunny's Rules. Don't be Pvt Schmuckatelli.


Example With A Database

Another example of "throwing everything into the same barrel" is trying to record multiple sizes of garment supply purchases in the same column, such as "S, M, L, XL." Data entry would be a breeze, because all of these sizes are easily typed into the "Size" column for a single record and the total number purchased would be typed in the "Quantity" column.  Right?

We would have faster data entry, but at the expense of building and running queries (retrieving the data), which would take more time than they need to. Not only does it violate Gunny's Rule #2, "Make retrieving data as quick and as easy as possible," it also violates Gunny's Guidelines "Never store your data in the same arrangement that you intend to display it" and "Always design for exceptions."

What exceptions do we need to design for?  I can think of a few. What if some types of baseball caps only come in medium and large, while others only come in large? In that case, some of the strings contained in the "Size" column will be shorter than the maximum length where all sizes are available. ("M,L" is shorter than "S,M,L,XL.")  Relational databases use queries to retrieve and display data stored in tables.  It will be difficult to write a query to sort out what's in each record by parsing the string value that may or may not have an "L" at the end of the string and may or may not have commas or even spaces in the string.

But most importantly, it will be difficult to find matches in a query if we design our table this way.  "S, M, L, XL" is not the same as "S, M, L XL" so we would have to design our query to account for all cases that may be mistyped by users -- in addition to the actual value that we want. Otherwise, some of the records that should have been selected for the query will never be displayed. Also, if we wanted to determine which baseball caps only came in medium and large sizes, we would have a hard time parsing the string value in the Size column to retrieve the records that contained only "M" and "L" but no other sizes.  Our query would likely end up with extra records displayed, because so many other records contain "M" and "L" sizes -- and other sizes, too. Obviously, a query that shows these extra records with the other sizes is wrong, but we may not be able to produce a query that is correct when there are so many variations in the way the data is stored.

So why don't we just create a column in the table for each size and place the quantity for each size in each of these columns? For example, we could name the columns "S_Size," "M_Size," "L_Size," and "XL_Size." This "repeated grouping" of sizes partly solves the problem, because we no longer have to hunt for each value that might be in various arrangements in a single column.  However, now our query would have to check four different columns instead of just one column to find how many of which sizes were purchased, which violates Gunny's Rule #2.  And if we ever need to add more sizes, such as XS and XXL, then we'd need to alter the table by adding these two columns, and we'd have to alter all queries that used this table's columns to determine sizes, which violates Gunny's Rules #2, #3, and #4.

So, if we put multiple values in the same column, then we either get too few records or too many records in queries because it's so hard to match the arrangement of the multiple values in this column "perfectly." A relational table design that results in incorrect data, such as queries that produce the wrong set of records, or makes it difficult or impossible to retrieve data, is called an anomoly.

The solution to this anomoly is to place a single value, say "S" in the "Size" column for one record, an "M" in the "Size" column for the next record, an "L" in the "Size" column for the next record, and so on, until we have a record for each size of the garment we have purchased and the appropriate quantity of that size in the "Quantity" column of each record.  In this case, that would be four separate records for the four sizes of "S," "M," "L" and "XL" of a single garment.  We will be able to quickly determine how many of any size garments were purchased, as well as any other attributes that were stored in the table, such as the price we paid and the date we purchased each one.

If we design our table this way, then it will also be more flexible than putting all sizes in the same column, because it allows us to quickly and easily answer questions that we didn't even know that we were going to ask.

For example, when we suspect that the price has gone up for large and extra large baseball caps, we can determine exactly what prices we paid in the past for each large cap and each extra large cap and compare these with what we paid on the most recent invoice.


Anomoly And Solution

So, to summarize, a relational table design that results in incorrect data, such as queries that produce the wrong set of records, or makes it either difficult or impossible to retrieve data, is called an anomoly.

We've identified an anomoly caused by multiple values or multiple types stored in the same column:

    If we put multiple values in the same column (throw everything into the barrel), then we'll have a hard time retrieving individual bits of information, like quickly counting how many of an item we have, instead of just the total, or aggregate, (as in "We have four barrels full of jelly beans, but how many pounds of those jelly beans are cinnamon-flavored?") stored in the table.  We'll also have a hard time writing queries that find the records that have individual information we're looking for (as in "Has the price for small and medium baseball caps gone up as much as the price for large baseball caps, or have these prices stayed the same?")

    If we put multiple types in the same column (like cookies and jelly beans in the same barrel), then we'll have difficulty separating the different types (as in one type needed to go to the chow hall and the other type needed to go to the main base, but the two types were mixed in each barrel, so it took extra time to separate them).

We've found that the way to fix this anomoly is to create a separate record for each of these values, instead of placing all of the multiple values in a single column in one record. We also get more flexibility from this design to answer questions we haven't even thought of yet (as in "Did we get overcharged for the large red baseball caps this month?").

So you understand this anomoly and the solution. Correct? Then get ready for the next part, because we're almost done with this lesson.


The Home Stretch

The academics with PhD's who invented relational database theory have defined a number of anomolies that relational databases can encounter and have defined the techniques to fix each type of anomoly.  Relations are classified according to the types of anomolies that they are vulnerable to.  These classes of relations, and the techniques used to fix the anomoly each class of relation is vulnerable to, are called normal forms.  There are several normal forms, and we'll cover two more of them in the next lessons.

The definition of first normal form is:

"The cells of the table must be of single value, and neither repeating groups nor arrays are allowed as values.  All entries in any column (attribute) must be of the same kind. Each column must have a unique name, but the order of the columns is insignificant.  Finally, no two rows in a table may be identical, and the order of the rows is insignificant."  D. M. Kroenke, Database Processing: Fundamentals, Design & Implementation, 7th Ed.

(Does this definition sound familiar?  We used these restrictions for defining a table as a relation in TD1.)  When a database meets all of the criteria in the definition of first normal form, then the database is said to be normalized to first normal form or the database is "in first normal form." Removing the multiple values and the multiple types from the columns in the examples above -- and replacing them with a single value of one type in each column -- helped put the table into first normal form.

Explaining Gunny's Guidelines:

Gunny's Guideline #3:  Why do we always design for exceptions? Databases that are built without flexibility to accommodate exceptions will be constantly redesigned whenever new functionalities are needed.  Redesigning tables means that queries are usually redesigned, too.  Any application that relies on these tables and queries to remain exactly as they were originally will have to be altered, too, in order to accommodate each new redesign.  The bigger and more complex the database and the applications that use it, the more time it will take to redesign them, which is time that many businesses and organizations can ill-afford.

If a database design is flexible, then expanding the functionality to accommodate questions that weren't even thought of at the time the database was developed can be accomplished by adding a new query or merely altering the criteria of existing queries.


What You Just Learned

You've learned that incorrect table designs make the data stored within them vulnerable to problems, and each of these types of problems is classified as an anomoly. You've learned that the table design techniques used to avoid (or solve) these anomolies are called normal forms.

You've learned how to prevent the anomoly caused by tables which are not in first normal form. You've learned that storing multiple values or multiple types in the same column make it either difficult or impossible to write a query that produces the correct data set.  And you've learned that designing a database to be flexible enough for future enhancements will save time when those enhancements are eventually needed.

 

That's it for today's lesson.

 

You can read about your favorite expert contributors.

Copyright © 2005 - 2006 Q-Built Solutions. All rights reserved.

 Top

Disclaimer:  This article is not an official curriculum of the U.S. Marine Corps or any military service.  The information contained in this article is solely the work of the author and is intended to provide readers with instructions on how to build a relational database with the proper techniques to avoid common problems that plague poor database design.

 

Visitors since 26 Nov. '05: 

 

[MS Access] [Free Stuff] [Articles] [Gem Tips] [How-To Tips] [Links] [Products] [Scorecard] [About Us] [Search]

Sign up for PayPal and start accepting credit card payments instantly.