
Database Normalization Basics

There are three primary reasons to normalize a database:
  1. It brings data integrity to the tables (entities). Data can be inserted, updated, and deleted properly, which prevents data anomalies. A table that is not normalized runs the risk of inconsistent data, redundant data, and inconsistent result sets.
  2. It creates entity relationships, which provide referential integrity. Referential integrity ensures that relationships between tables remain consistent when entities are altered.
  3. It avoids having one set of users with a biased view of the data. For example, a marketing department may view the data in line with its revenue needs rather than having an objective view of the data as it applies to the entire organization.
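As a rough illustration of the first reason, the sketch below shows an update anomaly in a flat table. The table name orders_flat and the sample rows are made up for the example; only the attribute names come from the Orders entity.

```python
import sqlite3

# Hypothetical flat table: the item description is repeated on every order row.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders_flat (
        order_id      INTEGER,
        cust_id       INTEGER,
        item_name     TEXT,
        item_descript TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, 100, "Widget", "Small widget"),
        (2, 101, "Widget", "Small widget"),
    ],
)

# Someone corrects the description on one order only.
conn.execute(
    "UPDATE orders_flat SET item_descript = 'Compact widget' WHERE order_id = 1"
)

# The same item now carries two different descriptions: an update anomaly.
descriptions = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT item_descript FROM orders_flat "
        "WHERE item_name = 'Widget' ORDER BY item_descript"
    )
]
print(descriptions)
```

If the description lived in one place only, a single update could not leave the data in this state.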
We begin by analyzing the current relationships that exist between the attributes within each entity.
The Orders entity clearly has repeating data in the ITEM_NAME, ITEM_DESCRIPT, QUANTITY, and PRICE attributes. We break the entity out so that no attribute holds repeating data, while leaving the structure of the entity the same.


















Each attribute now has a single value, and there are no repeating groups or multiple data entries in any attribute.
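The step above can be sketched in a few lines of Python. The sample row is invented for illustration, and storing the repeating values as lists is an assumption about how the un-normalized data looks; the point is only that each item ends up on its own row with the same set of attributes.

```python
# Hypothetical un-normalized row: several items packed into single attributes.
unnormalized = {
    "ORDER_ID": 1,
    "CUST_ID": 100,
    "ITEM_NAME": ["Widget", "Gadget"],
    "ITEM_DESCRIPT": ["Small widget", "Large gadget"],
    "QUANTITY": [2, 1],
    "PRICE": [9.99, 24.50],
}

REPEATING = ["ITEM_NAME", "ITEM_DESCRIPT", "QUANTITY", "PRICE"]


def flatten(row):
    """Return one row per item, keeping the same attributes as the input."""
    fixed = {key: value for key, value in row.items() if key not in REPEATING}
    rows = []
    # zip(*) walks the repeating attributes in parallel, one item at a time.
    for values in zip(*(row[key] for key in REPEATING)):
        rows.append({**fixed, **dict(zip(REPEATING, values))})
    return rows


flat_rows = flatten(unnormalized)
for flat_row in flat_rows:
    print(flat_row)
```

ORDER_ID and CUST_ID are copied onto every resulting row, which is the redundancy the later normal forms remove.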

Convert the un-normalized Orders entity to the First Normal Form.

We need the combination of ORDER_ID and CUST_ID to uniquely identify a row; together they form the primary key. We break the Orders entity down into a new Order entity by extracting the information related to the product that was ordered. We now have two entities: Order and Orders. We are now in First Normal Form (1NF), where all key attributes are identified and there are no repeating groups. Our goal is to have every entity act as a noun, with only that noun's attributes.
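A minimal sketch of a composite key on ORDER_ID and CUST_ID, using SQLite from Python. The table name order_header is a stand-in chosen to avoid the reserved word ORDER, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER_ID and CUST_ID together form the primary key (a composite key).
conn.execute(
    """
    CREATE TABLE order_header (
        order_id INTEGER NOT NULL,
        cust_id  INTEGER NOT NULL,
        date     TEXT,
        PRIMARY KEY (order_id, cust_id)
    )
    """
)
conn.execute("INSERT INTO order_header VALUES (1, 100, '2020-01-15')")

# A second row with the same (order_id, cust_id) pair violates the key.
try:
    conn.execute("INSERT INTO order_header VALUES (1, 100, '2020-01-16')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same order_id with a different cust_id is still a distinct key value.
conn.execute("INSERT INTO order_header VALUES (1, 101, '2020-01-16')")
row_count = conn.execute("SELECT COUNT(*) FROM order_header").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither column is unique on its own here; only the pair is.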



















Convert the entities to the Second Normal Form and provide the entity name, column names, and appropriate key types.


We further break out the Orders entity by creating an Items entity. Here we effectively remove the products being sold from the actual sales order information.
















At this point, the Items entity can be updated at any time in the future without disturbing the entity relationships. For example, we could add a SKU Number attribute to the Items entity, or an End Of Life product designation.

The Orders entity is cleaner now that the product attributes have been moved to the Items entity.
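To make this concrete, here is a sketch in SQLite via Python. The ITEM_ID key, the placement of PRICE in the Items table, and the sample rows are assumptions made for the example. It adds a SKU column to the Items table and checks that the Orders rows are untouched and that referential integrity still holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# item_id is a hypothetical surrogate key for the Items entity.
conn.execute(
    """
    CREATE TABLE items (
        item_id       INTEGER PRIMARY KEY,
        item_name     TEXT,
        item_descript TEXT,
        price         REAL
    )
    """
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER NOT NULL,
        cust_id  INTEGER NOT NULL,
        item_id  INTEGER NOT NULL REFERENCES items(item_id),
        quantity INTEGER,
        PRIMARY KEY (order_id, cust_id, item_id)
    )
    """
)
conn.execute("INSERT INTO items VALUES (10, 'Widget', 'Small widget', 9.99)")
conn.execute("INSERT INTO orders VALUES (1, 100, 10, 2)")

# Add a SKU attribute to Items; the Orders table is not touched.
conn.execute("ALTER TABLE items ADD COLUMN sku TEXT")
conn.execute("UPDATE items SET sku = 'W-001' WHERE item_id = 10")

item_columns = [row[1] for row in conn.execute("PRAGMA table_info(items)")]
orders_rows = conn.execute("SELECT * FROM orders").fetchall()

# Referential integrity: an order for a non-existent item is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (2, 101, 999, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(item_columns, orders_rows, orphan_rejected)
```

The foreign key on item_id is what keeps the relationship consistent while the Items table evolves.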













Convert the entities to the Third Normal Form and provide the entity name, column names, and appropriate key type.

The primary advantage should be apparent. If a need arises at some point in the future to:
  • Add attributes to the Items entity
  • Change the format of the DATE attribute to MM-DD-YYYY in the Order entity
the change can be made without disturbing the other entities and their relationships.
Our original Orders entity is now broken out into 3NF, where each entity represents a single subject or noun and the primary keys are clearly defined. Each attribute now holds a single, simple (atomic) value. However, this comes at a price: the additional entities require additional I/O operations and processing logic to join them.
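The join cost can be seen in a small sketch. The three tables below (order_header, order_lines, items) and their keys are a hypothetical layout for illustration, not necessarily the exact entities shown above. Rebuilding a single order view takes two joins, where the un-normalized table needed none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_header (
        order_id INTEGER, cust_id INTEGER, date TEXT,
        PRIMARY KEY (order_id, cust_id)
    );
    CREATE TABLE items (
        item_id INTEGER PRIMARY KEY,
        item_name TEXT, item_descript TEXT, price REAL
    );
    CREATE TABLE order_lines (
        order_id INTEGER, item_id INTEGER, quantity INTEGER,
        PRIMARY KEY (order_id, item_id)
    );
    INSERT INTO order_header VALUES (1, 100, '2020-01-15');
    INSERT INTO items VALUES (10, 'Widget', 'Small widget', 5.0);
    INSERT INTO items VALUES (11, 'Gadget', 'Large gadget', 20.0);
    INSERT INTO order_lines VALUES (1, 10, 2);
    INSERT INTO order_lines VALUES (1, 11, 1);
    """
)

# Two joins are needed to reassemble what was once a single row layout.
rows = conn.execute(
    """
    SELECT h.order_id, h.cust_id, i.item_name, l.quantity, i.price,
           l.quantity * i.price AS line_total
    FROM order_header AS h
    JOIN order_lines AS l ON l.order_id = h.order_id
    JOIN items AS i ON i.item_id = l.item_id
    ORDER BY i.item_name
    """
).fetchall()

for row in rows:
    print(row)
```

On a small data set the extra work is negligible; it becomes a consideration as tables and query volume grow.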


There are numerous articles on the advantages and disadvantages of normalization. Certainly the price of hard drives has changed over time, and I/O may not be the issue that it was in the past. Yet with the advent of tablets and mobile devices, speed is at a premium once again.
