Dev Journal #3 - ERD and Data Normalization



Entity Relationship Diagram

Before jumping into coding, I took some time to design the essential data structure that is required to build this application. Below is my initial entity relationship diagram that illustrates the logical structure of my database.

It’s a fairly simple desgin consisting of the two main entities: Users and Logs. The trickiest part will be understanding how to properly break down each training session information before logging them into the database. This process of breaking down the information into granular bits is called data normalization. It is crucial to properly normalize the data to avoid redundant or inconsistent data. There are multiple levels of normalizations that can be applied to the relational database. For most applications, level three, or also known as the third normal form is considered the highest level necessary. In the following sections, I will give a brief explanation of how to apply the first, second and third normal forms with related examples.

First Normal Form

For the sake of creating an example, let’s say I performed 1 set of pull ups on an angled pull up bar, 1 set of pull ups on a straight pull up bar, and 1 set of muscle up on the rings to exercise my back muscles; 1 set of dips on a straight pull up bar, 2 sets of dips on the parallel bars, and another 2 sets of dips on the rings to exercise my chest muscles. The equivalent piece of information translated into the database without any normalization would look like this:

exercise_info
set 1 of pull ups on angled pull up bar for back muscle
set 2 of pull ups on straight pull up bar for back muscle
set 1 of muscle ups on rings for back muscle
set 1 of dips on straight pull up bars for chest muscle
set 2 of dips on parallel bars for chest muscle
set 3 of dips on parallel bars for chest muscle
set 4 of dips on rings for chest muscle
set 5 of dips on rings for chest muscle

A relation in first normal form (1NF) should only store atomic values to each table attributes, and each row should be uniquely identified. To achieve 1NF, I split each row into multiple columns where each column holds indivisible information. Note that I don’t actually have the column bar_type in my real application. This info was added as an afterthought to explain 3NF violation in the later section.

set_number exercise_name equipment muscle_group bar_type
1pull upspull up barbackangled
2pull upspull up barbackstraight
1muscle upsringsbacknull
1dipspull up barcheststraight
2dipsparallel barschestnull
3dipsparallel barschestnull
4dipsringschestnull
5dipsringschestnull

Second Normal Form

The second normal form (2NF) aims to remove functional dependency on anything other than the entity’s candidate key. The candidate key is the combination of attributes that uniquely identifies a row in the database. In my example, the candidate key is (set_number, exercise_name, equipment) and the functional dependency formula can be defined as:

FD = {
set_number, exercise_nameequipment, muscle_group, bar_type
}

This schema is not in 2NF because exercise_namemuscle_group, as in exercise_name alone can uniquely determine the value of muscle_group. Because exercise_name is a subset of the candidate key, this violates 2NF. To convert my example into 2NF, I can move the exercise muscle group information into a separate table named Exercises. ExerciseLogs table should now refer to this data in Exercises table by a foreign key.

Exercises

exercise_name muscle_group
pull upsback
muscle upsback
dipschest

ExerciseLogs

set_number exercise_name equipment bar_type
1pull upspull up barangled
2pull upspull up barstraight
1muscle upsringsnull
1dipspull up barstraight
2dipsparallel barsnull
3dipsparallel barsnull
4dipsringsnull
5dipsringsnull

Third Normal Form

A relation is in third normal form (3NF) if it already satisfies 2NF requirements and no non-prime attribute is transitively dependent on the candidate keys. The newly created Exercises table meets the 3NF requirements. The candidate key is exercise_name and { exercise_namemuscle_group } is the only functional dependency. However, ExerciseLogs table fails 3NF due to the transitive functional dependency { bar_typeequipment } which is a dependency between a non-prime attribute to another non-prime attribute. bar_type values are non-trivially dependednt on specific equipment values which in this case only applies to the pull up bar. To fix 3NF violations, I can further extract out the equipment information into a separate entity with identifiable primary key like the following:

Exercises

exercise_name muscle_group
pull upsback
muscle upsback
dipschest

Equipments

equipment_id equipment_type bar_type
0pull up barangled
1pull up barstraight
2parallel barsnull
3ringsnull

ExerciseLogs

set_number exercise_name equipment_id
1pull ups0
2pull ups1
1muscle ups3
1dips1
2dips2
3dips2
4dips3
5dips3

In some cases where 3NF relation has multiple overlapping candidate keys, it may require additional normalization to satisfy the Boyce Codd Normal Form (BCNF). But as mentioned earlier, generally normalizing to 3NF is enough to avoid anomalies during update, insert and delete operation. I am sure I will come across more complicated normalization problems later down the road but I think this is good enough of a review for now.

< Previous Post Next Post >

Comments