Dev Journal #3 - ERD and Data Normalization


Entity Relationship Diagram

Initial draft of ERD

Exercise tracking consists of the four main entities: Users, Exercises, Sessions, and Logs. Users is the entity that will represent the user accounts and profile information. Exercises will hold information about each type of exercise that can be tracked. Sessions will represent each workout session. Finally, Logs will hold information of each set performed within the workout sessions.

In order to organize each workout so that I can query and create statistics, I must normalize the information into a separate table listed above. This process of breaking down the information into granular bits is called data normalization. It is crucial to properly normalize the data to avoid redundant or inconsistent data. There are multiple levels of normalizations that can be applied to the relational database. For most cases, level three, or also known as the third normal form is considered an adequate level of normalization. In the following sections, I will give a brief explanation of how to apply the first, second and third normal forms with related examples.

First Normal Form

For the sake of creating an example, let’s say I performed

The equivalent piece of information translated into the database without any normalization would look like this:

exercise_info
set 1 of pull ups on angled pull up bar for back muscle
set 2 of pull ups on straight pull up bar for back muscle
set 1 of muscle ups on rings for back muscle
set 1 of dips on straight pull up bars for chest muscle
set 2 of dips on parallel bars for chest muscle
set 3 of dips on parallel bars for chest muscle
set 4 of dips on rings for chest muscle
set 5 of dips on rings for chest muscle

A relation in the first normal form (1NF) should only store atomic values to each table attributes, and each row should be uniquely identified. To achieve 1NF, I split each row into multiple columns where each column holds indivisible information. Note that I don’t actually have the column bar_type in my real application. This info was added as an afterthought to explain 3NF violation in the later section.

set_number exercise_name equipment muscle_group bar_type
1pull upspull up barbackangled
2pull upspull up barbackstraight
1muscle upsringsbacknull
1dipspull up barcheststraight
2dipsparallel barschestnull
3dipsparallel barschestnull
4dipsringschestnull
5dipsringschestnull

Second Normal Form

The second normal form (2NF) aims to remove the functional dependency on anything other than the entity’s candidate key. The candidate key is the combination of attributes that uniquely identifies a row in the database. In my example, the candidate key is (set_number, exercise_name, equipment) and the functional dependency formula can be defined as:

FD = { set_number, exercise_nameequipment, muscle_group, bar_type }

This schema is not in 2NF because exercise_namemuscle_group, as in exercise_name alone can uniquely determine the value of muscle_group. Because exercise_name is a subset of the candidate key, this violates 2NF. To convert my example into 2NF, I can move the exercising muscle group information into a separate table named Exercises. ExerciseLogs table should now refer to this data in Exercises table by a foreign key.

Exercises

exercise_name muscle_group
pull upsback
muscle upsback
dipschest

ExerciseLogs

set_number exercise_name equipment bar_type
1pull upspull up barangled
2pull upspull up barstraight
1muscle upsringsnull
1dipspull up barstraight
2dipsparallel barsnull
3dipsparallel barsnull
4dipsringsnull
5dipsringsnull

Third Normal Form

A relation is in third normal form (3NF) if it already satisfies 2NF requirements and no non-prime attribute is transitively dependent on the candidate keys. The newly created Exercises table meets the 3NF requirements. The candidate key is exercise_name and { exercise_namemuscle_group } is the only functional dependency. However, ExerciseLogs table fails 3NF due to the transitive functional dependency { bar_typeequipment } which is a dependency between a non-prime attribute to another non-prime attribute. bar_type values are non-trivially dependent on specific equipment values which in this case only applies to the pull up bar. To fix 3NF violations, I can further extract the equipment information into a separate entity with an identifiable primary key like the following:

Exercises

exercise_name muscle_group
pull upsback
muscle upsback
dipschest

Equipments

equipment_id equipment_type bar_type
0pull up barangled
1pull up barstraight
2parallel barsnull
3ringsnull

ExerciseLogs

set_number exercise_name equipment_id
1pull ups0
2pull ups1
1muscle ups3
1dips1
2dips2
3dips2
4dips3
5dips3

In some cases where 3NF relation has multiple overlapping candidate keys, it may require additional normalization to satisfy the Boyce Codd Normal Form (BCNF). But as mentioned earlier, generally normalizing to 3NF is enough to avoid anomalies during an update, insert and delete operation. Higher level of normalization also comes down with performance costs because it increases the number of tables and joins. I am sure I will come across more complicated normalization problems later down the road but I think this is good enough of a review for now.

Previous Post Next Post

Comments