CPS 430/542 Lecture notes: Normalization
Coverage: [FCDB] §§3.63.7 (pp. 102127)
The BIG picture
 ideas → E/R → relations →
better (normalized) relations
 why study FD's?
inferring FD's is very important to identifying
flaws in the design of a database
 final goal: BoyceCodd Normal Form (BCNF)
Students relation
 key: {id, advisor_id}
 FD's: {id → name level
favorite_advisor, advisor_id →
advisor_office}
name  id  level  advisor_id  advisor_office  favorite_advisor

Mark 1 Senior 349 AN151 350
??? 1 ??? 350 MH134 ???
Kathy 2 Senior 146 AN240 146
??? 1 ??? 351 AN130 ???
??? 1 ??? 352 AN131 ???
??? 2 ??? 351 ??? ???
David 3 Junior 349 ??? 349

`???' denotes redundant, i.e., can be inferred from other tuples
Sources of redundancy
 update anomaly (more serious of the two):
what happens if we update the favorite_advisor of Mark?
 deletion anomaly:
if 350 is not the advisor for anybody, we lose their room info!
Root cause of the problem
 main idea: if each FD is not in the form `key → {all other attributes}'
then the relation has too much stuff in it
 specifically,
each FD has a lhs which is a proper subset of the key! (a BCNF violation)
Normalization
 process of making relations better
by decomposing them into smaller relations to
 reduce redundancy
 eliminate update anomalies
 eliminate deletion anomalies
 final goal: all relations in BoyceCodd Normal Form (BCNF)
BoyceCodd Normal Form
 the lhs of every nontrivial FD is a superkey
 above anomalies are guaranteed not to exist
in relations which satisfy this condition
 formally, R is in BCNF if for every nontrivial
FD X → A in R, X is a superkey
 advantages
 removes redundancy
 removes update anomalies
 removes deletion anomalies
Proof that every twocolumn relation is in BCNF
 there are no nontrivial FD's; key: {A, B};
R in BCNF
 A → B
is the only nontrivial FD; key: {A}; R in BCNF
 B → A
is the only nontrivial FD; key: {B}; R in BCNF
 A → B,
B → A are the only nontrivial FD's keys: {A},
{B}; R in BCNF
Decomposition into BCNF
 need to ensure that we can reconstruct the decomposed relation
 therefore, we cannot just break a relation schema
into a collection of twoattribute relations
 process:
 find a nontrivial FD where the lhs is not a superkey;
make sure the rhs is maximally expanded
 split the original relation into two new relations
 one containing only the attributes from both sides
of the violating FD
 the other containing all attributes from the
original relation except the rhs of the violating FD
Decomposing Movies
Movies(title, year, length, filmType, studioName, starName)
 has update and deletion anomalies
 is it in BCNF? no; violating FD: title year →
length, filmType, studioName
 decompose into
 Movies1(title, year, length, filmType, studioName)
 Movies2(title, year, starName)
 are these relations in BCNF?
 must determine its complete set of FD's by
computing the closure of each subset of its attributes,
using the full set of FD's satisfied by the decomposed relation
 does Movies2 have an update anomaly?
Another Movie example
 MovieStudio(title, year, length, filmType, studioName, studioAddr)
 FD's: title year → length filmType studioName studioAddr,
studioName → studioAddr
 key: {title, year}
 decomposition:
 MovieStudio1(title, year, length, filmType, studioName)
 MovieStudio2(studioName, studioAddr)
Is one application sufficient? No
Reconstructing a relation from a decomposition
Third Normal Form (3NF)
now FD title city → theatre is not preserved
solution: slightly relaxing BCNF requirement leads to the
third normal form (3NF)
a relation is
in 3NF if for each nontrivial FD X → Y,
either X is a superkey or Y is prime
(a member of some key)
decomposition into relations in 3NF
 does not lose any information, and
 allows FD's to be verified,
 but if the relations
are not also in BCNF, then there will
be some redundancy in the schema
Recap
 the decomposition into BCNF provides
a lossless join decomposition, i.e.,
we can reconstruct the tuples of the
original relation by joining
 the BCNF decomposition however does
not preserve dependencies
 3NF is weaker than BCNF
 decomposition into 3NF (not covered)
 preserves dependencies, and
 provides a lossless join,
 but does not guarantee the elimination
of redundancy (unless, of course, the relations are also in BCNF)
Good properties of breakups (if such things exist :)
 removes redundancy
 lossless join decomposition
 dependency preservation
 BCNF guarantees 1 & 2
 3NF guarantees 2 & 3
 no normal form guarantees all 3
Interesting example

name address car

Carla 1 Honda
Carla 1 Toyota
Carla 2 Honda
Carla 2 Toyota
Chuck 1 Ford
Chuck 1 Chevy
Chuck 3 Ford
Chuck 3 Chevy

There are no nontrivial FD's in this relation.
Is this relation in BCNF?
Is there any redundancy?
for one person, each address repeated for each car
means attribute address is independent of attribute car
specified by MVD: name →→ address
Multivalued dependencies (MVD's)
 assertion that two attributes or
sets of attributes are independent
of each other
 schematic view
 written A →→ B
 MVD's are a generalization of FD's: all FD's are MVD's, why?
Every FD is an MVD
Every FD X → Y is also an MVD X →→ Y:
Consider the following
relation which satisfies the FD id → name:

id name likes_color

1 X red
1 X blue
2 Y red
2 Y blue

We want to see if this relation also satisfies the MVD
id →→ name.
To do that we need to find a tuple v which
 agrees with two tuples t and u on the
lhs of the MVD (id),
 agrees with t on the rhs of the MVD (name), and
 agrees on all else (likes_color) with u.

id name likes_color

(t) 1 X red
(u & v) 1 X blue

The tuple u can be such a tuple v.
Tuple v agrees with both t and u on id,
it agrees with t
on name, and it agrees with u on likes_color.
Must find such a tuple v for every id.
Therefore this relation satisfies the MVD id →→
likes_color.
In general, every FD is also a MVD. You can find never an FD satisfied
by a relation R that is not also a MVD satisfied by R.
Note that every MVD is not necessarily an FD.
Rules for MVD's
 MVD's do not obey all of the rules of FD's
 MVD's do not obey the splitting/combining rule
 i.e., X →→ Y
and X →→ Z does not imply X →→
Y Z
 e.g., name →→ street does not hold in
StarsIn
 complementation rule
 means name →→ car
 has no analog in the world of FD's
 transitive rule
 every FD is a MVD; why?
 trivial dependencies rule (resembles reflexivity)
Fourth Normal Form (4NF)
 eliminates the redundancy caused by MVD's
 nontrivial MVD A →→ B
 none of the Bs are among the As
 A ∪ B ≠ R
 condition: essentially BCNF condition applied to MVD's
 for all nontrivial MVD's A →→ B,
A is a superkey
 means every BCNF violation is a 4NF violation
 every relation in 4NF is in BCNF
Decomposition into 4NF
 analogous to decomposition in BCNF;
just string replace acronym FD with MVD
 Cars example
 now MVD name →→ address is trivial
 ditto for name →→ car
 how to determine the new MVD's?
there is no easy way (such as computing the closure in BCNF)
Relationship of normal forms
 concentric circles: 4NF ⊂ BCNF ⊂ 3NF
 1NF: requires every column to have an atomic value
 2NF: primarily historical at this point
 5NF
 further generalization of MVD's to JDs
 beyond the scope of CPS 430/542
Good properties of breakups (final)
 removes redundancy caused by FD's
 removes redundancy caused by MVD's
 lossless join decomposition
 preserves FD's
 preserves MVD's
 4NF guarantees 1, 2, & 3
 BCNF guarantees 1 & 3
 3NF guarantees 3 & 4
 no normal form guarantees all 5
Goal
 BCNF (no redundancy due to FD's & lossless join)
 if not possible, aim for
 3NF
 lossless join
 dependency preservation
Frequenty asked question (and summary)
If a relation R is in BCNF, then it is automatically in 3NF because
3NF is a weaker normal form than BCNF. Then does not BCNF preserve
dependencies like 3NF if all relations in BCNF are also in 3NF?
First, we must realize we are talking about two different things
(`normal forms' and `decomposition algorithms') and we should not confuse
the two.
When we talk of normal forms, the following holds:
 the set of relations in 4NF is a proper subset of the set of relations
in BCNF
 the set of relations in BCNF is a proper subset of the set of
relations in 3NF
This means any relation in BCNF is automatically in 3NF.
Now, when we talk of properties of `decompositions' (not normal
forms), there is no such subset relationship.
We say the `decomposition into BCNF'
removes redundancy (but for all we know,
the smaller relations resulting may not be in BCNF at all
(and we saw such an example in
class). In such cases,
we must apply the decomposition algorithm again.
We say the `decomposition into 3NF' preserves dependencies. Saying that
relations in 3NF preserve dependencies is nonsense (what would that mean?).
When we talk of good properties of breakups, we are talking
about `decompositions' NOT normal forms. We say that the `decomposition
procedure into 3NF' preserves FD's, but that has nothing to do with saying that
all relations in BCNF are also in 3NF. You will notice that
we never showed the
`decomposition procedure into 3NF.' We only covered the breakup procedure into
BCNF and 4NF.
One can never find a relation in BCNF that is not in 3NF.
In summary, a relation is in
 3NF, if for every nontrivial FD X → Y, X is either
a superkey or Y is part of a key
 BCNF, if for every nontrivial FD X → Y, X is
a superkey
 4NF if for every nontrivial MVD X →→ Y, X is
a superkey
To decompose into
 3NF (not covered in this course)
 BCNF: find a violating FD X → Y
 create a new relation R1(X, Y)
 create a new relation R2(X, RY)
 4NF: find a violating MVD X →→ Y
 create a new relation R1(X, Y)
 create a new relation R2(X, RY)
References
[FCDB] 
J.D. Ullman and J. Widom. A First Course in Database Systems.
Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.

