CPS 430/542 Lecture notes: Normalization
Coverage: [FCDB] §§3.6-3.7 (pp. 102-127)
The BIG picture
- ideas → E/R → relations →
better (normalized) relations
- why study FD's?
inferring FD's is very important to identifying
flaws in the design of a database
- final goal: Boyce-Codd Normal Form (BCNF)
Students relation
- key: {id, advisor_id}
- FD's: {id → name level
favorite_advisor, advisor_id →
advisor_office}
name | id | level | advisor_id | advisor_office | favorite_advisor
------------------------------------------------------------------
Mark 1 Senior 349 AN151 350
??? 1 ??? 350 MH134 ???
Kathy 2 Senior 146 AN240 146
??? 1 ??? 351 AN130 ???
??? 1 ??? 352 AN131 ???
??? 2 ??? 351 ??? ???
David 3 Junior 349 ??? 349
------------------------------------------------------------------
`???' denotes redundant, i.e., can be inferred from other tuples
Sources of redundancy
- update anomaly (more serious of the two):
what happens if we update the favorite_advisor of Mark?
- deletion anomaly:
if 350 is not the advisor for anybody, we lose their room info!
Root cause of the problem
- main idea: if each FD is not in the form `key → {all other attributes}'
then the relation has too much stuff in it
- specifically,
each FD has a lhs which is a proper subset of the key! (a BCNF violation)
Normalization
- process of making relations better
by decomposing them into smaller relations to
- reduce redundancy
- eliminate update anomalies
- eliminate deletion anomalies
- final goal: all relations in Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form
- the lhs of every nontrivial FD is a super-key
- above anomalies are guaranteed not to exist
in relations which satisfy this condition
- formally, R is in BCNF if for every nontrivial
FD X → A in R, X is a superkey
- advantages
- removes redundancy
- removes update anomalies
- removes deletion anomalies
Proof that every two-column relation is in BCNF
- there are no nontrivial FD's; key: {A, B};
R in BCNF
- A → B
is the only nontrivial FD; key: {A}; R in BCNF
- B → A
is the only nontrivial FD; key: {B}; R in BCNF
- A → B,
B → A are the only nontrivial FD's keys: {A},
{B}; R in BCNF
Decomposition into BCNF
- need to ensure that we can reconstruct the decomposed relation
- therefore, we cannot just break a relation schema
into a collection of two-attribute relations
- process:
- find a nontrivial FD where the lhs is not a superkey;
make sure the rhs is maximally expanded
- split the original relation into two new relations
- one containing only the attributes from both sides
of the violating FD
- the other containing all attributes from the
original relation except the rhs of the violating FD
Decomposing Movies
Movies(title, year, length, filmType, studioName, starName)
- has update and deletion anomalies
- is it in BCNF? no; violating FD: title year →
length, filmType, studioName
- decompose into
- Movies1(title, year, length, filmType, studioName)
- Movies2(title, year, starName)
- are these relations in BCNF?
- must determine its complete set of FD's by
computing the closure of each subset of its attributes,
using the full set of FD's satisfied by the decomposed relation
- does Movies2 have an update anomaly?
Another Movie example
- MovieStudio(title, year, length, filmType, studioName, studioAddr)
- FD's: title year → length filmType studioName studioAddr,
studioName → studioAddr
- key: {title, year}
- decomposition:
- MovieStudio1(title, year, length, filmType, studioName)
- MovieStudio2(studioName, studioAddr)
Is one application sufficient? No
solution: decompose again
final relational schema
- R1(title, year, studioName)
- R2(studioName, president)
- R3(president, presAddr)
Reconstructing a relation from a decomposition
Third Normal Form (3NF)
now FD title city → theatre is not preserved
solution: slightly relaxing BCNF requirement leads to the
third normal form (3NF)
a relation is
in 3NF if for each nontrivial FD X → Y,
either X is a superkey or Y is prime
(a member of some key)
decomposition into relations in 3NF
- does not lose any information, and
- allows FD's to be verified,
- but if the relations
are not also in BCNF, then there will
be some redundancy in the schema
Recap
- the decomposition into BCNF provides
a lossless join decomposition, i.e.,
we can reconstruct the tuples of the
original relation by joining
- the BCNF decomposition however does
not preserve dependencies
- 3NF is weaker than BCNF
- decomposition into 3NF (not covered)
- preserves dependencies, and
- provides a lossless join,
- but does not guarantee the elimination
of redundancy (unless, of course, the relations are also in BCNF)
Good properties of breakups (if such things exist :-)
- removes redundancy
- lossless join decomposition
- dependency preservation
- BCNF guarantees 1 & 2
- 3NF guarantees 2 & 3
- no normal form guarantees all 3
Interesting example
----------------------
name address car
----------------------
Carla 1 Honda
Carla 1 Toyota
Carla 2 Honda
Carla 2 Toyota
Chuck 1 Ford
Chuck 1 Chevy
Chuck 3 Ford
Chuck 3 Chevy
----------------------
There are no nontrivial FD's in this relation.
Is this relation in BCNF?
Is there any redundancy?
for one person, each address repeated for each car
means attribute address is independent of attribute car
specified by MVD: name →→ address
Multivalued dependencies (MVD's)
- assertion that two attributes or
sets of attributes are independent
of each other
- schematic view
- written A →→ B
- MVD's are a generalization of FD's: all FD's are MVD's, why?
Every FD is an MVD
Every FD X → Y is also an MVD X →→ Y:
Consider the following
relation which satisfies the FD id → name:
-----------------------
id name likes_color
-----------------------
1 X red
1 X blue
2 Y red
2 Y blue
-----------------------
We want to see if this relation also satisfies the MVD
id →→ name.
To do that we need to find a tuple v which
- agrees with two tuples t and u on the
lhs of the MVD (id),
- agrees with t on the rhs of the MVD (name), and
- agrees on all else (likes_color) with u.
----------------------------------
id name likes_color
----------------------------------
(t) 1 X red
(u & v) 1 X blue
----------------------------------
The tuple u can be such a tuple v.
Tuple v agrees with both t and u on id,
it agrees with t
on name, and it agrees with u on likes_color.
Must find such a tuple v for every id.
Therefore this relation satisfies the MVD id →→
likes_color.
In general, every FD is also a MVD. You can find never an FD satisfied
by a relation R that is not also a MVD satisfied by R.
Note that every MVD is not necessarily an FD.
Rules for MVD's
- MVD's do not obey all of the rules of FD's
- MVD's do not obey the splitting/combining rule
- i.e., X →→ Y
and X →→ Z does not imply X →→
Y Z
- e.g., name →→ street does not hold in
StarsIn
- complementation rule
- means name →→ car
- has no analog in the world of FD's
- transitive rule
- every FD is a MVD; why?
- trivial dependencies rule (resembles reflexivity)
Fourth Normal Form (4NF)
- eliminates the redundancy caused by MVD's
- nontrivial MVD A →→ B
- none of the Bs are among the As
- A ∪ B ≠ R
- condition: essentially BCNF condition applied to MVD's
- for all nontrivial MVD's A →→ B,
A is a superkey
- means every BCNF violation is a 4NF violation
- every relation in 4NF is in BCNF
Decomposition into 4NF
- analogous to decomposition in BCNF;
just string replace acronym FD with MVD
- Cars example
- now MVD name →→ address is trivial
- ditto for name →→ car
- how to determine the new MVD's?
there is no easy way (such as computing the closure in BCNF)
Relationship of normal forms
- concentric circles: 4NF ⊂ BCNF ⊂ 3NF
- 1NF: requires every column to have an atomic value
- 2NF: primarily historical at this point
- 5NF
- further generalization of MVD's to JDs
- beyond the scope of CPS 430/542
Good properties of breakups (final)
- removes redundancy caused by FD's
- removes redundancy caused by MVD's
- lossless join decomposition
- preserves FD's
- preserves MVD's
- 4NF guarantees 1, 2, & 3
- BCNF guarantees 1 & 3
- 3NF guarantees 3 & 4
- no normal form guarantees all 5
Goal
- BCNF (no redundancy due to FD's & lossless join)
- if not possible, aim for
- 3NF
- lossless join
- dependency preservation
Frequenty asked question (and summary)
If a relation R is in BCNF, then it is automatically in 3NF because
3NF is a weaker normal form than BCNF. Then does not BCNF preserve
dependencies like 3NF if all relations in BCNF are also in 3NF?
First, we must realize we are talking about two different things
(`normal forms' and `decomposition algorithms') and we should not confuse
the two.
When we talk of normal forms, the following holds:
- the set of relations in 4NF is a proper subset of the set of relations
in BCNF
- the set of relations in BCNF is a proper subset of the set of
relations in 3NF
This means any relation in BCNF is automatically in 3NF.
Now, when we talk of properties of `decompositions' (not normal
forms), there is no such subset relationship.
We say the `decomposition into BCNF'
removes redundancy (but for all we know,
the smaller relations resulting may not be in BCNF at all
(and we saw such an example in
class). In such cases,
we must apply the decomposition algorithm again.
We say the `decomposition into 3NF' preserves dependencies. Saying that
relations in 3NF preserve dependencies is nonsense (what would that mean?).
When we talk of good properties of breakups, we are talking
about `decompositions' NOT normal forms. We say that the `decomposition
procedure into 3NF' preserves FD's, but that has nothing to do with saying that
all relations in BCNF are also in 3NF. You will notice that
we never showed the
`decomposition procedure into 3NF.' We only covered the breakup procedure into
BCNF and 4NF.
One can never find a relation in BCNF that is not in 3NF.
In summary, a relation is in
- 3NF, if for every nontrivial FD X → Y, X is either
a superkey or Y is part of a key
- BCNF, if for every nontrivial FD X → Y, X is
a superkey
- 4NF if for every nontrivial MVD X →→ Y, X is
a superkey
To decompose into
- 3NF (not covered in this course)
- BCNF: find a violating FD X → Y
- create a new relation R1(X, Y)
- create a new relation R2(X, R-Y)
- 4NF: find a violating MVD X →→ Y
- create a new relation R1(X, Y)
- create a new relation R2(X, R-Y)
References
| [FCDB] |
J.D. Ullman and J. Widom. A First Course in Database Systems.
Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.
|
|