CPS 430/542 Lecture notes:
Attribute and FD Closure Algorithms and Canonical Cover
Coverage: [FCDB] §§3.4-3.5 (pp. 82-102)
Basic review of logic
- `if it is raining outside, I carry an umbrella';
what can you conclude about the state of the world if you
see me roaming around Miriam Hall with an umbrella in my hand?
- if and only if review (iff), bidirectional, ↔
- concept of a model
- concept of entailment in logic, |=
- X |= Y: (read left to right) X entails Y
- X |= Y: (read right to left) Y follows from X, or Y is a semantic
consequence of X
- p ∧ q |= p ∨ q
- p ∨ q does not entail p ∧ q and,
therefore, p ∧ q is not equivalent to p ∨ q
| p | q | p ∨ q | p ∧ q |
| T | T | T | T |
| T | F | T | F |
| F | T | T | F |
| F | F | F | F |
- if X |= Y and Y |= X, then X ⇔ Y
- DeMorgan's Laws
- ¬(A ∨ B) ⇔ ¬A ∧ ¬B
- ¬(A ∧ B) ⇔ ¬A ∨ ¬B
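The entailment and equivalence claims above can be checked mechanically by enumerating every row of the truth table. The following is a minimal Python sketch; the helper names `entails` and `equivalent` are illustrative and not taken from any of the textbooks.

```python
from itertools import product

def entails(x, y, n_vars):
    """X |= Y: every truth assignment that makes x true also makes y true."""
    rows = product([True, False], repeat=n_vars)
    return all(y(*row) for row in rows if x(*row))

def equivalent(x, y, n_vars):
    """X and Y are equivalent iff each entails the other."""
    return entails(x, y, n_vars) and entails(y, x, n_vars)

p_and_q = lambda p, q: p and q
p_or_q = lambda p, q: p or q

print(entails(p_and_q, p_or_q, 2))   # True:  p ∧ q |= p ∨ q
print(entails(p_or_q, p_and_q, 2))   # False: the converse fails on the row (T, F)

# DeMorgan's first law as an equivalence check
print(equivalent(lambda a, b: not (a or b),
                 lambda a, b: (not a) and (not b), 2))   # True
```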
Fixpoints
- fixpoint of a function is a point that is mapped to itself by the
function
- f(x) = x
- 2 is a fixpoint of f(x) = x² - 3x + 4 because
f(2) = 4 - 6 + 4 = 2
- sometimes there is more than one fixpoint, least fixpoint
and greatest fixpoint
- e.g., in f(x) = x²,
- x = 0 is the least fixpoint, and
- x = 1 is the greatest fixpoint
- not all functions have fixpoints, e.g., f(x) = x+1
- calculator example: square root function, Newton's method
Newton's method
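A small Python sketch of "apply the function until nothing changes", which is the pattern the closure algorithms in these notes follow. The helper `iterate_to_fixpoint` and its tolerance are assumptions made for illustration. Reading the calculator example as "keep pressing the square root key" is also an assumption about what was shown in lecture.

```python
import math

def iterate_to_fixpoint(f, x0, tol=1e-12, max_iter=1000):
    """Apply f repeatedly until two successive values differ by at most tol."""
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) <= tol:
            return nxt
        x = nxt
    raise RuntimeError("no fixpoint reached within max_iter steps")

# f(x) = x^2 - 3x + 4 maps 2 to itself
f = lambda x: x * x - 3 * x + 4
assert f(2) == 2

# Repeated square roots drift toward 1, a fixpoint of sqrt
calc = iterate_to_fixpoint(math.sqrt, 5.0)

def newton_sqrt(a, x0=1.0):
    """sqrt(a) is a fixpoint of the Newton step x -> (x + a/x) / 2."""
    return iterate_to_fixpoint(lambda x: 0.5 * (x + a / x), x0)

root = newton_sqrt(2.0)
print(calc, root)
```

Note that f(x) = x + 1 would exhaust `max_iter` here, matching the remark that not every function has a fixpoint.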
Closure of a set of FD's
(courtesy [DBSC] Fig. 7.8 [p. 280])
F+ = F
repeat
    for each FD f ∈ F+
        apply reflexivity and augmentation rules on f
        add the resulting FD's to F+
    for each pair of FD's f1 and f2 ∈ F+
        if f1 and f2 can be combined using transitivity
            add the resulting FD to F+
until F+ does not change any further
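The loop above can be tried on a tiny schema with a brute-force Python sketch. It is only practical for a handful of attributes, since F+ grows exponentially. The names `subsets` and `fd_closure` are illustrative, attribute names are assumed to be single letters, and reflexivity is applied once up front to every subset of R rather than per FD, which is a simplification of the figure.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of attrs, as frozensets."""
    attrs = sorted(attrs)
    return [frozenset(c) for r in range(len(attrs) + 1)
            for c in combinations(attrs, r)]

def fd_closure(R, F):
    """F+ over schema R; every FD is a (lhs, rhs) pair of frozensets."""
    all_sets = subsets(R)
    # reflexivity: X -> Y whenever Y is a subset of X
    closure = {(x, y) for x in all_sets for y in all_sets if y <= x}
    closure |= {(frozenset(l), frozenset(r)) for l, r in F}
    while True:
        new = set()
        for x, y in closure:            # augmentation: XZ -> YZ
            for z in all_sets:
                new.add((x | z, y | z))
        for x, y in closure:            # transitivity: X -> Y, Y -> Z gives X -> Z
            for y2, z in closure:
                if y == y2:
                    new.add((x, z))
        if new <= closure:              # fixpoint: nothing new was derived
            return closure
        closure |= new

Fp = fd_closure("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("C")) in Fp)   # True: A -> C by transitivity
print((frozenset("C"), frozenset("A")) in Fp)   # False: C -> A does not follow
```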
When does one set of FD's S follow
from another T (i.e., T |= S)?
if every relation instance which satisfies
all FD's in T also satisfies all the FD's in S
When are two sets of
FD's S and T equivalent (i.e., T ⇔ S)?
- if S+ = T+
- iff S follows from T and T follows from S; T |= S and S |= T;
S ⇔ T
- if the set of relation instances satisfying S is
exactly the same as the set of relation instances
satisfying T
[DBAA] example 6.4.1 (p. 203)
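Equivalence can be tested without materializing S+ and T+: check that every FD of S follows from T and every FD of T follows from S. The Python sketch below does this with the attribute closure test that these notes cover in the next section. The function names are illustrative, and FD's are written as pairs of strings of single-letter attributes.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, computed by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def follows(fds, fd):
    """fds |= lhs -> rhs iff rhs is contained in the closure of lhs."""
    lhs, rhs = fd
    return set(rhs) <= attribute_closure(lhs, fds)

def fd_sets_equivalent(S, T):
    """S and T are equivalent iff T |= S and S |= T."""
    return (all(follows(T, fd) for fd in S) and
            all(follows(S, fd) for fd in T))

S = [("A", "B"), ("B", "C")]
T = [("A", "BC"), ("B", "C")]
print(fd_sets_equivalent(S, T))              # True
print(fd_sets_equivalent(S, [("A", "C")]))   # False: A -> C alone does not give A -> B
```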
Closure of a set of attributes
- closure
of a set of attributes A under the set of FD's S is the
set of attributes B such that every relation which satisfies
all of the FD's in S also satisfies A → B [FCDB]
- in other words, A → B `follows from' S
(we can also say S |= A → B)
- closure of a set of attributes {A1, A2, ...,
An} is denoted {A1, A2,
..., An}+
- X+_F (the closure of X under F) = {A | X → A ∈ F+}
- X+_F = {A | F+ |= X → A}
- algorithm (courtesy [DBSC] Fig. 7.9 [p. 281])
    F given
    result = A
    while (changes to result)
        for each FD β → γ in F
            if β ⊆ result then result = result ∪ γ
- another approach
- start with initial set of attributes X
- identify FD's A → B where A ⊆ X,
but B ⊈ X
- add the attributes of B to X
- repeat until no more attributes can be added to the closure, or,
in other words, when you reach a fixpoint
- what is the running time of the attribute closure algorithm?
Simple exercise
- R(A, B, C, D, E, F),
S = {A B → C, B C → A D,
D → E, C F → B}
- compute {AB}+
- {AB}+ = {ABCDE}
- means S |= AB → CDE
[DBAA] example 6.4.2 (p. 204)
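The attribute closure loop is short enough to write out directly. This is a minimal Python sketch rather than the textbook's exact code: the name `attribute_closure` and the encoding of FD's as pairs of strings of single-letter attributes are assumptions. It is run on the simple exercise above.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds; fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                       # repeat until a fixpoint is reached
        changed = False
        for lhs, rhs in fds:
            # lhs is already in the closure but rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(attribute_closure("AB", S)))   # ['A', 'B', 'C', 'D', 'E']
```

On the running time question: every pass of the outer loop except the last adds at least one attribute, and each pass scans all of the FD's, so this naive version is roughly quadratic in the size of the input. Linear-time variants exist.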
Basic properties
- when is {A1, A2, ..., An}+
the set of all attributes in a relation?
- a set of attributes A is a superkey for R
iff {A}+ = R
- a set of attributes A is a key for R iff
{A}+ = R and
no proper subset X of A exists where X+ = R
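Both definitions translate directly into closure tests. The Python sketch below reuses the FD's of the simple exercise earlier in these notes. The function names are illustrative. The key test only drops one attribute at a time, which is enough because closure is monotone: if no subset with one attribute removed is a superkey, no smaller subset is either.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, computed by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, R, fds):
    """attrs is a superkey iff its closure is all of R."""
    return attribute_closure(attrs, fds) == set(R)

def is_key(attrs, R, fds):
    """A key is a superkey none of whose proper subsets is a superkey."""
    attrs = set(attrs)
    if not is_superkey(attrs, R, fds):
        return False
    return not any(is_superkey(attrs - {a}, R, fds) for a in attrs)

R = "ABCDEF"
S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(is_superkey("AB", R, S))    # False: F is not in {AB}+
print(is_key("ABF", R, S))        # True
print(is_key("BCF", R, S))        # False: CF alone is already a superkey
```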
Uses of the attribute closure algorithm?
(see [DBSC] p. 282)
- determine if a set of attributes X is a superkey;
X+ = {all attributes in R}
- can check if an FD A → B
follows from S (i.e., S |= A → B)?
- compute {A}+
- check if B ⊆ {A}+
- if so, S |= A → B
- examples
- [DBAA] example 6.4.3 (pp. 205-206)
- does A B → D follow from S?
approach, compute {AB}+
- does D → E follow from S?
approach, compute {D}+
- can infer all FD's which follow from a given set of FD's; an
alternative to F+
for each Δ ⊆ R
    compute Δ+
    for each S ⊆ Δ+
        output the FD Δ → S
- this is an exponential algorithm (in the number of attributes)
- is it NP-complete?
- optimizations
- the empty set and the set containing all attributes
will never lead to any nontrivial FD's
- once we determine that a set of attributes S is
a superkey, we need not compute the closure of
any supersets of S because they will never lead
to any new nontrivial FD's
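Example: R(A, B, C, D) with given FD's A B → C, C → D, D → A
all singletons
{A}+ = {A}
{B}+ = {B}
{C}+ = {A,C,D}
{D}+ = {A,D}
new FD: C → A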
all pairs
{A,B}+ = {A,B,C,D}
{A,C}+ = {A,C,D}
{A,D}+ = {A,D}
{B,C}+ = {A,B,C,D}
{B,D}+ = {A,B,C,D}
{C,D}+ = {A,C,D}
new FD's: A B → D, B C → A
B D → C
all triples
{A,B,C}+ = {A,B,C,D}
{A,B,D}+ = {A,B,C,D}
{A,C,D}+ = {A,C,D}
{B,C,D}+ = {A,B,C,D}
no new FD's?
all subsets of size 4: no need to look at them
so the complete set of completely nontrivial FD's is
{A B → C, C → D, D → A,
C → A, A B → D,
B C → A, B D → C}
we need not have computed the closure of sets {ABC},
{ABD}, and {BCD}
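The enumeration with both optimizations can be sketched in Python as follows. The given FD's are assumed to be A B → C, C → D, D → A, which is what the closures listed in the example are consistent with. The function name `derived_fds` is illustrative. The sketch emits every X → A with A in X+ but not in X.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, computed by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def derived_fds(R, fds):
    """Nontrivial FD's X -> A found by closing subsets of R, smallest first."""
    R = sorted(R)
    superkeys, found = [], []
    for size in range(1, len(R)):              # skip the empty set and R itself
        for combo in combinations(R, size):
            X = set(combo)
            if any(k <= X for k in superkeys):
                continue                        # superset of a superkey: nothing new
            closure = attribute_closure(X, fds)
            if closure == set(R):
                superkeys.append(X)
            for a in sorted(closure - X):
                found.append(("".join(combo), a))
    return found

found = derived_fds("ABCD", [("AB", "C"), ("C", "D"), ("D", "A")])
for lhs, rhs in found:
    print(lhs, "->", rhs)
```

The output contains the seven FD's listed in the example and also A C → D, B C → D, B D → A, and C D → A, each of which follows by augmentation from an FD with a smaller left side. The closures of {ABC}, {ABD}, and {BCD} are never computed.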
Basis set of FD's
- a set of FD's F is a basis set of FD's iff F+ =
{all FD's of the relation}
- a basis set of FD's F is a minimal basis iff no
proper subset of it is also a basis
- a relation may have several minimal bases
Why compute a minimal basis?
What is a minimal basis (canonical cover)
Fc?
Fc is a minimal basis iff F |= Fc and
Fc |= F, and
Fc has no extraneous attributes, and all lhs are unique
Concept of extraneousness
two ways to be extraneous
- extraneous FD, e.g., A → C is an extraneous FD in the set
{A → B, B → C, A → C}
- extraneous attribute (on either side of an FD)
AB → C, A → C (B is extraneous in the first FD)
AB → CD, A → C (is any attribute extraneous?)
given X → Y ∈ F
A is extraneous if A ∈ X and F |= (F-(X → Y)) ∪ {(X-A) → Y}
I=X-A
check if F |= I → Y
if Y ⊆ I+_F (the closure of I under F), then A is extraneous
or
A is extraneous if A ∈ Y and (F-(X → Y)) ∪ {(X → (Y-A))} |= F
F' = (F-(X → Y)) ∪ {X → (Y-A)}
check if F' |= X → A
if A ∈ X+_F' (the closure of X under F'), then A is extraneous
example: F = {AB → CD, A → E, E → C},
is C extraneous in the
first FD?
F' = {AB → D, A → E, E → C}
the closure of AB under F' is {ABCDE}, which contains C
The answer is yes.
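The two extraneousness tests can be sketched in Python as follows. The function names are illustrative, and FD's are encoded as pairs of strings of single-letter attributes. The left-side test closes X - A under F itself. The right-side test closes X under F', the set with A removed from the right side.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, computed by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(a, fd, fds):
    """a in X is extraneous in X -> Y if Y is inside the closure of X - a under F."""
    lhs, rhs = fd
    return set(rhs) <= attribute_closure(set(lhs) - {a}, fds)

def extraneous_in_rhs(a, fd, fds):
    """a in Y is extraneous in X -> Y if a is inside the closure of X under F'."""
    lhs, rhs = fd
    f_prime = [f for f in fds if f != fd] + [(lhs, set(rhs) - {a})]
    return a in attribute_closure(lhs, f_prime)

F = [("AB", "CD"), ("A", "E"), ("E", "C")]
print(extraneous_in_rhs("C", F[0], F))    # True, as in the example above
print(extraneous_in_rhs("D", F[0], F))    # False: nothing else produces D

G = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs("B", G[0], G))    # True: A alone already gives C
```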
Canonical cover algorithm
Fc = F
repeat
    use the union rule to combine X → Y, X → Z into X → YZ
    find an FD X → Y in Fc with an extraneous attribute in either X or Y
    and delete that attribute from X → Y
until you reach a fixpoint for Fc
Canonical cover example
A → BC
B → C
A → B
AB → C
(C is extraneous in the first FD and
A is extraneous in the last FD)
minimal basis: {A → B, B → C}
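A Python sketch of the canonical cover loop, run on the example above. The names `closure`, `remove_one_extraneous`, and `canonical_cover` are illustrative, and the order in which extraneous attributes are tried is an arbitrary choice, so other inputs may yield a different but equally valid cover.

```python
def closure(attrs, fds):
    """Attribute closure; fds are (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute deleted, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            # extraneous on the left: lhs - a still determines rhs under fc
            if rhs <= closure(lhs - {a}, fc):
                return fc[:i] + [(lhs - {a}, rhs)] + fc[i + 1:]
        for a in sorted(rhs):
            # extraneous on the right: lhs still determines a without it
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, reduced):
                return reduced
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        merged = {}
        for lhs, rhs in fc:                     # union rule: one FD per left side
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]   # drop FD's with empty rhs
        nxt = remove_one_extraneous(fc)
        if nxt is None:                         # fixpoint reached
            return fc
        fc = nxt

cover = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```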
a minimal basis does not always have the smallest number of FD's,
e.g., which of the following two sets of FD's is a minimal basis?
{A → B, B → C} or {A → C}
remember a minimal basis must be a basis first
Soundness and completeness of algorithms
- sound: finds no false positives (returns no
wrong answers)
- complete: finds all true positives (returns all
right answers)
- ideally we want both
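- Armstrong's axioms are both sound and complete: every FD derived
from F with them follows from F, and every FD that follows from F
can be derived with them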
FD's in projected relations
- what is projection?
- what FD's hold in the projected relation?
- example 5 (courtesy [FCDB] example 3.23, pp. 99-100):
R(A, B, C, D)
with F = {A → B, B → C,
C → D},
project to S(A, C, D),
take the closure (under F) of every subset X of {A, C, D},
add FD X → E for each attribute E that is in X+ and in S,
but not in X
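The projection procedure can be sketched in Python for the example above: R(A, B, C, D) with FD's A → B, B → C, C → D, projected onto the attributes A, C, D. The function name `project_fds` is illustrative. The sketch closes every nonempty subset of the projected attributes under the original FD's and keeps only the attributes that survive the projection.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, computed by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(fds, proj_attrs):
    """Nontrivial FD's X -> E that hold in the projection onto proj_attrs."""
    proj = sorted(proj_attrs)
    out = []
    for size in range(1, len(proj) + 1):
        for combo in combinations(proj, size):
            X = set(combo)
            # E must be in X+, in the projected schema, and not in X
            kept = (attribute_closure(X, fds) & set(proj)) - X
            for e in sorted(kept):
                out.append(("".join(combo), e))
    return out

projected = project_fds([("A", "B"), ("B", "C"), ("C", "D")], "ACD")
for lhs, rhs in projected:
    print(lhs, "->", rhs)
```

The result includes A → C, A → D, and C → D, plus FD's with larger left sides that follow from these. A minimal basis for the projection would keep only A → C and C → D.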
References
[DBAA] M. Kifer, A. Bernstein, and P. M. Lewis. Database Systems: An Application-Oriented Approach. Addison-Wesley, Boston, MA, Second edition, 2006.
[DBSC] A. Silberschatz, H.F. Korth, and S. Sudarshan. Database System Concepts. McGraw Hill, Boston, MA, Fifth edition, 2006.
[FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.