# CPS 430/542 Lecture notes: Attribute and FD Closure Algorithms and Canonical Cover

Coverage: [FCDB] §§3.4-3.5 (pp. 82-102)

## Basic review of logic

• "if it is raining outside, I carry an umbrella": what can you conclude about the state of the world if you see me roaming around Miriam Hall with an umbrella in my hand?
• review of if and only if (iff): the biconditional, ↔
• concept of a model
• concept of entailment in logic, |=
• X |= Y: (read left to right) X entails Y
• X |= Y: (read right to left) Y follows from X, or Y is a semantic consequence of X
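• p ∧ q |= p ∨ q: every model of p ∧ q (the first row of the truth table) is also a model of p ∨ q
• p ∨ q does not entail p ∧ q (the second and third rows are models of p ∨ q but not of p ∧ q) and, therefore, p ∧ q is not equivalent to p ∨ q

• truth table:
```
p  q  p ∨ q  p ∧ q
T  T    T      T
T  F    T      F
F  T    T      F
F  F    F      F
```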

• if X |= Y and Y |= X, then X ⇔ Y
• DeMorgan's Laws
• ¬(A ∨ B) ⇔ ¬A ∧ ¬B
• ¬(A ∧ B) ⇔ ¬A ∨ ¬B
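Because X |= Y is defined in terms of models, it can be checked for small propositional formulas by brute force over every truth assignment, which is exactly what a truth table does. A minimal Python sketch, assuming formulas are written as Python functions from a model (a dict mapping variable names to booleans) to a boolean; `entails`, `equivalent`, `p_and_q`, and `p_or_q` are our own hypothetical names:

```python
from itertools import product


def entails(x, y, variables):
    """X |= Y: every model that makes X true also makes Y true."""
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if x(model) and not y(model):
            return False  # a model of X that is not a model of Y
    return True


def equivalent(x, y, variables):
    """X <=> Y iff X |= Y and Y |= X."""
    return entails(x, y, variables) and entails(y, x, variables)


def p_and_q(m):
    return m["p"] and m["q"]


def p_or_q(m):
    return m["p"] or m["q"]


print(entails(p_and_q, p_or_q, ["p", "q"]))  # True
print(entails(p_or_q, p_and_q, ["p", "q"]))  # False

# De Morgan: not (A or B)  <=>  (not A) and (not B)
print(equivalent(lambda m: not (m["A"] or m["B"]),
                 lambda m: (not m["A"]) and (not m["B"]),
                 ["A", "B"]))  # True
```

The check enumerates all 2^n assignments for n variables, so it is only an illustration for small formulas.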

## Fixpoints

• fixpoint of a function is a point that is mapped to itself by the function
• f(x) = x
• 2 is a fixpoint of f(x) = x² − 3x + 4 because f(2) = 4 − 6 + 4 = 2
• sometimes there is more than one fixpoint; the smallest is the least fixpoint and the largest is the greatest fixpoint
• e.g., for f(x) = x²,
• x = 0 is the least fixpoint, and
• x = 1 is the greatest fixpoint
• not all functions have fixpoints, e.g., f(x) = x+1
• calculator example: square root function, Newton's method

## Newton's method

• Newton's method of successive approximations says that whenever we have a guess y for the square root of x, we can get a better guess (one closer to the actual square root) by averaging y with x/y
• e.g., we can compute the square root of 2 as follows (suppose the initial guess is 1):
```
guess    quotient            average
-------------------------------------------------------
1        2/1 = 2             (2+1)/2 = 1.5
1.5      2/1.5 = 1.3333      (1.3333+1.5)/2 = 1.4167
1.4167   2/1.4167 = 1.4118   (1.4167+1.4118)/2 = 1.4142
1.4142   ...                 ...
...      ...                 ...
-------------------------------------------------------
```
• continuing this process, we progressively obtain more accurate approximations to the square root
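Newton's square-root step is itself a fixpoint computation: √x is a fixpoint of y ↦ (y + x/y)/2, because (√x + x/√x)/2 = √x. A minimal Python sketch, assuming we stop when two successive guesses agree to within a relative tolerance (our assumption, since exact equality is not guaranteed in floating point); `improve`, `fixed_point`, and `newton_sqrt` are hypothetical names:

```python
def improve(guess, x):
    """One Newton step for sqrt(x): average the guess with x / guess."""
    return (guess + x / guess) / 2


def fixed_point(f, start, tolerance=1e-12, max_iter=100):
    """Apply f repeatedly from start until successive values agree to within
    a relative tolerance, i.e., until we (numerically) reach y = f(y)."""
    y = start
    for _ in range(max_iter):
        next_y = f(y)
        if abs(next_y - y) <= tolerance * abs(next_y):
            return next_y
        y = next_y
    raise ValueError("no fixpoint reached within max_iter steps")


def newton_sqrt(x, guess=1.0):
    """Square root of x > 0 as the fixpoint of y -> improve(y, x)."""
    return fixed_point(lambda y: improve(y, x), guess)


# Successive guesses for sqrt(2), starting from 1 as in the table above
y = 1.0
for _ in range(3):
    y = improve(y, 2)
    print(round(y, 4))  # 1.5, then 1.4167, then 1.4142

print(newton_sqrt(2))
```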

## Closure of a set of FD's

(courtesy [DBSC] Fig. 7.8 [p. 280])
```
F+ = F
repeat
    for each FD f ∈ F+
        apply reflexivity and augmentation rules on f
        add the resulting FD's to F+
    for each pair of FD's f1 and f2 ∈ F+
        if f1 and f2 can be combined using transitivity
            add the resulting FD to F+
until F+ does not change any further
```
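For a toy schema the pseudocode can be run literally. The Python sketch below is one way to do it, assuming attributes are single characters and an FD is a pair (lhs, rhs) of attribute strings; because reflexivity needs no premise, it adds every trivial FD X → Y (Y ⊆ X) once up front rather than "on f", then applies augmentation and transitivity until F+ stops changing. `subsets` and `fd_closure` are hypothetical names. F+ can contain exponentially many FD's, so this is only practical for tiny examples.

```python
from itertools import combinations


def subsets(attrs):
    """All subsets of attrs (including the empty set), as frozensets."""
    items = sorted(attrs)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def fd_closure(R, F):
    """F+ for schema R, by applying Armstrong's axioms until a fixpoint.

    R: iterable of attribute names; F: iterable of (lhs, rhs) pairs.
    Returns a set of (frozenset, frozenset) pairs.
    """
    all_sets = subsets(R)
    closure = {(frozenset(x), frozenset(y)) for x, y in F}
    # Reflexivity: X -> Y whenever Y ⊆ X (needs no premise, so add once)
    for X in all_sets:
        for Y in subsets(X):
            closure.add((X, Y))
    while True:
        new = set(closure)
        # Augmentation: X -> Y gives XZ -> YZ for every Z ⊆ R
        for X, Y in closure:
            for Z in all_sets:
                new.add((X | Z, Y | Z))
        # Transitivity: X -> Y and Y -> W give X -> W
        rhs_by_lhs = {}
        for X, Y in new:
            rhs_by_lhs.setdefault(X, set()).add(Y)
        for X, Y in list(new):
            for W in rhs_by_lhs.get(Y, ()):
                new.add((X, W))
        if new == closure:  # F+ did not change: fixpoint reached
            return closure
        closure = new


Fp = fd_closure("ABCD", [("AB", "C"), ("C", "D"), ("D", "A")])
print((frozenset("AB"), frozenset("D")) in Fp)  # True
print(len(Fp))  # 157
```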

## When does one set of FD's S follow from another T (i.e., T |= S)?

if every relation instance which satisfies all FD's in T also satisfies all the FD's in S

## When are two sets of FD's S and T equivalent (i.e., T ⇔ S)?

• if S+ = T+
• iff S follows from T and T follows from S; T |= S and S |= T; S ⇔ T
• if the set of relation instances satisfying S is exactly the same as the set of relation instances satisfying T
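In practice neither F+ nor the set of all relation instances has to be built to test these definitions: T |= S holds exactly when, for every FD X → Y in S, Y is contained in the closure of X computed under T (the attribute-closure algorithm of the next section). A minimal Python sketch, assuming FD's are (lhs, rhs) pairs of single-character attribute strings such as ("AB", "C"); `closure`, `follows`, and `equivalent` are hypothetical helper names:

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(T, S):
    """T |= S: each FD X -> Y in S has Y ⊆ X+, with X+ computed under T."""
    return all(set(rhs) <= closure(lhs, T) for lhs, rhs in S)


def equivalent(S, T):
    """S <=> T iff T |= S and S |= T."""
    return follows(T, S) and follows(S, T)


print(equivalent([("A", "B"), ("B", "C"), ("A", "C")],
                 [("A", "B"), ("B", "C")]))               # True
print(equivalent([("A", "B"), ("B", "C")], [("A", "C")]))  # False
```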

## Closure of a set of attributes

• closure of a set of attributes A under the set of FD's S is the set of attributes B such that every relation which satisfies all of the FD's in S also satisfies A → B [FCDB]
• in other words, A → B "follows from" S (we can also say S |= A → B)
• closure of a set of attributes {A1, A2, ..., An} is denoted {A1, A2, ..., An}+
• $X^+_F = \{A \mid X \to A \in F^+\}$
• $X^+_F = \{A \mid F^+ \models X \to A\}$
• algorithm (courtesy [DBSC] Fig. 7.9 [p. 281]), given a set of FD's F and a set of attributes A:

```
result = A
while (changes to result) do
    for each FD X → Y ∈ F do
        if X ⊆ result then
            result = result ∪ Y
```
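• another approach
1. start with X equal to the given set of attributes
2. identify FD's A → B where A ⊆ X, but B ⊄ X
3. add the attributes of B to X
4. repeat steps 2 and 3 until no more attributes can be added to the closure or, in other words, until you reach a fixpoint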
• what is the running time of the attribute closure algorithm?
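The DBSC loop translates almost line for line into Python. This sketch assumes single-character attribute names and FD's given as (lhs, rhs) string pairs; `closure` is a hypothetical name. The `changed` flag plays the role of "while (changes to result)": the loop stops after the first pass over F that adds nothing, i.e., at a fixpoint. Every pass that changes `result` fires some FD for the first time, so there are at most |F| + 1 passes over F.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: attribute names, e.g. "AB"
    fds:   list of (lhs, rhs) pairs, e.g. [("AB", "C"), ("D", "E")]
    """
    result = set(attrs)
    changed = True
    while changed:  # stop at a fixpoint: a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The exercise below: R(A, B, C, D, E, F) with S = {AB → C, BC → AD, D → E, CF → B}
S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print("".join(sorted(closure("AB", S))))  # ABCDE
```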

## Simple exercise

• R(A, B, C, D, E, F), S = {A B → C, B C → A D, D → E, C F → B}
• compute {AB}+
• {AB}+ = {ABCDE}
• means S |= AB → ABCDE

## Basic properties

• when is {A1, A2, ..., An}+ the set of all attributes in a relation?
• a set of attributes A is a superkey for R iff {A}+ = R
• a set of attributes A is a key for R iff {A}+ = R and no proper subset X of A exists where X+ = R
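Both definitions become closure-based checks. A minimal Python sketch, assuming FD's are (lhs, rhs) strings over single-character attributes; `closure`, `is_superkey`, and `is_key` are hypothetical names, and the sample schema is R(A, B, C, D) with S = {AB → C, C → D, D → A} from the example below:

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, R, fds):
    """attrs is a superkey of R iff attrs+ contains every attribute of R."""
    return closure(attrs, fds) >= set(R)


def is_key(attrs, R, fds):
    """A key is a superkey with no proper subset that is a superkey.

    Any superset of a superkey is a superkey, so it suffices to try
    dropping one attribute at a time."""
    attrs = set(attrs)
    return is_superkey(attrs, R, fds) and not any(
        is_superkey(attrs - {a}, R, fds) for a in attrs)


R = "ABCD"
S = [("AB", "C"), ("C", "D"), ("D", "A")]
print([k for k in ["AB", "BC", "BD", "ABC", "C"] if is_key(k, R, S)])
# ['AB', 'BC', 'BD']
```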

## Uses of the attribute closure algorithm?

(see [DBSC] p. 282)
• determine if a set of attributes X is a superkey; X+ = {all attributes in R}
• check whether an FD A → B follows from S (i.e., whether S |= A → B):
1. compute {A}+
2. check whether B ⊆ {A}+
3. if so, S |= A → B
• examples
• [DBAA] example 6.4.3 (pp. 205-206)
• does A B → D follow from S? approach, compute {AB}+
• does D → E follow from S? approach, compute {D}+
• can infer all FD's which follow from a given set of FD's; an alternative to F+
```
for each Δ ⊆ R
    compute Δ+
    for each S ⊆ Δ+
        output FD Δ → S
```

• this is an exponential algorithm (in the number of attributes)
• is it NP-complete?
• optimizations
• the empty set and the set containing all attributes will never lead to any nontrivial FD's
• once we determine that a set of attributes S is a superkey, we need not compute the closure of any supersets of S because they will never lead to any new nontrivial FD's
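A sketch of this enumeration with both optimizations, assuming FD's are (lhs, rhs) string pairs over single-character attributes; `closure` and `all_fds` are hypothetical names. Instead of emitting every Δ → S for S ⊆ Δ+, it records one completely nontrivial FD Δ → (Δ+ − Δ) per left side; the others follow from it by reflexivity, union, and decomposition. Subsets are visited by increasing size, so a superkey is always found before any of its supersets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def all_fds(R, fds):
    """Map each left side Δ ⊆ R to Δ+ - Δ whenever that set is nonempty.

    Skips Δ = ∅ and Δ = R, and any Δ that contains a superkey already
    found; subsets are visited by increasing size, so superkeys come first."""
    attrs = sorted(R)
    superkeys = []
    derived = {}
    for size in range(1, len(attrs)):  # sizes 0 and |R| are skipped
        for combo in combinations(attrs, size):
            delta = frozenset(combo)
            if any(k <= delta for k in superkeys):
                continue  # superset of a superkey: nothing new
            plus = closure(delta, fds)
            if plus == set(attrs):
                superkeys.append(delta)
            if plus - delta:
                derived[delta] = frozenset(plus - delta)
    return derived


for lhs, rhs in all_fds("ABCD", [("AB", "C"), ("C", "D"), ("D", "A")]).items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# C -> AD
# D -> A
# AB -> CD
# AC -> D
# BC -> AD
# BD -> AC
# CD -> A
```

On the example that follows, the closures of ABC, ABD, and BCD are never computed. The output lists a few FD's (e.g., AC → D) that the hand-derived set omits; all of them follow from that set.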

## Example

• consider the relation R(A, B, C, D)
• derive all FD's which follow from S = {A B → C, C → D, D → A}
• look at all subsets of {A,B,C,D} and see which lead to new FD's
• all singletons
```
{A}+ = {A}
{B}+ = {B}
{C}+ = {A,C,D}
{D}+ = {A,D}
```
new FD: C → A
• all pairs
```
{A,B}+ = {A,B,C,D}
{A,C}+ = {A,C,D}
{A,D}+ = {A,D}
{B,C}+ = {A,B,C,D}
{B,D}+ = {A,B,C,D}
{C,D}+ = {A,C,D}
```
new FD's: AB → D, BC → A, BD → C
• all triples
```
{A,B,C}+ = {A,B,C,D}
{A,B,D}+ = {A,B,C,D}
{A,C,D}+ = {A,C,D}
{B,C,D}+ = {A,B,C,D}
```
no new FD's?
• all subsets of size 4: no need to look at them
• so the complete set of completely nontrivial FD's is {AB → C, C → D, D → A, C → A, AB → D, BC → A, BD → C}
• we need not have computed the closure of sets {ABC}, {ABD}, and {BCD}, since each is a superset of one of the superkeys {AB}, {BC}, or {BD}

## Basis set of FD's

• a set of FD's F is a basis set of FD's iff F+ = {all FD's of the relation}
• a basis set of FD's F is a minimal basis iff no proper subset of it is also a basis
• a relation may have several minimal bases

## What is a minimal basis (canonical cover) Fc?

Fc is a minimal basis iff F |= Fc and Fc |= F, Fc has no extraneous attributes, and each left-hand side in Fc is unique

## Concept of extraneousness

two ways to be extraneous
• extraneous FD, e.g., A → C is an extraneous FD in the set {A → B, B → C, A → C}
• extraneous attribute (on either side of an FD)

AB → C, A → C (B is extraneous in the first FD)
AB → CD, A → C (is any attribute extraneous?)

given X → Y ∈ F

A is extraneous in X if A ∈ X and F |= (F − {X → Y}) ∪ {(X − A) → Y}

let I = X − A

check whether F |= I → Y

if $Y \subseteq I^+_F$, then A is extraneous

or

A is extraneous in Y if A ∈ Y and (F − {X → Y}) ∪ {X → (Y − A)} |= F

let F' = (F − {X → Y}) ∪ {X → (Y − A)}

check whether F' |= X → A

if $A \in X^+_{F'}$, then A is extraneous

example: F = {AB → CD, A → E, E → C}, is C extraneous in the first FD?

$F' = \{AB \to D, A \to E, E \to C\}$ and $(AB)^+_{F'} = \{A, B, C, D, E\}$; since $C \in (AB)^+_{F'}$, C is extraneous in AB → CD
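Each test costs one attribute-closure computation. A minimal Python sketch, assuming FD's are (lhs, rhs) string pairs over single-character attributes and that the FD under test is identified by its index i in the list; `closure`, `extraneous_in_lhs`, and `extraneous_in_rhs` are hypothetical names:

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(fds, i, a):
    """Is a extraneous in the left side of fds[i] = X -> Y?
    With I = X - {a}, a is extraneous iff Y ⊆ I+ computed under F."""
    X, Y = fds[i]
    if a not in X:
        return False
    return set(Y) <= closure(set(X) - {a}, fds)


def extraneous_in_rhs(fds, i, a):
    """Is a extraneous in the right side of fds[i] = X -> Y?
    With F' = (F - {X -> Y}) ∪ {X -> (Y - {a})}, a is extraneous iff a ∈ X+ under F'."""
    X, Y = fds[i]
    if a not in Y:
        return False
    f_prime = fds[:i] + [(X, set(Y) - {a})] + fds[i + 1:]
    return a in closure(X, f_prime)


F = [("AB", "CD"), ("A", "E"), ("E", "C")]
print(extraneous_in_rhs(F, 0, "C"))  # True
print(extraneous_in_rhs(F, 0, "D"))  # False
print(extraneous_in_lhs([("AB", "C"), ("A", "C")], 0, "B"))  # True
```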

## Canonical cover algorithm

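```
Fc = F
repeat
    use the union rule to replace any X → Y and X → Z in Fc with X → YZ
    find an FD X → Y in Fc with an extraneous attribute in either X or Y
        (testing extraneousness against Fc, not F)
    if one is found, delete that attribute from X → Y
until Fc does not change (you reach a fixpoint for Fc)
```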
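A minimal Python sketch of this loop, assuming FD's are (lhs, rhs) pairs over single-character attributes (converted to frozensets internally); `closure`, `find_extraneous`, and `canonical_cover` are hypothetical names. It removes one extraneous attribute per round and tests against the current Fc. One detail the pseudocode leaves implicit: an FD whose right side becomes empty is trivial, so it is dropped.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_extraneous(fc):
    """Return (i, smaller_fd) for the first FD in fc with an extraneous
    attribute, testing against fc itself; return None if there is none."""
    for i, (X, Y) in enumerate(fc):
        for a in sorted(X):  # left side: Y ⊆ (X - a)+
            if Y <= closure(X - {a}, fc):
                return i, (X - {a}, Y)
        for a in sorted(Y):  # right side: a ∈ X+ under F'
            f_prime = fc[:i] + [(X, Y - {a})] + fc[i + 1:]
            if a in closure(X, f_prime):
                return i, (X, Y - {a})
    return None


def canonical_cover(fds):
    fc = [(frozenset(x), frozenset(y)) for x, y in fds]
    while True:
        merged = {}  # union rule: X -> Y, X -> Z  =>  X -> YZ
        for X, Y in fc:
            merged[X] = merged.get(X, frozenset()) | Y
        fc = [(X, Y) for X, Y in merged.items() if Y]  # drop X -> (empty set)
        found = find_extraneous(fc)
        if found is None:  # fixpoint: nothing left to remove
            return fc
        i, smaller = found
        fc[i] = smaller


for X, Y in canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]):
    print("".join(sorted(X)), "->", "".join(sorted(Y)))
# A -> B
# B -> C
```

Applied to the canonical cover example that follows, it reproduces the minimal basis {A → B, B → C}.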

## Canonical cover example

A → BC
B → C
A → B
AB → C

(C is extraneous in the first FD and A is extraneous in the last FD)

minimal basis: {A → B, B → C}

a minimal basis does not always have the smallest number of FD's, e.g., which of the following two sets of FD's is a minimal basis? {A → B, B → C} or {A → C}

remember a minimal basis must be a basis first

## Soundness and completeness of algorithms

• sound: finds no false positives (returns no wrong answers)
• complete: finds all true positives (returns all right answers)
• ideally we want both
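• Armstrong's axioms (reflexivity, augmentation, and transitivity) are both sound and complete: every FD they derive from F follows from F, and every FD that follows from F can be derived with them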

## FD's in projected relations

• what is projection?
• what FD's hold in the projected relation?
• example 5 (courtesy [FCDB] example 3.23, pp. 99-100): R(A, B, C, D) with FD's {A → B, B → C, C → D}, projected onto S(A, C, D); take the closure X+ of every subset X of {A, C, D} (using the FD's of R), and add the FD X → E for each attribute E that is in X+ and in S, but not in X
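A minimal Python sketch of this projection procedure, assuming FD's are (lhs, rhs) string pairs over single-character attributes; `closure` and `project_fds` are hypothetical names, and for simplicity it visits every nonempty subset without the superkey shortcut:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(s_attrs, fds):
    """FD's X -> E that hold in the projection of R onto s_attrs.

    For every nonempty X ⊆ s_attrs, compute X+ using R's FD's and emit
    X -> E for each attribute E in X+ that is in s_attrs but not in X."""
    attrs = sorted(s_attrs)
    projected = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            X = set(combo)
            for E in sorted((closure(X, fds) & set(attrs)) - X):
                projected.append(("".join(combo), E))
    return projected


print(project_fds("ACD", [("A", "B"), ("B", "C"), ("C", "D")]))
# [('A', 'C'), ('A', 'D'), ('C', 'D'), ('AC', 'D'), ('AD', 'C')]
```

The output is not minimal: A → D, AC → D, and AD → C all follow from A → C and C → D, so a minimal basis for the projected FD's is {A → C, C → D}.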

## References

• [DBAA] M. Kifer, A. Bernstein, and P. M. Lewis. Database Systems: An Application-Oriented Approach. Addison-Wesley, Boston, MA, second edition, 2006.
• [DBSC] A. Silberschatz, H. F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill, Boston, MA, fifth edition, 2006.
• [FCDB] J. D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, second edition, 2002.