CPS 430/542 Lecture notes: Attribute and FD Closure Algorithms and Canonical Cover



Coverage: [FCDB] §§3.4-3.5 (pp. 82-102)


Basic review of logic

  • "if it is raining outside, I carry an umbrella"; what can you conclude about the state of the world if you see me roaming around Miriam Hall with an umbrella in my hand?
  • if and only if review (iff), bidirectional, ↔
  • concept of a model
  • concept of entailment in logic, |=
    • X |= Y: (read left to right) X entails Y
    • X |= Y: (read right to left) Y follows from X, or Y is a semantic consequence of X
  • p ∧ q |= p ∨ q
  • p ∨ q does not entail p ∧ q and, therefore, p ∨ q is not equivalent to p ∧ q

  • p   q   p ∨ q   p ∧ q
    T   T     T       T
    T   F     T       F
    F   T     T       F
    F   F     F       F

  • if X |= Y and Y |= X, then X ⇔ Y
  • DeMorgan's Laws
    • ¬(A ∨ B) ⇔ ¬A ∧ ¬B
    • ¬(A ∧ B) ⇔ ¬A ∨ ¬B
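
Entailment and equivalence between small propositional formulas can be checked mechanically by enumerating every truth assignment. The following is a minimal sketch; the helper names `entails` and `equivalent` are illustrative and do not come from the notes.

```python
from itertools import product

def entails(x, y, nvars=2):
    """X |= Y: every assignment that makes x true also makes y true."""
    return all(y(*v) for v in product([True, False], repeat=nvars) if x(*v))

def equivalent(x, y, nvars=2):
    """X <=> Y iff X |= Y and Y |= X."""
    return entails(x, y, nvars) and entails(y, x, nvars)

p_and_q = lambda p, q: p and q
p_or_q = lambda p, q: p or q

print(entails(p_and_q, p_or_q))   # True:  p AND q entails p OR q
print(entails(p_or_q, p_and_q))   # False: the converse fails (e.g., p=T, q=F)

# De Morgan's laws hold under every assignment.
print(equivalent(lambda a, b: not (a or b), lambda a, b: (not a) and (not b)))  # True
print(equivalent(lambda a, b: not (a and b), lambda a, b: (not a) or (not b)))  # True
```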


Fixpoints

  • fixpoint of a function is a point that is mapped to itself by the function
  • f(x) = x
  • 2 is a fixpoint of f(x) = x^2 - 3x + 4 because f(2) = 2
  • sometimes there is more than one fixpoint; the smallest is the least fixpoint and the largest is the greatest fixpoint
    • e.g., in f(x) = x^2,
    • x = 0 is the least fixpoint, and
    • x = 1 is the greatest fixpoint
  • not all functions have fixpoints, e.g., f(x) = x+1
  • calculator example: square root function, Newton's method
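
The fixpoint examples above can be checked numerically. This is a sketch only: `is_fixpoint` and `iterate_to_fixpoint` are hypothetical helper names, and reading the calculator example as "press the square root key repeatedly" is an assumption about what the notes intend.

```python
import math

def is_fixpoint(f, x, tol=1e-12):
    """x is a fixpoint of f when f(x) = x (up to a small tolerance)."""
    return abs(f(x) - x) <= tol

def iterate_to_fixpoint(f, x, tol=1e-12, max_steps=1000):
    """Apply f repeatedly; return the value once it stops changing, else None."""
    for _ in range(max_steps):
        nxt = f(x)
        if abs(nxt - x) <= tol:
            return nxt
        x = nxt
    return None

print(is_fixpoint(lambda x: x**2 - 3*x + 4, 2))    # True, since f(2) = 2
print(is_fixpoint(lambda x: x**2, 0))              # True (least fixpoint)
print(is_fixpoint(lambda x: x**2, 1))              # True (greatest fixpoint)
print(iterate_to_fixpoint(lambda x: x + 1, 0))     # None: x + 1 has no fixpoint
print(iterate_to_fixpoint(math.sqrt, 2.0))         # approaches 1, a fixpoint of sqrt
```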


Newton's method

  • Newton's method of successive approximations says that whenever we have a guess y for the value of the square root of x, we can get a better guess (one closer to the actual square root) by averaging y with x/y
  • e.g., we can compute the square root of 2 as follows (suppose the initial guess is 1):
    guess    quotient            average
    -------------------------------------------------------
    1        2/1 = 2             (2+1)/2 = 1.5
    1.5      2/1.5 = 1.3333      (1.3333+1.5)/2 = 1.4167
    1.4167   2/1.4167 = 1.4118   (1.4167+1.4118)/2 = 1.4142
    1.4142   ...                 ...
    ...      ...                 ...
    -------------------------------------------------------
    
  • continuing this process, we progressively obtain more accurate approximations to the square root
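
The table above can be reproduced with a few lines of code. This is a minimal sketch that assumes x > 0 and a positive initial guess; the name `sqrt_newton` and the stopping tolerance are illustrative choices, not part of the notes.

```python
def sqrt_newton(x, guess=1.0, tol=1e-10):
    """Repeat guess -> (guess + x/guess) / 2 until the guess stops changing.

    The loop stops at (an approximation of) a fixpoint of the averaging step.
    """
    while True:
        better = (guess + x / guess) / 2   # average the guess with the quotient
        if abs(better - guess) < tol:
            return better
        guess = better

print(round(sqrt_newton(2), 4))   # 1.4142
```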


Closure of a set of FD's

(courtesy [DBSC] Fig. 7.8 [p. 280])
    F+ = F
    repeat
      for each FD f ∈ F+
        apply reflexivity and augmentation rules on f
        add the resulting FD's to F+
      for each pair of FD's f1 and f2 ∈ F+
        if f1 and f2 can be combined using transitivity
          add the resulting FD to F+
    until F+ does not change any further


When does one set of FD's S follow from another T (i.e., T |= S)?

if every relation instance which satisfies all FD's in T also satisfies all the FD's in S


When are two sets of FD's S and T equivalent (i.e., T ⇔ S)?

  • if S+ = T+
  • iff S follows from T and T follows from S; T |= S and S |= T; S ⇔ T
  • if the set of relation instances satisfying S is exactly the same as the set of relation instances satisfying T
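
Computing S+ and T+ in full is expensive, so in practice the test T |= S is usually done one FD at a time with attribute closure (defined in the next section of these notes). The sketch below assumes FD's are written as pairs of attribute strings such as ("AB", "C"); the names `closure`, `follows`, and `equivalent` are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def follows(t, s):
    """T |= S: for every X -> Y in S, Y is contained in X+ computed under T."""
    return all(set(rhs) <= closure(lhs, t) for lhs, rhs in s)

def equivalent(s, t):
    """S and T are equivalent iff each follows from the other."""
    return follows(t, s) and follows(s, t)

S = [("A", "B"), ("B", "C")]
T = [("A", "B"), ("B", "C"), ("A", "C")]
U = [("A", "C")]

print(equivalent(S, T))   # True:  A -> C is redundant in T
print(follows(S, U))      # True:  S |= A -> C
print(equivalent(S, U))   # False: U does not entail A -> B
```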


[DBAA] example 6.4.1 (p. 203)


Closure of a set of attributes

  • closure of a set of attributes A under the set of FD's S is the set of attributes B such that every relation which satisfies all of the FD's in S also satisfies A → B [FCDB]
  • in other words, A → B "follows from" S (we can also say S |= A → B)
  • closure of a set of attributes {A1, A2, ..., An} is denoted {A1, A2, ..., An}+
  • X+_F = {A | X → A ∈ F+}   (X+_F denotes the closure of X under F)
  • X+_F = {A | F+ |= X → A}
  • algorithm (courtesy [DBSC] Fig. 7.9 [p. 281])

    F given

    result = A

    while (changes to result)
      for each FD X → Y ∈ F do
        if X ⊆ result
          result = result ∪ Y
  • another approach
    1. start with initial set of attributes X
    2. identify FD's A → B where A ⊆ X, but B ⊄ X
    3. add B to X
    4. repeat until no more attributes can be added to the closure, or, in other words, when you reach a fixpoint
  • what is the running time of the attribute closure algorithm?
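
The attribute closure pseudocode above translates almost line for line into code. This is a sketch under one assumption: each FD is a pair of attribute strings, e.g. ("CD", "E") for C D → E. The function name `closure` and the sample FD's are illustrative.

```python
def closure(attrs, fds):
    """Return attrs+ under fds by iterating to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:                       # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:             # "for each FD X -> Y in F"
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # "result = result U Y"
                changed = True
    return result

F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))    # ['A', 'B', 'C']
print(sorted(closure("AD", F)))   # ['A', 'B', 'C', 'D', 'E']
```

Every pass except the last adds at least one attribute, so this naive version makes at most (number of attributes + 1) passes over F.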


Simple exercise

  • R(A, B, C, D, E, F), S = {A B → C, B C → A D, D → E, C F → B}
  • compute {AB}+
  • {AB}+ = {ABCDE}
  • this means S |= A B → C D E
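
To see how {AB}+ grows in this exercise, the closure loop can record each FD that fires. This is a sketch; `closure_trace` is a hypothetical helper, and FD's are written as pairs of attribute strings.

```python
def closure_trace(attrs, fds):
    """Attribute closure that also records which FD added attributes, in order."""
    result = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            new = set(rhs) - result
            if set(lhs) <= result and new:
                result |= new
                steps.append((lhs, rhs, "".join(sorted(result))))
                changed = True
    return result, steps

S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
result, steps = closure_trace("AB", S)
for lhs, rhs, so_far in steps:
    print(f"{lhs} -> {rhs} fires, result = {so_far}")
# AB -> C fires, result = ABC
# BC -> AD fires, result = ABCD
# D -> E fires, result = ABCDE
```

C F → B never fires because F is never added to the result.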


[DBAA] example 6.4.2 (p. 204)


Basic properties

  • when is {A1, A2, ..., An}+ the set of all attributes in a relation?
  • a set of attributes A is a superkey for R iff {A}+ = R
  • a set of attributes A is a key for R iff {A}+ = R and no proper subset X of A exists where X+ = R
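
Both definitions can be tested directly with attribute closure. A minimal sketch, assuming FD's are pairs of attribute strings; `is_superkey` and `is_key` are illustrative names, and the sample relation is R(A, B, C, D) with {A B → C, C → D, D → A}.

```python
def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) == set(relation)

def is_key(attrs, relation, fds):
    """A superkey is a key if dropping any single attribute breaks it.

    Checking single removals suffices: any superset of a superkey is
    itself a superkey, so a smaller superkey would show up here.
    """
    attrs = set(attrs)
    return is_superkey(attrs, relation, fds) and not any(
        is_superkey(attrs - {a}, relation, fds) for a in attrs
    )

R = "ABCD"
S = [("AB", "C"), ("C", "D"), ("D", "A")]
print(is_superkey("AB", R, S), is_key("AB", R, S))     # True True
print(is_superkey("ABC", R, S), is_key("ABC", R, S))   # True False
print(is_superkey("CD", R, S))                         # False
```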


Uses of the attribute closure algorithm?

(see [DBSC] p. 282)
  • determine if a set of attributes X is a superkey; X+ = {all attributes in R}
  • can check if an FD A → B holds in a relation (i.e., S |= A → B)
    1. compute {A}+
    2. check if B ⊆ {A}+
    3. if so, S |= A → B
    • examples
      • [DBAA] example 6.4.3 (pp. 205-206)
      • does A B → D follow from S? approach, compute {AB}+
      • does D → E follow from S? approach, compute {D}+
  • can infer all FD's which follow from a given set of FD's; an alternative to F+
      for each Δ ⊆ R
        compute Δ+
        for each S ⊆ Δ+
          output FD Δ → S

    • this is an exponential algorithm (in the number of attributes)
    • is it NP-complete?
    • optimizations
      • the empty set and the set containing all attributes will never lead to any nontrivial FD's
      • once we determine that a set of attributes S is a superkey, we need not compute the closure of any supersets of S because they will never lead to any new nontrivial FD's
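
The two uses above (testing a single FD, and enumerating the FD's that follow from a set) can be sketched as follows. Two simplifications are assumptions of this sketch, not of the notes: FD's are pairs of attribute strings, and the enumeration emits only nontrivial FD's with a single attribute on the right-hand side rather than every Δ → S. The names `implies` and `nontrivial_fds` are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """fds |= lhs -> rhs iff rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)

def nontrivial_fds(relation, fds):
    """List (X, A) for each proper nonempty X of R and each A in X+ - X."""
    attrs = sorted(relation)
    out = []
    for size in range(1, len(attrs)):          # skips the empty set and all of R
        for combo in combinations(attrs, size):
            for a in sorted(closure(combo, fds) - set(combo)):
                out.append(("".join(combo), a))
    return out

S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(implies(S, "AB", "D"))   # True
print(implies(S, "D", "E"))    # True
print(implies(S, "D", "A"))    # False: {D}+ = {D, E}

print(nontrivial_fds("ABC", [("A", "B"), ("B", "C")]))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('AB', 'C'), ('AC', 'B')]
```

The outer loops visit every subset of the attributes, which is where the exponential cost comes from.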


Example

  • consider the relation R(A, B, C, D)
  • derive all FD's which follow from S = {A B → C, C → D, D → A}
  • look at all subsets of {A,B,C,D} and see which lead to new FD's
  • all singletons
      {A}+ = {A}
      {B}+ = {B}
      {C}+ = {A,C,D}
      {D}+ = {A,D}
      
      new FD: C → A
  • all pairs
      {A,B}+ = {A,B,C,D}
      {A,C}+ = {A,C,D}
      {A,D}+ = {A,D}
      {B,C}+ = {A,B,C,D}
      {B,D}+ = {A,B,C,D}
      {C,D}+ = {A,C,D}
      
      new FD's: A B → D, B C → A, B D → C
  • all triples
      {A,B,C}+ = {A,B,C,D}
      {A,B,D}+ = {A,B,C,D}
      {A,C,D}+ = {A,C,D}
      {B,C,D}+ = {A,B,C,D}
      
      no new FD's?
  • all subsets of size 4: no need to look at them
  • so the complete set of completely nontrivial FD's is {A B → C, C → D, D → A, C → A, A B → D, B C → A, B D → C}
  • we need not have computed the closure of sets {ABC}, {ABD}, and {BCD}
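
The closures in this example, together with the superkey shortcut, can be reproduced with a short script. This is a sketch: FD's are written as pairs of attribute strings, and the variable names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = "ABCD"
S = [("AB", "C"), ("C", "D"), ("D", "A")]

closures = {}     # subset -> closure, both as sorted attribute strings
superkeys = []
for size in range(1, len(R)):                 # the full set is never needed
    for combo in combinations(R, size):
        x = set(combo)
        if any(k <= x for k in superkeys):
            continue                          # superset of a known superkey
        cl = closure(x, S)
        closures["".join(combo)] = "".join(sorted(cl))
        if cl == set(R):
            superkeys.append(x)

for x, cl in closures.items():
    print(f"{{{x}}}+ = {{{cl}}}")
```

With the shortcut in place, the only triple whose closure is computed is {A,C,D}; the other three contain one of the superkeys {A,B}, {B,C}, {B,D}.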


Basis set of FD's

  • a set of FD's F is a basis set of FD's iff F+ = {all FD's of the relation}
  • a basis set of FD's F is a minimal basis iff no proper subset of it is also a basis
  • a relation may have several minimal bases


Why compute a minimal basis?


What is a minimal basis (canonical cover) Fc?

Fc is a minimal basis iff F |= Fc and Fc |= F, Fc has no extraneous attributes, and all left-hand sides are unique


Concept of extraneousness

    two ways to be extraneous
    • extraneous FD, e.g., A → C is an extraneous FD in the set {A → B, B → C, A → C}
    • extraneous attribute (on either side of an FD)

    AB → C, A → C (B is extraneous in the first FD)
    AB → CD, A → C (is any attribute extraneous?)

    given X → Y ∈ F

    A is extraneous if A ∈ X and F |= (F-(X → Y)) ∪ {(X-A) → Y}

    I=X-A

    check if F |= I → Y

    if Y ⊆ I+_F (the closure of I under F), then A is extraneous

    or

    A is extraneous if A ∈ Y and (F-(X → Y)) ∪ {(X → (Y-A))} |= F

    F' = (F-(X → Y)) ∪ {X → (Y-A)}

    F' |= X → A

    if A ∈ X+_F' (the closure of X under F'), then A is extraneous

    example: F = {AB → CD, A → E, E → C}, is C extraneous in the first FD?

    F' = {AB → D, A → E, E → C}, and the closure of AB under F' is {A, B, C, D, E}, which contains C

    The answer is yes.
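
Both extraneousness tests reduce to one attribute closure each. A minimal sketch, assuming FD's are pairs of attribute strings; `extraneous_in_lhs` and `extraneous_in_rhs` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(fds, fd, a):
    """A in X is extraneous in X -> Y if Y is contained in (X - A)+ under F."""
    lhs, rhs = fd
    if a not in lhs:
        return False
    return set(rhs) <= closure(set(lhs) - {a}, fds)

def extraneous_in_rhs(fds, fd, a):
    """A in Y is extraneous if A is in X+ under F' = (F - {X->Y}) U {X -> (Y - A)}."""
    lhs, rhs = fd
    if a not in rhs:
        return False
    f_prime = [f for f in fds if f != fd] + [(lhs, rhs.replace(a, ""))]
    return a in closure(lhs, f_prime)

F = [("AB", "CD"), ("A", "E"), ("E", "C")]
print(extraneous_in_rhs(F, ("AB", "CD"), "C"))   # True, as in the example above
print(extraneous_in_rhs(F, ("AB", "CD"), "D"))   # False

G = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs(G, ("AB", "C"), "B"))    # True
print(extraneous_in_lhs(G, ("AB", "C"), "A"))    # False
```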


Canonical cover algorithm

    Fc = F
    repeat
      use the union rule to combine X → Y, X → Z into X → YZ

      find an FD X → Y in Fc with an extraneous attribute in either X or Y and delete that attribute from X → Y
    until you reach a fixpoint for Fc
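
One way to turn the loop above into code is sketched below. It is not the only possible implementation: the order in which extraneous attributes are removed is an arbitrary choice here, and a different order can produce a different (equally valid) result. FD's are pairs of attribute strings; `remove_one_extraneous` and `canonical_cover` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute deleted, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):                      # extraneous on the left?
            if rhs <= closure(lhs - {a}, fc):
                return fc[:i] + [(lhs - {a}, rhs)] + fc[i + 1:]
        for a in sorted(rhs):                      # extraneous on the right?
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, reduced):
                return reduced
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        merged = {}                                # combine FD's with equal left sides
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]   # drop FD's with empty right side
        reduced = remove_one_extraneous(fc)
        if reduced is None:                        # fixpoint reached
            return {("".join(sorted(l)), "".join(sorted(r))) for l, r in fc}
        fc = reduced

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(sorted(canonical_cover(F)))   # [('A', 'B'), ('B', 'C')]
```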


Canonical cover example

    A → BC
    B → C
    A → B
    AB → C

    (C is extraneous in the first FD and A is extraneous in the last FD)

    minimal basis: {A → B, B → C}

    a minimal basis does not always have the smallest number of FD's, e.g., which of the following two sets of FD's is a minimal basis? {A → B, B → C} or {A → C}

    remember a minimal basis must be a basis first


Soundness and completeness of algorithms

  • sound: finds no false positives (returns no wrong answers)
  • complete: finds all true positives (returns all right answers)
  • ideally we want both
  • Armstrong's axioms are sound and complete: they derive only FD's in F+, and they can derive every FD in F+


FD's in projected relations

  • what is projection?
  • what FD's hold in the projected relation?
  • example 5 (courtesy [FCDB] example 3.23, pp. 99-100): R(A, B, C, D) with S = {A → B, B → C, C → D}, project onto S(A, C, D), take the closure of each subset X of the projected attributes, and add FD X → E for each attribute E that is in X+ and is an attribute of S(A, C, D), but is not in X
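
The projection procedure can be sketched as follows. FD's are pairs of attribute strings, closures are taken under the FD's of the original relation, and `project_fds` is an illustrative name. The output is not reduced to a minimal basis; it lists every nontrivial FD with a single attribute on the right.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: grow the set until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(fds, projected_attrs):
    """FD's X -> E that hold in the projection: E in X+, E projected, E not in X."""
    attrs = sorted(projected_attrs)
    out = []
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            for e in sorted((closure(x, fds) & set(attrs)) - x):
                out.append(("".join(combo), e))
    return out

fds_R = [("A", "B"), ("B", "C"), ("C", "D")]
print(project_fds(fds_R, "ACD"))
# [('A', 'C'), ('A', 'D'), ('C', 'D'), ('AC', 'D'), ('AD', 'C')]
```

Here A → C and C → D already imply the other three, so {A → C, C → D} is a minimal basis for the projected relation.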


References

    [DBAA] M. Kifer, A. Bernstein, and P. M. Lewis. Database Systems: An Application-Oriented Approach. Addison-Wesley, Boston, MA, Second edition, 2006.
    [DBSC] A. Silberschatz, H.F. Korth, and S. Sudarshan. Database System Concepts. McGraw Hill, Boston, MA, Fifth edition, 2006.
    [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.
