CPS 430/542 Lecture notes: Joins



Coverage: [FCDB] §§5.2 (pp. 191-214), 6.2 (pp. 254-264), and 10.2 (pp. 471-480)

Legend: RA = relational algebra, DL = Datalog

assumes set semantics


Overview

  • cartesian product (aka cross-join): ×
  • theta-join: |><|C
  • natural join: |><|
  • outerjoin:
    • full outerjoin: |>o<|
    • left outerjoin: |>o<|L
    • right outerjoin: |>o<|R
  • join tuples
  • (left and/or right) dangling tuples
  • concept of NULL (⊥)


Cartesian product

  • use it whenever you want to pair items

  • RA: Actors × Actresses (n*m tuples)
  • SQL:
    SELECT *
    FROM Actors, Actresses;
    
  • DL: Pairs(x,y,z,a,b,c) ← Actor(x,y,z) AND Actress(a,b,c)

  • sometimes requires renaming
  • RA: Students × Students (n*n tuples)
  • RA: πi,id ((ρS(i,n) Students) × Students)
  • use AS keyword in SQL
  • SQL:
    SELECT S1.name, S2.name
    FROM Students AS S1, Students AS S2;
    
  • DL: Students2(s1,n1,s2,n2) ← Students(s1,n1) AND Students(s2,n2)

  • how can we prune out duplicates?


Theta-join

  • Cartesian product followed by a selection, i.e., R |><|C S = σC (R × S)
  • same number of columns as Cartesian product, but different number of rows (≤ n*m)

  • pair up distinct actors
  • RA:
    A1 := Actor
    A2 := Actor
    πA1.name, A2.name (A1 |><|A1.name ≠ A2.name A2)
  • (AS can be omitted as shown below)
  • SQL (use dot to disambiguate names):
    SELECT A1.name, A2.name
    FROM Actor A1, Actor A2
    WHERE A1.name <> A2.name;
    
  • DL: Answer(a1,a2) ← Actor(a1,_) AND Actor(a2,_) AND A1 ≠ A2

  • how can we prune out duplicate pairs?


Find names of all students with gpa > 3.0

    
    
  • RA: πname (Students |><|Students.id = Grades.id AND gpa > 3.0 Grades)
  • SQL:
    SELECT name
    FROM Students, Grades
    WHERE Students.id = Grades.id AND gpa > 3.0;
    
  • DL: HighGPA(n) ← Students(s,n) AND Grades(s,g) AND g > 3.0


Natural join

  • R |><| S = πLC(R × S))
  • Cartesian product, followed by a select on tuples which agree on the common attributes, followed projecting out one copy of each common attribute

  • find names of all students with gpa > 3.0
  • RA: πnamegpa > 3.0 (Students |><| GPA))
  • SQL: same as above
  • DL: same as above


References

    [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.

Return Home