# CPS 430/542 Lecture notes: Joins

Coverage: [FCDB] §§5.2 (pp. 191-214), 6.2 (pp. 254-264), and 10.2 (pp. 471-480)

Legend: RA = relational algebra, DL = Datalog

assumes set semantics

## Overview

• cartesian product (aka cross-join): ×
• theta-join: |><|C
• natural join: |><|
• outerjoin:
• full outerjoin: |>o<|
• left outerjoin: |>o<|L
• right outerjoin: |>o<|R
• join tuples
• (left and/or right) dangling tuples
• concept of NULL (⊥)

## Cartesian product

• use it whenever you want to pair items

• RA: Actors × Actresses (n*m tuples)
• SQL:
```SELECT *
FROM Actors, Actresses;
```
• DL: Pairs(x,y,z,a,b,c) ← Actor(x,y,z) AND Actress(a,b,c)

• sometimes requires renaming
• RA: Students × Students (n*n tuples)
• RA: πi,id ((ρS(i,n) Students) × Students)
• use AS keyword in SQL
• SQL:
```SELECT S1.name, S2.name
FROM Students AS S1, Students AS S2;
```
• DL: Students2(s1,n1,s2,n2) ← Students(s1,n1) AND Students(s2,n2)

• how can we prune out duplicates?

## Theta-join

• Cartesian product followed by a selection, i.e., R |><|C S = σC (R × S)
• same number of columns as Cartesian product, but different number of rows (≤ n*m)

• pair up distinct actors
• RA:
A1 := Actor
A2 := Actor
πA1.name, A2.name (A1 |><|A1.name ≠ A2.name A2)
• (AS can be omitted as shown below)
• SQL (use dot to disambiguate names):
```SELECT A1.name, A2.name
FROM Actor A1, Actor A2
WHERE A1.name <> A2.name;
```
• DL: Answer(a1,a2) ← Actor(a1,_) AND Actor(a2,_) AND A1 ≠ A2

• how can we prune out duplicate pairs?

## Find names of all students with gpa > 3.0

• RA: πname (Students |><|Students.id = Grades.id AND gpa > 3.0 Grades)
• SQL:
```SELECT name
WHERE Students.id = Grades.id AND gpa > 3.0;
```
• DL: HighGPA(n) ← Students(s,n) AND Grades(s,g) AND g > 3.0

## Natural join

• R |><| S = πLC(R × S))
• Cartesian product, followed by a select on tuples which agree on the common attributes, followed projecting out one copy of each common attribute

• find names of all students with gpa > 3.0
• RA: πnamegpa > 3.0 (Students |><| GPA))
• SQL: same as above
• DL: same as above

## References

 [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.