CPS 430/542 Lecture notes: Extended Operators



Coverage: [FCDB] §§5.3-5.4 (pp. 214-230), 6.1 (pp. 251-252), and 6.4 (pp. 277-285)


Relational operations on bags

    ∪, π, σ, ×, |><| no surprises

    ∩ min(m,n)

    - max(0, n-m) think of tuples canceling each other out


Set vs. bag semantics in SQL

  • SELECT-FROM-WHERE construct uses bag semantics by default
  • insert DISTINCT between SELECT and attribute to force set semantics, e.g.,
    SELECT DISTINCT name
    FROM Students
    WHERE GPA > 3.0
    
  • UNION, INTERSECT, and MINUS use set semantics by default
    • bags converted to sets and then set operator is applied
    • add ALL after (UNION | INTERSECT | MINUS) to force bag semantics, e.g.,
      (SELECT name FROM Students) UNION ALL (SELECT name FROM Faculty)
      


Other things to know

  • which SQL operator has higher precedence: AND or OR?
  • how can we alter precedence these operators in a query?
  • INTERSECT and MINUS not implemented in mySQL; need to use a subquery (use postgreSQL as an alternative where AS keyword is required)
  • see Ullman's notes on differences between mySQL and standard SQL


Relational algebra toolkit

(listed in decreasing order of abstraction/expressivity)
    ×, |><|C, |><|

    π, σ, ρ, :=

    ∪, ∩, -,


Extended relational algebra (aggregation and grouping)

  • δ is the duplicate elimination operator (= SELECT DISTINCT)
  • γ is the grouping operator
    • example (courtesy [FCDB] example 5.23, p. 225): consider relation schema StarsIn(title, year, starName) and query for each star who has appeared in a least three movies, the earliest year in which they appeared:

      γstarName, MIN(year) → minYear, COUNT(title) → ctTitle (StarsIn)
    • δ is a special case of γ. why?
    • can view γ as an extension of the projection operator π on sets. why?
    • γ sometime referred to as generalized projection
  • extended projection
  • τ is sort operator (results in a list, rather than a set)
  • equijoin (!= theta-join) example


Final relational algebra toolkit

(listed in decreasing order of abstraction/expressivity)
    τ

    δ, γ

    ×, |><|C, |><|

    π, σ, ρ, :=

    ∪, ∩, -,


Quotient operator

    1 2
    3 4
    
    ×
    
    a b 
    c d
    
    =
    
    1 2 a b
    1 2 c d
    3 4 a b
    3 4 c d
    
    div
    
    1 2
    3 4
    
    =
    
    a b
    c d
    
    --
    
    1 2 a b
    1 2 c d
    3 4 a b
    
    div
    
    1 2
    3 4
    
    =
    
    (a b) remainder (1 2 c d)
    
  • when is quotient useful?
    • often necessary when a query uses the term every
    • consider finding all students who have fulfilled all of the graduation requirements
    • consider Suppliers(sid), Parts(pid), and Catalog(sid, pid); find the sids of all supplier who supply every product, i.e., compute Catalog(sid,pid) / Parts(pid)
  • implement the quotient operator using primitive relational algebra operators (see [FCDB] exercise 5.2.12 on p. 213)


SQL (and RA) aggregation operators

    SUM
    AVG
    MIN
    MAX
    COUNT
    
    COUNT(*)
    COUNT(DISTINCT *)
    


Grouping in SQL

    GROUP BY
    HAVING (a SELECT over the groups) 
    
    usually group for subsequent aggregation (through grouping is not always used in aggregations)
    • grouping with aggregation (duplicates pruned)
    • grouping without aggregation (duplicates retained


Ordering the output

    SELECT * 
    FROM Students
    ORDER BY id;
    
    default ordering is ASCending

    use the keyword DESC for descending order

    SELECT *
    FROM Students
    ORDER BY gpa DESC;
    
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producerC# = cert#
    GROUP BY name
    HAVING MIN(year) < 1980
    ORDERBY name
    ORDERBY SUM(length), DESC name;
    


SQL order of execution

(only first and last are required)
(HAVING illegal without a GROUP BY)
    FROM
    WHERE
    GROUP BY
    HAVING
    ORDER BY
    SELECT
    


Writing queries

    SELECT
    FROM
    WHERE
    GROUP BY
    HAVING
    ORDER BY
    


[FCDB] example 5.23 in SQL

    SELECT starName, MIN(year)
    FROM StarsIn
    GROUP BY starName
    HAVING COUNT(starName) >= 3;
    


SQL = DDL + DML


References

    [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.

Return Home