- typical optimizers perform poorly on queries
involving nested subqueries
- perils of the high degree of expressivity of
SQL (its runaway success) are felt by the query optimizer
- the optimizer's job
(of recognizing
the equivalence of multiple equivalent expressions)
becomes tougher
as the number of constructs the language provides for the user
to write a single query in multiple ways (according the user's
mental model of the computation) increase
-
SELECT M.title
FROM Movies M
WHERE M.rating = (SELECT MAX (M2.rating)
FROM Movies M2);
-
SELECT M.title
FROM Movies M
WHERE M.title IN (SELECT M2.title
FROM Awards A
WHERE A.name='Oscar');
- entails a join of Movies with computed collection of movie
titles
- correlated query (contrast with uncorrelated queries above)
SELECT M.title
FROM Movies M
WHERE EXISTS (SELECT *
FROM Awards A
WHERE A.name='Oscar' AND M.title=A.title);
- correlated because subquery references the variable from the outer query
- subquery must be evaluated more than once
- and if the same value appears in the
correlation field (A.name above)
in several tuples, then the same nested subquery is evaluated
multiple times for that distinct value!
- moral of the story: when possible, always use
an unnested (or uncorrelated) query
- because most optimizers cannot recognize the equivalence
and conduct a transformation
- a nested query often has an analog without nesting (and typically
involves a join)
- a correlated query often has an uncorrelated analog
- another level of nesting, and another, and ..., ad infinitum
- ideas are the same, start with the innermost