February 6, 2006
Coverage: [DBCB] Chapter 15, pp. 720-723, 730-731, 733-737 and
Chapter 16, pp. 856-858
Enhancements to the DP Algorithm
for Join Ordering
- using the sum of sizes of intermediate relations as a cost
estimate may be naive
- consider R(a,b)
S(b,c)
- R with 1 tuple, S
with an index on the join attribute (b)
- what if S has no index?
- take particular join algorithm into account
A Greedy Approach to
Join Ordering
- dynamic programming approach still involves enumeration
- exponential in the #relations joined
- not practical for joins involving beyond 5 or 6 relations
- rather we can use a search guided by a heuristic
- keep intermediate relations as small as possible at each level
of the join tree
- greedy algorithm
- start with the pair of relations whose estimated size is smallest; let
the join of these relations be the current tree
- search among the remaining relations for that, which when joined
with the current tree, yields a resulting relation of smallest size
- the current tree becomes the lhs of the new join and the selected
relation the rhs
- obviously we are again only considering left-deep trees
- is the dynamic programming algorithm sound? complete? both?
- what about the greedy algorithm?
One-Pass Join Algorithm
- load smaller relation in main memory
- use M-1 blocks for smaller relation
- read each block of the larger relation into the remaining
main memory buffer one at a time
- use the index to find the set of tuples which agree
with the tuple read in
- move each matched tuple to the output
- notes:
- an asymmetric algorithm
- assumes smaller relation fits into main memory
- assumes smaller relation has an index on the join attribute
Iterators for Physical Operators
- idea in iteration is to permit the consumer, of the results of the
operator, to access/read the result one tuple at-a-time
- sometimes referred to as pipelining
- contrast with materialization where result produced as
a whole
- iterator can be thought of as an OO class with three methods
- examples of iterators for sample operators:
- table-scan
- most work done in GetNext
- sort-scan
- bag-union
- interesting instances of iterator-based computation arise
when iterators collaborate to produce a result
- iterators calling other iterators
- build iterator for R
S
from iterators for R and S
Nested-Loop Joins
- applicable to relations of any size, not just those
that fit into main memory
- few variations
- tuple-based nested-loop join
- can easily incorporate an iterator
- block-based nested-loop join
- starts to look like one-pass join
- only join method available in some early RDBMS's
References
[DBCB] H. Garcia-Molina, J. D. Ullman, and J. Widom.
Database Systems: The Complete Book. Prentice Hall, 2002.