CPS 432/562 Homework #2 Solution Sketches
- (15+15=30 points) Exercise 16.6.1 (parts a and b) on p. 858
from [DBCB]
|
{W,X} |
{W,Y} |
{W,Z} |
{X,Y} |
{X,Z} |
{Y,Z} |
| Size |
334 |
30,000 |
40,000 |
600 |
80,000 |
2,400 |
| Cost |
0 |
0 |
0 |
0 |
0 |
0 |
| Best Plan |
W X |
W Y |
W Z |
X Y |
X Z |
Y Z |
|
{W,X,Y} |
{W,X,Z} |
{W,Y,Z} |
{X,Y,Z} |
{W,X,Y,Z} |
| Size |
1,000 |
133,334 |
240,000 |
4,800 |
8,000 |
| Cost |
334 |
334 |
2,400 |
600 |
1,334 |
| Best Plan |
(W X) Y |
(W X) Z |
(Y Z) W |
(X Y) Z |
(W X) Y) Z |
- (5+5=10 points)
Exercise 16.6.5 (part b) on p. 859 from [DBCB]. How many trees are there
for the join of eight relations? 17,297,280.
How many of these are neither left-deep
not right-deep? 17,216,640.
- (10 points) Show clearly that, with n relations, there are
(2(n-1))!/(n-1)! different join orders. Use the fact that the
number of different complete binary trees with n leaf nodes is
C(2(n-1), n-1)/n, where
C(n,r) is the number of
r-combinations of a set with n elements, i.e., `n choose r.' A
complete binary tree is one where every internal node has exactly two
children.
First we must realize that the
the number of different complete binary trees with n leaf nodes times
the number of ways to permute n relations (i.e., n!)
equals the number of bushy and left- and right-deep join trees with n
relations.
So we must simply show that the number of different complete binary trees with
n leaf nodes times n! equals the number of bushy and left- and
right-deep join trees with n relations or (C(2(n-1),
n-1)/n)n! = (2(n-1))!/(n-1)!.
Since C(n,r) = n!/r!(n-r)!,
the number of different complete binary trees with n leaf nodes equals
(2(n-1))!/(n-1)!(n-1)!. This expression multiplied
by 1/n results in (2(n-1))!/n!(n-1)!.
This expression multiplied by n! equals
(2(n-1))!/(n-1)!, the
expression for the number of different join orders with n relations.
- (2+2+6=10 points) Required only for CPS 562 students
Exercises 5.2.10 on p. 213 and 16.4.3 on p. 835
from [DBCB]:
The semijoin of relations R and S,
is the bag of tuples in R(A,B) that
agree with at least one tuple in S(B,C)
on all join attributes. An expression
in relational algebra that is equivalent
to `R semijoin S' is
(R
S), where L
represents the set of all attributes in R. Give two additional
expressions in relational algebra which are equivalent to
`R semijoin S.' Each of
these additional expressions must be truly
different and not just a rewrite of the given expression or of each other.
Now
explain how you would estimate the size of a semijoin,
using the ideas presented in class for estimating the size of natural joins?
Notice that
(R
S) only works when R and S are sets.
The following expressions work when they are bags:
- R
( (S)),
where M represents the set of attributes common to R and S.
-
R - (R -
(R
S))
Let us now attempt to estimate the size of the semijoin.
There are several
ways to estimate the size,
depending on the actual formula used and on the
assumptions you make. Here is one analysis. First, notice is that
unlike join, semijoin is asymmetric. Thus, we need to be careful on how
we do the estimates.
Consider the case when V(R,B) <=
V(S,B). We assume that
the containment and preservation constraints on value sets is satisfied. Thus,
each tuple in R will match some tuple
in S on the B attributes. The semijoin
would thus be R (itself).
If, on the other hand, V(R,B) >= V(S,B), then not all of the
tuples of R would be included in the semijoin
(only those that match
with S on B). Since there are V(S,B) unique values of
B in S, only those tuples of R that have these values will
take part in the semijoin. However we have no idea how many tuples of
R have these desired values. A quick fix is to use the estimate
V(R,A)*V(S,B).
The logical step now is to cast these two
answers as special cases of one bigger formula. That sounds nasty until we
realize
that the first case can be modeled as a duplicate-elimination.
Our solution is to use: V(R,A)*min(V(R,B), V(S,B)).
If this turns out to
be greater than the actual size of R, then use R
instead. This problem is
endemic to other estimation formulas we have studied.
However, such they work really well in the
average case.
We must realize that these are only estimates and there is no one correct
answer. However, what we do not want to see is a conclusion that the size of
the semijoin is the same as the size of the join.
|