## CPS 432/562 Homework #2 Solution Sketches

1. (15+15=30 points) Exercise 16.6.1 (parts a and b) on p. 858 from [DBCB]

 {W,X} {W,Y} {W,Z} {X,Y} {X,Z} {Y,Z} Size 334 30,000 40,000 600 80,000 2,400 Cost 0 0 0 0 0 0 Best Plan W X W Y W Z X Y X Z Y Z

 {W,X,Y} {W,X,Z} {W,Y,Z} {X,Y,Z} {W,X,Y,Z} Size 1,000 133,334 240,000 4,800 8,000 Cost 334 334 2,400 600 1,334 Best Plan (W X) Y (W X) Z (Y Z) W (X Y) Z (W X) Y) Z

2. (5+5=10 points) Exercise 16.6.5 (part b) on p. 859 from [DBCB]. How many trees are there for the join of eight relations? 17,297,280. How many of these are neither left-deep not right-deep? 17,216,640.

3. (10 points) Show clearly that, with n relations, there are (2(n-1))!/(n-1)! different join orders. Use the fact that the number of different complete binary trees with n leaf nodes is C(2(n-1), n-1)/n, where C(n,r) is the number of r-combinations of a set with n elements, i.e., `n choose r.' A complete binary tree is one where every internal node has exactly two children.

First we must realize that the the number of different complete binary trees with n leaf nodes times the number of ways to permute n relations (i.e., n!) equals the number of bushy and left- and right-deep join trees with n relations.

So we must simply show that the number of different complete binary trees with n leaf nodes times n! equals the number of bushy and left- and right-deep join trees with n relations or (C(2(n-1), n-1)/n)n! = (2(n-1))!/(n-1)!.

Since C(n,r) = n!/r!(n-r)!, the number of different complete binary trees with n leaf nodes equals (2(n-1))!/(n-1)!(n-1)!. This expression multiplied by 1/n results in (2(n-1))!/n!(n-1)!. This expression multiplied by n! equals (2(n-1))!/(n-1)!, the expression for the number of different join orders with n relations.

4. (2+2+6=10 points) Required only for CPS 562 students
Exercises 5.2.10 on p. 213 and 16.4.3 on p. 835 from [DBCB]: The semijoin of relations R and S, is the bag of tuples in R(A,B) that agree with at least one tuple in S(B,C) on all join attributes. An expression in relational algebra that is equivalent to `R semijoin S' is (R S), where L represents the set of all attributes in R. Give two additional expressions in relational algebra which are equivalent to `R semijoin S.' Each of these additional expressions must be truly different and not just a rewrite of the given expression or of each other. Now explain how you would estimate the size of a semijoin, using the ideas presented in class for estimating the size of natural joins?
Notice that (R S) only works when R and S are sets. The following expressions work when they are bags:
• R ((S)), where M represents the set of attributes common to R and S.
• R - (R - (R S))
Let us now attempt to estimate the size of the semijoin. There are several ways to estimate the size, depending on the actual formula used and on the assumptions you make. Here is one analysis. First, notice is that unlike join, semijoin is asymmetric. Thus, we need to be careful on how we do the estimates.

Consider the case when V(R,B) <= V(S,B). We assume that the containment and preservation constraints on value sets is satisfied. Thus, each tuple in R will match some tuple in S on the B attributes. The semijoin would thus be R (itself).

If, on the other hand, V(R,B) >= V(S,B), then not all of the tuples of R would be included in the semijoin (only those that match with S on B). Since there are V(S,B) unique values of B in S, only those tuples of R that have these values will take part in the semijoin. However we have no idea how many tuples of R have these desired values. A quick fix is to use the estimate V(R,A)*V(S,B).

The logical step now is to cast these two answers as special cases of one bigger formula. That sounds nasty until we realize that the first case can be modeled as a duplicate-elimination. Our solution is to use: V(R,A)*min(V(R,B), V(S,B)). If this turns out to be greater than the actual size of R, then use R instead. This problem is endemic to other estimation formulas we have studied. However, such they work really well in the average case.

We must realize that these are only estimates and there is no one correct answer. However, what we do not want to see is a conclusion that the size of the semijoin is the same as the size of the join.