CPS 432/562 Project #1

Coverage: [DBCB] Chapter 16, §16.6, pp. 847-858
Assigned: February 8
Due: February 22, 4:30p, in class


Assume that you are given n relations: R1, R2, ..., Rn. Each of the Ri's has only two attributes. The second attribute of a relation is the same as the first attribute of the next relation. For instance, R1 could be R1(X,Y), then R2 would be R2(Y,Z), R3 is R3(Z,L) and so on. Thus, the total number of distinct attributes in all the relations is n+1.

Write a program (in any programming language) which determines an optimal join strategy for computing R1 R2 ... Rn. Assume that all relations are of size 1000. The program will accept from standard input the value count information, e.g., V(R,a), for only the common attributes. The number of Vi's needed is 2(n-2) + 2. For example, one invocation of your program could be:
$ order_joins
200 100 500 20
200 100 500 20 50 1000
The first test case attempts to do a join of three relations: R(X,Y), S(Y,Z), T(Z,L), where T(R) = T(S) = T(Y) = 1000, and V(R,Y) = 200, V(S,Y) = 100, V(S,Z) = 500, and V(T,Z) = 20. Restrict your answer to only left-deep joins. Your program should output two groupings:
  • The grouping obtained by a dynamic programming search
  • The grouping obtained by a greedy search
In each case, output your answer using parentheses to indicate order. For example, for the above run, the answer should be:
Dynamic Programming Ordering: ((R2 join R3) join R1)
Greedy Ordering: ((R2 join R3) join R1)
Dynamic Programming Ordering: (((R3 join R4) join R2) join R1)
Greedy Ordering: (((R3 join R4) join R2) join R1)
In this case, the results are the same, but you cannot generally assume that they will be identical.

We will not provide any test data nor tell you how many n's will appear. Feel free to share test cases. You may have some surprises when you go from three relations to four relations.

Requirements

  • Your program must implement the dynamic programming and greedy algorithms presented in class.
  • Your program must read from standard input and write to standard output. Do not open any files in your program.
  • Prepare your program as a single source code file.
  • Include comments in your code to identify which program segments implement which algorithm.
  • Follow the CPS 445 programming style guide (which is tailored to C, but generally applicable to any language).
  • E-mail only your source code file as p1.(use an extension appropriate for the programming language used) to your instructor by 4:30p.
  • Turn in a pretty-printed hard copy of your source code (preferably using a2ps) in-class at 4:30p.
  • The hard-copy of your source code that you submit must have be generated from the source code file which you e-mail to the instructor.
Any student who submits without following these requirements will be assessed a 10% penalty. 90% of your score will come from correctness and 10% from following the programming style guide. Applicable submission/late penalties will then be assessed. No credit will be given to i) a program which does not compile without warnings/errors, ii) a program that produces a run-time error (e.g., core dump or segmentation fault), or iii) a program which fails to terminate normally.

Hint

Use of an appropriate programming language can simply the implementation of these algorithms. I advise you to use a language with good capabilities for list/matrix processing (e.g., MATLAB, Lisp, or Python). You also may want to use an interpreted language, rather than one which is compiled, to help deal with debugging, testing, and incremental development.

Return Home