CPS 432/562 Project #2

Coverage: [DBCB] Chapter 20, §20.6, pp. 1089-1096
Assigned: April 12
Due: April 26, 4:30p, in class


Write a program to mine the complete set of non-trivial positive-path web FDs satisfied by a relational dataset. Use your program to mine web FDs from the dataset available here. You may use any programming language.

Requirements

  • Make the confidence threshold a command-line argument to your program.
  • Your program must read from standard input and write to standard output. Do not open any files in your program.
  • Output one web FD per line, with no more than one term on the rhs. Use -> with a single space character on each side to separate the lhs from the rhs of every FD. When the lhs of an FD contains more than one term, delimit the terms with a comma followed by a single space.
  • Follow the CPS 445 programming style guide (which is tailored to C, but generally applicable to any language).
  • E-mail only your source code to your instructor by 4:30p: a single file named p2.(an extension appropriate for the programming language used) or, if you have multiple source files, an archive named p2.(tar or zip).
  • Turn in a pretty-printed hard copy of your source code listing (preferably produced with a2ps) in class at 4:30p.
  • The hard copy of your source code that you submit must be generated from the source code file that you e-mail to the instructor.

Any student who submits without following these requirements will be assessed a 10% penalty. Correctness accounts for 90% of your score and adherence to the programming style guide for 10%; any applicable submission or late penalties are then assessed. No credit will be given to i) a program that does not compile without warnings or errors, ii) a program that produces a run-time error (e.g., core dump or segmentation fault), or iii) a program that fails to terminate normally.
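
The input/output requirements above can be illustrated with a minimal Python sketch. It covers only the plumbing (threshold from the command line, data from standard input, FDs to standard output in the required format), not the mining algorithm. The names format_fd, parse_threshold, and mine_web_fds, the treatment of the threshold as a floating-point number, and the attribute names in the usage note are all hypothetical and chosen for illustration; they do not come from the assignment or the paper.

```python
import sys


def format_fd(lhs, rhs):
    """Render one FD: lhs terms joined by ', ', then ' -> ', then one rhs term."""
    return ", ".join(lhs) + " -> " + rhs


def parse_threshold(argv):
    """Read the confidence threshold from the first command-line argument.

    Assumption: the threshold is given as a floating-point number.
    """
    if len(argv) != 2:
        raise SystemExit("usage: p2 <confidence-threshold> < dataset")
    return float(argv[1])


def mine_web_fds(rows, threshold):
    """Hypothetical placeholder for the mining step; returns (lhs, rhs) pairs."""
    return []


def main():
    threshold = parse_threshold(sys.argv)
    # Read the dataset from standard input only; no files are opened.
    rows = [line.rstrip("\n") for line in sys.stdin]
    for lhs, rhs in mine_web_fds(rows, threshold):
        print(format_fd(lhs, rhs))  # one FD per line on standard output
```

For example, format_fd(["make", "model"], "price") yields the line "make, model -> price", and format_fd(["make"], "model") yields "make -> model". To run the sketch as a program, call main() under the usual `if __name__ == "__main__":` guard.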

Hint

When developing your program, start by working with the sample automobiles dataset given in the paper (ref. Fig. 1). This simple dataset contains 22 sequences and 13 positive-path web FDs.

In addition, an appropriate programming language can simplify the implementation of this algorithm. I advise you to use a language with good list-processing capabilities (e.g., MATLAB, Lisp, or Python). You may also want to use an interpreted language, rather than a compiled one, to make debugging, testing, and incremental development easier.
