CPS 432/562 Project #2
Coverage: [DBCB] Chapter 20, §20.6, pp. 1089-1096
Assigned: April 12
Due: April 26, 4:30p, in class
Write a program to mine the complete set of non-trivial
positive-path web FDs satisfied by a relational dataset.
Use your program to mine web FDs
from the dataset available here.
You may use any programming language.
Requirements
- Make the confidence threshold a command-line argument to your
program.
- Your program must read from standard input and write to standard output.
Do not open any files in your program.
- Output one web FD per line,
with no more than one term on the rhs. Use -> with a single
space character on each side to separate the
lhs from the rhs of every FD. When the lhs of an FD
contains more than one term,
delimit terms with a comma followed by a single space.
- Follow the CPS 445 programming style guide (which is tailored to C,
but generally applicable to any language).
- E-mail only your source code to your instructor by 4:30p. Name the file
p2.(an extension appropriate for the programming language used) or,
if you have multiple source files, p2.(tar or zip).
- Turn in a pretty-printed hard copy of your source code listing (preferably
using a2ps) in class at 4:30p.
- The hard copy of your source code that you submit must be generated
from the source code file which you e-mail to the
instructor.
Any student who submits without following these requirements will be assessed a
10% penalty. Correctness accounts for 90% of your score, and
adherence to the programming style guide accounts for the remaining 10%.
Applicable submission/late penalties will then be assessed.
No credit will be given to i) a program which does not compile without
warnings/errors, ii) a program that produces a run-time error (e.g., core dump
or segmentation fault), or iii) a program which fails to terminate normally.
Hint
When developing your program, start by working with the sample automobiles
dataset given in the paper (ref. Fig. 1). This simple dataset contains 22
sequences and 13 positive-path web FDs.
In addition,
use of an appropriate programming language can simplify the implementation
of this algorithm. I advise you to use a language with good list-processing
capabilities (e.g., MATLAB, Lisp, or Python). You also may want to use an
interpreted language, rather than a compiled one, to help
with debugging, testing, and incremental development.