CPS 430/542 Project #2 Addendum
Coverage: [FCDB] Chapters 6 (particularly §§6.5-6.7)
and 8 (particularly §§8.1-8.5)
Assigned: November 14
Due: December 5, 4:30pm, in class
(51 points)
Explore the interface between the web and database systems.
Requirements
- install MySQL, PostgreSQL, Oracle Database 10g (Express Edition),
or SQL Server 2005 (Express Edition) on your system,
- choose a webpage or website which contains relational data,
- write a script (in any language) which scrapes (parses) the
HTML page(s), creates the relational database schema, and
loads the data directly into the database. Alternatively,
have your script generate a data file and use a bulk loader
facility to automatically load the data into your database
(you must load at least 100 tuples into one of your relations), and
- do something interesting with the database, e.g.,
run some compelling queries (beyond those
we have seen in class) and/or build
a simple web query (keyword search) interface
to the database.
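The scrape, create, load, and query steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not a reference solution: the sample HTML, the movie relation, and the function names are all hypothetical, and Python's built-in sqlite3 module stands in for the database so the sketch runs on its own. Your project must use one of the database systems listed above, and must load at least 100 tuples rather than three.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real script would download the HTML instead.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Casablanca</td><td>1942</td></tr>
  <tr><td>Vertigo</td><td>1958</td></tr>
  <tr><td>Alien</td><td>1979</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which hold only <th> cells
                self.rows.append(tuple(self._row))
            self._row = None


def scrape(html):
    """Parse the page and return a list of (title, year) tuples."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return [(title, int(year)) for title, year in parser.rows]


def load(conn, tuples):
    """Create the (hypothetical) movie relation and insert the tuples."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movie ("
        " title TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        " PRIMARY KEY (title, year))"
    )
    conn.executemany("INSERT INTO movie (title, year) VALUES (?, ?)", tuples)
    conn.commit()


def keyword_search(conn, keyword):
    """Return movies whose title contains the keyword, oldest first."""
    cur = conn.execute(
        "SELECT title, year FROM movie WHERE title LIKE ? ORDER BY year",
        ("%" + keyword + "%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for your real database
load(conn, scrape(SAMPLE_HTML))
print(keyword_search(conn, "ver"))
```

Passing the keyword as a query parameter, rather than pasting it into the SQL string, keeps a web search box from becoming an SQL injection hole.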
You are welcome to work in groups of no more than 3 students on this part
of the project. Of course, you can submit this addendum individually or in pairs
as well.
The addendum is open-ended by design. You have carte blanche
to take the project in any direction you desire
and you are encouraged to be creative.
This is your chance to do some self-directed
work and impress the instructor. You are advised to pick a domain you are
particularly passionate about (books, movies, South Park). Full credit will
be awarded to only the most compelling and creative projects. You
are particularly encouraged to build personalization or recommendation
into your system.
Getting help
- Read [FCDB] §§8.1-8.5.
- See the CPS 444/544 Homework #4 specification
for inspiration.
- Use curl, wget, or the lynx text-based web browser in a non-interactive
fashion (e.g., lynx -dump -width=200 'http://washingtonpost.com')
from within your script to download the source of the webpage(s).
- Consult the CPS 444/544 lecture notes on pattern matching,
filters (sed, awk), or shell scripting for help on the text
processing component.
- Use online documentation for Ruby, REXX, PHP, Python, Perl,
and so on.
- Drop by the instructor's office for help and ideas.
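Calling curl, wget, or lynx non-interactively from within a script might look like the sketch below. The function names fetch_command and fetch are hypothetical, and the sketch assumes the chosen tool is installed and on your PATH. Note that lynx -dump returns the rendered text of the page rather than its raw HTML, so pick the tool that suits your parser.

```python
import subprocess


def fetch_command(url, tool="curl"):
    """Return the argument list for a non-interactive download of url."""
    if tool == "curl":
        return ["curl", "-s", "-L", url]       # silent, follow redirects
    if tool == "wget":
        return ["wget", "-q", "-O", "-", url]  # quiet, write to stdout
    if tool == "lynx":
        # Same options as the example above; output is rendered text.
        return ["lynx", "-dump", "-width=200", url]
    raise ValueError("unsupported tool: " + tool)


def fetch(url, tool="curl"):
    """Run the download tool and return what it wrote to stdout."""
    result = subprocess.run(
        fetch_command(url, tool), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, fetch('http://washingtonpost.com', tool='lynx') would run the lynx command shown above and return its output as a string. Passing the arguments as a list, not as one shell string, avoids quoting problems with URLs that contain characters such as & or ?.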
What to turn in
- a 2-page report which contains:
  - the URL of the webpage you scraped,
  - a short description of your system (min: 250 words, max: 500 words),
  - the keys, FDs, and normal form for each of your relations, and
    a discussion of why you chose this normal form
- a pretty-printed hard copy of all of your code (script, code to
create the tables, queries, and so on)
Also, schedule a demo with your instructor (preferably during office hours).
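
When working out the keys and FDs for your report, it can help to test a candidate FD against the tuples you actually loaded. The helper below is a hypothetical sketch (the name fd_holds and the sample rows are made up for illustration). Keep in mind that data can only refute an FD: an FD is a claim about every legal instance of the relation, so one that happens to hold in your scraped data still needs a justification from the domain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}  # maps each lhs value combination to the rhs values seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs but differ on rhs violate the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample tuples.
rows = [
    {"title": "Casablanca", "year": 1942, "studio": "Warner Bros."},
    {"title": "Vertigo", "year": 1958, "studio": "Paramount"},
    {"title": "Psycho", "year": 1960, "studio": "Paramount"},
]

print(fd_holds(rows, ["title", "year"], ["studio"]))
print(fd_holds(rows, ["studio"], ["year"]))
```

The first check passes on this sample because no two rows share a title and year. The second fails because the two Paramount rows disagree on year.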