CPS 430/542 Project #2 Addendum

Coverage: [FCDB] Chapters 6 (particularly §§6.5-6.7) and 8 (particularly §§8.1-8.5)
Assigned: November 14
Due: December 5, 4:30pm, in class

(51 points) Explore the interface between the web and database systems.

Requirements

  1. install one of MySQL, PostgreSQL, Oracle Database 10g (Express Edition), or SQL Server 2005 (Express Edition) on your system,
  2. choose a webpage or website which contains relational data,
  3. write a script (in any language) that scrapes (parses) the HTML page(s), creates the relational database schema, and loads the data directly into the database. Alternatively, have your script generate a data file and use a bulk-loader facility to load that file into your database automatically (you must load at least 100 tuples into one of your relations), and
  4. do something interesting with the database, e.g., run some compelling queries (beyond those we have seen in class) and/or build a simple web query (keyword search) interface to the database.
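
As a rough illustration of requirements 3 and 4, the following Python sketch parses an HTML table, creates a schema, loads the tuples, and answers a keyword query. Everything specific in it is an assumption made for illustration: the SAMPLE_HTML page, the book relation and its columns, and the helper names scrape, load, and keyword_search are all hypothetical. It uses Python's built-in sqlite3 module only so that it runs without a server; for the project itself you would connect to one of the DBMSs listed in requirement 1 instead, and load at least 100 tuples rather than three.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real script would download the HTML instead.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Author</th><th>Year</th></tr>
  <tr><td>Dune</td><td>Frank Herbert</td><td>1965</td></tr>
  <tr><td>Neuromancer</td><td>William Gibson</td><td>1984</td></tr>
  <tr><td>Emma</td><td>Jane Austen</td><td>1815</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, one tuple per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row has only <th> cells, so it is empty
                self.rows.append(tuple(self._row))
            self._row = None


def scrape(html):
    """Return the data rows of the page as a list of tuples of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def load(conn, rows):
    """Create the (hypothetical) book relation and insert the scraped rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book ("
        "title TEXT PRIMARY KEY, author TEXT NOT NULL, year INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO book (title, author, year) VALUES (?, ?, ?)",
        [(title, author, int(year)) for title, author, year in rows],
    )
    conn.commit()


def keyword_search(conn, keyword):
    """Return the books whose title or author contains the keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT title, author, year FROM book "
        "WHERE title LIKE ? OR author LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for your real DBMS connection
load(conn, scrape(SAMPLE_HTML))
print(keyword_search(conn, "gibson"))
```

The query parameters are passed separately from the SQL text (the ? placeholders) rather than pasted into the string, which matters once the keyword comes from a web form. Real pages are often messier than this sample, so the parsing step will likely need more care.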

You are welcome to work on this part of the project in groups of no more than 3 students. Of course, you may also submit this addendum individually or in pairs.

The addendum is open-ended by design. You have carte blanche to take the project in any direction you desire, and you are encouraged to be creative. This is your chance to do some self-directed work and impress the instructor. You are advised to pick a domain you are particularly passionate about (e.g., books, movies, South Park). Full credit will be awarded only to the most compelling and creative projects. You are particularly encouraged to build personalization or recommendation into your system.

Getting help

  1. Read [FCDB] §§8.1-8.5.
  2. See the CPS 444/544 Homework #4 specification for inspiration.
  3. Use curl, wget, or the lynx text-based web browser in a non-interactive fashion (e.g., lynx -dump -width=200 'http://washingtonpost.com') from within your script to download the source of the webpage(s).
  4. Consult the CPS 444/544 lecture notes on pattern matching, filters (sed, awk), or shell scripting for help on the text processing component.
  5. Use online documentation for Ruby, REXX, PHP, Python, Perl, and so on.
  6. Drop by the instructor's office for help and ideas.
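
If you would rather stay inside one language than shell out to curl, wget, or lynx (item 3), most scripting languages can download a page themselves. The sketch below uses Python's standard urllib.request module as a substitute for those tools; the function name fetch, the User-Agent string, and the 30-second timeout are arbitrary choices made for illustration, not requirements.

```python
import urllib.request


def fetch(url, timeout=30):
    """Download a page and return its body as text.

    The body is decoded with the charset the server declares, falling back
    to UTF-8 (an assumption) when none is declared.
    """
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "cps-project-scraper/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as fetch('http://washingtonpost.com') would then return the HTML source as a string, ready to hand to your parsing code.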

What to turn in

  • a 2-page report which contains:
    1. the URL of the webpage you scraped,
    2. a short description of your system (min: 250 words, max: 500 words), and
    3. the keys, FDs, and normal form for each of your relations, and a discussion of why you chose this normal form
  • a pretty-printed hard copy of all of your code (script, code to create the tables, queries, and so on)
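
When you work out the FDs for item 3 of the report, a quick check against the loaded tuples can catch mistakes. The sketch below is one possible way to do that in Python; the function name fd_holds and the sample book tuples (including the made-up isbn values) are hypothetical. Keep in mind that data can only refute an FD: an FD that happens to hold in your current tuples is not thereby guaranteed by the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every two rows that agree on lhs also agree on rhs.

    rows is a list of dicts (attribute name -> value); lhs and rhs are
    lists of attribute names, representing the FD lhs -> rhs.
    """
    seen = {}  # lhs values -> the rhs values first seen with them
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True


# Hypothetical tuples, for illustration only.
sample_rows = [
    {"isbn": "111", "title": "Dune", "author": "Frank Herbert"},
    {"isbn": "222", "title": "Dune Messiah", "author": "Frank Herbert"},
    {"isbn": "333", "title": "Emma", "author": "Jane Austen"},
]

print(fd_holds(sample_rows, ["isbn"], ["title", "author"]))  # True
print(fd_holds(sample_rows, ["author"], ["title"]))          # False
```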

Also, schedule a demo with your instructor (preferably during office hours).


