CPS 430/542 Lecture notes: Introduction to Database Systems



Coverage: [FCDB] Chapter 1 (pp. 1-21)


Introduction to CPS 430/542

logistics, evaluation, and policies



What is a database (DB)?

  • `a collection of data that exists over a long period of time, often many years' [FCDB] p. 2
  • managed through a database management system


What is a database management system (DBMS)?

  • or simply `database system'
  • `a powerful tool for creating and managing [and manipulating] large amounts of data [(several gigabytes; 109 bytes)] efficiently and allowing it to persist over long periods of time, safely [(ACID)]' [FCDB] p. 1 (sound like something familiar?)
  • focus on secondary, rather than main, memory
  • powerful, but simple, programming interface


Why do we need database systems?

  • data management problem
  • think of analogies from your personal information space


DBMS vs. `just a file system'

  • DBMS's evolved from file systems
  • file systems also store large amounts of data over a long period of time in secondary memory
  • however, file systems
    • can lack efficient access
    • have no direct support for queries
    • limit organization to directory creation and hierarchical organization
    • have no sophisticated support for concurrency
    • do not ensure durability


ACID properties

(all good DBMS's should guarantee these)
  • Atomicity
    • should not be able to execute half of an operation
    • either all or none of the effects of a transaction are made permanent
  • Consistency
    • there should be no surprises in the world, e.g., gpa > 4.0, balance < 0, cats should never have more than 1 tail!
    • the effect of concurrent transactions is equivalent to some serial execution
    • use constraints, triggers, active DB elements (context-free)
  • Isolation
    • concurrency control
    • transactions should not be able to observe the partial effects of other transactions
    • use locks (whole relations or individual tuples?)
  • Durability
    • if power goes out, nothing bad should happen
    • once accepted, the effects of a transaction are permanent (until, of course, changed by another transaction)
    • use logs


Who uses DBMS's?

  • parametric users, e.g., the travel agent, the application programmer
    • create, update, or query content
    • use DML (Data Manipulation Language) or query language: language which changes the instance (contents)
  • database administrators (DBA)
    • create or modify schema
    • uses DDL (Data Definition Language): language which programs the schema (structure)


Applications of database systems

(shifts in application domains help illustrate evolution of DBMS's)


  • reservation systems, banking systems
  • record/book keeping (corporate, university, medical), statistics
  • bioinformatics, e.g., gene databases
  • criminal justice
    • fingerprint matching
    • how do you encode `looks like'?
  • multimedia systems
    • require terabytes (1012 bytes) of storage
    • tertiary storage devices, e.g., CD, DVDs
    • image/audio/video retrieval
    • streaming, interactivity
  • satellite imaging; can require petabytes (1015 bytes) of storage
  • the web
    • client-server and multi-tier architectures
    • almost all data-intensive websites are database-driven; IMDB.com is an exception
  • information integration
    • over the web
    • legacy systems; must deal with issues of
      • synonymy: different words having the same meaning, e.g., coffee shop vs. café
      • polysemy: same word (homonym) having different meanings, e.g., shot
    • data warehouses
    • data mining (KDD, Knowledge Discovery in Databases), e.g., association rules: `diapers → beer'; we pass these on to the marketing folks
  • in sum, databases are everywhere!


Three classical data models

  1. hierarchical model
  2. network model
    • each tuple is a separate record, e.g., (AN 145) --- [Perugini] --- (CPS 430)
    • no separation between logical and physical views
    • used record-at-a-time languages
    • too low-level
  3. relational model
    • proposed by E.F. Codd in 1969
    • most popular and successful model
    • de facto standard for databases
    • (relational) databases are one of the most popular success stories of simple theoretical ideas
    • the focus of CPS 430/542
  4. ...
    semistructured data and XML
    • semistructured data is self-describing
    • web data tends to be semistructured
    • in between structured and unstructured data (free text)
    • the study of the storage and retrieval of unstructured data is called IR (Information Retrieval)


Main themes of relational database management systems (RDBMS's)

  • data stored in a relation (for now, a table), e.g., a simple relation
      ------------------- |
      ID     GPA    major |} ← attributes
      ------------------- |
      001    3.7    CPS   |} ← tuple or record
      005    3.9    ART   |
      007    4.0    CIS   |
      987    2.0    CPE   |
      ------------------- |
      
    • attributes (columns), tuples (rows, records)
    • 2-tuple = pair, 3-tuple = triple, m-tuple

  • clean separation between logical and physical views, ANSI Sparc architecture, 3-tier organization of databases (layers of abstraction)
    views
    relations
    physical storage

  • gives rise to powerful, yet declarative, relation-at-a-time query languages, e.g., SQL (Structured Query Language; pronounced `sequel')
  • a simple SQL query illustrating the SELECT-FROM-WHERE construct
      SELECT id
      FROM Students
      WHERE major = 'CPS' AND GPA > 3.7;
      
  • relational query languages (QLs) are declarative
    • you specify what you want, not how to get it (à la PROLOG)
    • e.g., SQL

  • closure property


Object-relational Databases

  • relational model not always the best fit
  • user might need to supply complex data operations, e.g., for multimedia data;
    • object-orientation has been used to help support such extensions
    • led to object-relational systems


How can we study database systems?

  1. design of databases, i.e., how do you structure your data in a database?
    • entity-relationship (E/R) model
    • relational model

  2. database programming
    • how do you use a DBMS?
    • study of DDLs and DMLs (query languages) such SQL and OQL (Object Query Language); SQL is both, i.e., SQL = DDL + DML

  3. database system implementation, i.e., how do you build the next Oracle?

  4. CPS 430/542 focuses on (1) and (2).
    CPS 432/562 focuses on (3).


Course objectives

  • Establish an understanding of database principles and the technologies and theory underlying database management systems.
  • Establish an understanding of data models (with an emphasis on the relational model), physical data organization, data design, normalization, and querying.
  • Establish an understanding of how databases interface with the web.


Course outline


References

    [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.

Return Home