CPS 430/542 Lecture notes: Introduction to
Database Systems
Coverage: [FCDB] Chapter 1 (pp. 1-21)
Introduction to CPS 430/542
logistics, evaluation, and policies
What is a database (DB)?
- `a collection of data that exists
over a long period of time, often many years' [FCDB] p. 2
- managed through a database management system
What is a database management system (DBMS)?
- or simply `database system'
- `a powerful tool for creating and managing
[and manipulating] large amounts of
data [(several
gigabytes; 109
bytes)] efficiently and allowing it to
persist over long periods of
time, safely [(ACID)]' [FCDB] p. 1 (sound like something familiar?)
- focus on secondary, rather than main, memory
- powerful, but simple, programming interface
Why do we need database systems?
- data management problem
- think of analogies from your
personal information space
DBMS vs. `just a file system'
- DBMS's evolved from file systems
- file systems also store large amounts
of data over a long period of time in secondary memory
- however, file systems
- can lack efficient access
- have no direct support for queries
- limit organization to directory creation and hierarchical
organization
- have no sophisticated support for concurrency
- do not ensure durability
ACID properties
(all good DBMS's should guarantee these)
- Atomicity
- should not be able to
execute half of an operation
- either all or none of the effects of a
transaction are made permanent
- Consistency
- there should be no surprises in the world,
e.g., gpa > 4.0, balance < 0,
cats should never have more than 1 tail!
- the effect of concurrent transactions is
equivalent to some serial execution
- use constraints, triggers, active DB elements
(context-free)
- Isolation
- concurrency control
- transactions should not be able to observe
the partial effects of other transactions
- use locks (whole relations or individual tuples?)
- Durability
- if power goes out, nothing bad should happen
- once accepted, the effects of a
transaction are permanent (until, of course,
changed by another transaction)
- use logs
Who uses DBMS's?
- parametric users,
e.g., the travel agent, the application programmer
- create, update, or query content
- use DML (Data Manipulation Language) or query language:
language which changes the instance (contents)
- database administrators (DBA)
- create or modify schema
- uses DDL (Data Definition Language):
language which programs the schema (structure)
Applications of database systems
(shifts in application domains help illustrate evolution of DBMS's)
- reservation systems, banking systems
- record/book keeping (corporate, university, medical), statistics
- bioinformatics, e.g., gene databases
- criminal justice
- fingerprint matching
- how do you encode `looks like'?
- multimedia systems
- require terabytes (1012
bytes) of storage
- tertiary storage devices, e.g., CD, DVDs
- image/audio/video retrieval
- streaming, interactivity
- satellite imaging;
can require petabytes (1015
bytes) of storage
- the web
- client-server and multi-tier architectures
- almost all data-intensive websites are database-driven;
IMDB.com is an exception
- information integration
- over the web
- legacy systems; must deal with issues of
- synonymy: different words having the same
meaning, e.g., coffee shop vs. café
- polysemy: same word (homonym) having different
meanings,
e.g., shot
- data warehouses
- data mining (KDD, Knowledge Discovery in Databases), e.g.,
association rules: `diapers → beer';
we pass these on to the marketing folks
- in sum, databases are everywhere!
Three classical data models
- hierarchical model
- network model
- each tuple is a separate record,
e.g., (AN 145) --- [Perugini] --- (CPS 430)
- no separation between logical and physical views
- used record-at-a-time languages
- too low-level
- relational model
- proposed by E.F. Codd in 1969
- most popular and successful model
- de facto standard for databases
- (relational) databases are one of the most
popular success stories of simple theoretical ideas
- the focus of CPS 430/542
...
semistructured data and XML
- semistructured data is self-describing
- web data tends to be semistructured
- in between structured and unstructured data (free text)
- the study of the storage and retrieval of unstructured
data is called IR (Information Retrieval)
Main themes of
relational database management systems (RDBMS's)
clean separation between logical and physical views,
ANSI Sparc architecture,
3-tier organization of databases (layers of abstraction)
| views |
| ↑ |
| relations |
| ↑ |
| physical storage |
gives rise to powerful, yet declarative, relation-at-a-time
query languages, e.g., SQL (Structured Query Language;
pronounced `sequel')
a simple SQL query illustrating the SELECT-FROM-WHERE construct
SELECT id
FROM Students
WHERE major = 'CPS' AND GPA > 3.7;
relational query languages (QLs) are declarative
- you specify what you want,
not how to get it (à la PROLOG)
- e.g., SQL
closure property
Object-relational Databases
- relational model not always the best fit
- user might need to supply complex data operations,
e.g., for multimedia data;
- object-orientation has been used to
help support such extensions
- led to object-relational systems
How can we study database systems?
- design of databases, i.e.,
how do you structure your data in a database?
- entity-relationship (E/R) model
- relational model
- database programming
- how do you use a DBMS?
- study of DDLs and DMLs (query languages) such SQL and OQL
(Object Query Language);
SQL is both, i.e., SQL = DDL + DML
- database system implementation, i.e.,
how do you build the next Oracle?
CPS 430/542 focuses on (1) and (2).
CPS 432/562 focuses on (3).
Course objectives
- Establish an understanding of database principles and the technologies
and theory underlying database management systems.
- Establish an understanding of data models
(with an emphasis on the relational model),
physical data organization, data design, normalization, and querying.
- Establish an understanding of how databases interface with the web.
Course outline
References
| [FCDB] |
J.D. Ullman and J. Widom. A First Course in Database Systems.
Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.
|
|