CPS 430/542 Lecture notes: Introduction to the Relational Database Model



Coverage: [FCDB] §§3.1-3.3 (pp. 61-82)


Introduction to the relational data model

  • proposed by Edgar Codd in 1969
  • centered around its primary element: the relation
  • everything is a relation in this model


Why the relational model?

  • clean separation between physical and logic layers; ANSI Sparc architecture
  • lead to powerful and declarative query languages such as SQL
  • runaway success of simple theoretical concepts put into practice
  • why not design using the relational model from the start?
    • E/R good basis for design
    • E/R not good basis for DML
    • since the relational model has only one main element (the relation) in contrast to the E/R model (which has entity sets, relationships, and so on), it is less flexible for design
    • these inflexibilities are best handled after a design has been constructed
    • relational model has its own design theory (normalization)
      • there are many normal forms
      • depend on design goals


Overview


Overview of what we will study

  • E/R (or ODL) → R
  • R → better R (process called normalization)
  • functional dependencies (generalize concept of key)
  • design theory, normal forms


Essential elements of the relational model

  • a relation is a set of tuples
  • theoretically, a relation is a subset of the Cartesian product: RS1 × S2 × ⋅⋅⋅ × Sn
      A = {a, b, c}, B = {d, e} (each set specifies a domain)

      A × B = {(a,d), (a,e), (b,d), (b,e), (c,d), (c,e)} (each element is called a tuple)

      R ⊆ A × B (binary relation)
  • Students relation
                             ------------------- |
                             ID     GPA    major |} ← attributes
                             ------------------- |
        tuple or record → { 001    3.7    CPS   |}
                             005    3.9    ART   |} instance: a set of (legal) tuples for a given relation
                             007    4.0    CIS   |}
                             987    2.0    CPE   |}
                             ------------------- |
    
  • tuple: (001, 3.7, cps)
  • components of a tuple
    • 001
    • 3.7
    • cps
  • domain (of gpa is a floating point number between 0.0 and 4.0)
  • instance: a set of (legal) tuples for a given relation
  • relational schema: Students(id, gpa, major)
  • relational database schema (or simply database schema): set of all relational schema in a database design
  • column order and tuple order is irrelevant
  • a struct in C is a relation, e.g., struct example_relation {int x; float y;}; (Cartesian product of set of all possible values for int and set of all possible values for float)
  • worked out exercises for several of equivalent relations
    • exercise 3.1.2 (all parts) on p. 65 from [FCDB]
    • general form: n! * m!
  • what changes more frequently: instance or schema? what do we use to change each?


From E/R diagrams to relations

    (a relation can represent more than just entity sets)
  1. convert each entity set into a relation; make all attributes of the entity set attributes of the relation
  2. convert each relationship into a relation
    • make the key attributes of the participating entity sets as well as the attributes of the relationship attributes of the relation (renaming attributes as appropriate)
    • what to do with multiple roles?


Some impurities

  • many-one relationships (are source of consolidation)
  • weak entity sets
  • isa relationships


Combining relations

  • many-one relationships
    • relations for the many entity set [E] and the many-one relationship <R> each should have a key for [E] in their relation
    • non-key attributes of [E], key attributes of [F], and attributes of <R>, are uniquely determined by the key of [E]
    • combine
      1. all attributes of [E]
      2. the key attributes of [F]
      3. the attributes of <R>
      (2) and (3) might be NULL if the one in [F] does not exist!
  • more efficient to answer queries involving attributes from one relation than attributes distributed across more than one relation
  • example:


    Players(ssn, name, no)
            ---
    PlaysFor(ssn, tname)
    Teams(name, city)
          ----
    

    Playsfor(ssn, name, no, tname)
             ---
    Teams(name, city)
          ----
    
    why not add city to combined relation as well?
    • risky
    • source of redundancy


Converting weak entity sets

  • incorporate all key attributes of weak entity set [[E]] in relation for E
  • no relation necessary for any supporting relationship unless it has its own attributes


Eliminating relations

  • what is a proper subset? proper superset?
  • subset is necessary, but not sufficient
  • IRS example: People(name, ssn) and TaxPayers(name, ssn, amount)
  • movies example: Stars(name, address) and Studios(name, address)
  • moral of the story: cannot consolidate blindly


ISA relationships

  • why no relation necessary for isa relationship?
  • 3 approaches
    • E/R style: one relation per entity set
    • OO style: one relation per sub-tree (including the root)
      • closed form: 2n, where n is number of nodes, excluding the root
      • there are 4 sub-trees in the movies example
        • can combine the first two, though we loose some info, i.e., which movies are cartoons
        • can combine the last two, though, again, though we loose some info, i.e., which movies are cartoons
    • NULLs approach: use only one big relation
  • advantages and disadvantages of each approach?
    • consider queries
      • NULLs approach minimizes number of relations required to produce an answer
      • tradeoffs between E/R and OO
    • number of relations (small number preferred)
      • nulls at top
      • E/R
      • OO
    • storage (prefer to minimize)
      • 1 tuple per entity
        • OO
        • nulls, though the tuples might be long
      • more than 1 tuple per entity: E/R, but only key duplicated


Formula possible?

Can we develop a formula for the number of relations necessary for an E/R diagram having e entity sets and r relations?
  • may need to partition e into weak and non-weak entity sets
  • may need to partition r into supporting, isa, and non-supporting-or-non-isa relationships


References

    [FCDB] J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, NJ, Second edition, 2002.

Return Home