January 25, 2006
Coverage: [DBCB] Chapter 5, pp. 224-226; Chapter 6, p. 303;
and Chapter 16, pp. 821-835
Review of Foreign Keys
- an attribute of a relation is a foreign key if
it references the primary key of another relation
- an implementation of referential integrity
- example:
- (name) --- [University] (--- <leads> --- [President] --- (name)
- leads to relations
University(uname) and President(pname, uname)
- attribute uname is a foreign key in the President relation
How to Estimate Size of a Natural Join with
Single Join Attribute
- assume R(X,Y)
S(Y,Z)
where Y is a single
attribute
- how can the Y-values of the tuples of R and S
possibly relate?
- no relation
- leads to an estimate of 0
- Y might be key of S and foreign key of R
- leads to an estimate of T(R)
- all tuples might agree
- leads to an estimate of T(R)T(S)
- let's make two simplifying assumptions
- containment of value sets
- ensures V(R,Y) <= V(S,Y)
- satisfied when Y is a key in S
and a foreign key in R
- preservation of value sets
- ensures V(R
S,Y) = V(R,A)
- satisfied when join attribute(s) of R
S form a key for S and a foreign key
for R
- can only be violated when there are `dangling tuples'
in R
- our assumptions guarantee logically consistency
- i.e., order has no affect on the
estimate of the size of a join
- proof by induction
T(R
S) = T(R)T(S)/max(V(R,Y),V(S,Y))
Example (courtesy [DBCB] example 16.23, p. 828):
- three possible orders/groupings
How to Estimate Size of a Natural Join with
Multiple Join Attributes
following rationale behind cases with one join attribute, here, we have
T(R)T(S)
max(V(R,y_1), V(S,y_1)) max(V(R,y_2), V(S,y_2)) ...
max(V(R,y_n), V(S,y_n))
the values of y_1, y_2, ..., y_n are independent
other types of joins (e.g., equijoin and theta-join) involves variations
of this
How to Estimate Size of Joins Involving =>
Two Relations
- S = R_1
R_2
...
R_n
- suppose attribute A appears in k of the R_i's, and
V(R_1,A) <= V(R_2,A) <= ... <= V(R_k,A)
- pick a tuple from relation with smallest value count for A
- a tuple from any other of the k relations has probability
of 1/(its value count) of joining with the tuple above on A
- this is true for i=2,3,...,k
- probability that all k tuples agree is 1/(v_2*v_3*...*v_k)
- rule:
product of all the T(R_i's)
product of all but the least V(R,A), for each attribute A appearing
at least twice
What about Other Operations?
- not as easy to estimate
- bag-union -- easy; what about set-union?
- is there a difference between set- or bag-union for purposes of estimation?
- difference?
- duplicate elimination?
- grouping and aggregation
References
[DBCB] H. Garcia-Molina, J. D. Ullman, and J. Widom.
Database Systems: The Complete Book. Prentice Hall, 2002.