# April 10, 2006

Coverage: [DBCB] Chapter 20, pp. 1079-1087

## Why Might the Core of the Formal Data Cube Contain Aggregations

• consider our good ole fact table: Sales(isbn, date, dealer, price)
• notice that isbn is not a key
• let's add a serial number: Sales(serialno, isbn, date, dealer, price)
• while serialno is a key, it is not well suited for the cube
• because summing price over any/all attributes while keeping serialno fixed has no effect
• so let's go back to our original friend: Sales(isbn, date, dealer, price)
• it is okay for no key to appear among the dimensions
• just means that now each point in the cube, rather than being the price of a single book, represents the total sales of that isbn on a given date, by a given dealer
• recall that CUBE operator sums dependent attributes
• for that we need total number of sales
• final fact table: Sales(isbn, date, dealer, val, cnt)
• notice now that individual books are not visible

## Example Tuples of CUBE(Sales)

• ('123456', '2006-03-30', 'Books&Co', 42.12, 3)
• notice this tuple is also in Sales
• (*, '2006-03-30', 'Books&Co', 3,000.56, 131)
• now this tuple is not in Sales
• (*, '2006-03-30', *, 100,007.72, 5,360)
• (*, *, *, 4,000,120.54, 200,120)

## Another Query

```SELECT dealer, AVG(price)
FROM Sales
WHERE isbn='12345'
GROUP BY dealer;
```
is answered by computing the tuples (d, v/n) over tuples from CUBE(Sales) with form ('12345', *, d, v, n)
• '12345' is a value for attribute isbn
• d is a group (dealer) variable
• * is neither a value, nor group

## All or Nothing Aspect of CUBE Operator

• aggregates by one or more of the attributes in the fact table (each of which is typically a foreign key into a dimension table)
• e.g., group by dealers or not at all
• we'd like to be able to group at a finer level of granularity
• as the number of grouping attributes grows, two problems arise:
• it becomes expensive to store the results, there are too many
• such a large number of aggregates are not easily organized into the cube
• solution approach taken by most commercial database systems:
• support the user in choosing some materialized views of the cube
• a materialized view is the result of a query which we store in the DB
• typically an aggregation of the full data cube

## Some Example Queries with Materialized Views

• adapted from [DBCB] Example 20.19, pp. 1083-1085
• group dates by month and dealers by city:
```INSERT INTO SalesV1
SELECT isbn, month, city, SUM(val) AS val, SUM(cnt) AS cnt
FROM Sales JOIN Dealers ON dealer = name
GROUP BY isbn, month, city;
```
• group dates by week and dealers by state
```INSERT INTO SalesV2
SELECT isbn, week, state, SUM(val) AS val, SUM(cnt) AS cnt
FROM Sales JOIN Dealers ON dealer = name
GROUP BY isbn, week, state;
```
• these views can be used to answer queries that partition no finer than either in any dimension
• ```Q1: SELECT isbn, sum(val)
FROM Sales
GROUP BY isbn;
```
can be answered using SalesV1 or SalesV2, how?
• ```Q2: SELECT isbn, year, state, sum(val)
FROM Sales JOIN Dealers ON dealer = name
GROUP BY isbn, year, state;
```
can be answered using only SalesV1 (how?), not SalesV2 (why?)
• state is not an attribute of SalesV1, only city
• cannot roll weeks into years
• ```Q3: SELECT isbn, date, SUM(val)
FROM Sales
GROUP BY isbn, date;
```
can be answered by either SalesV1 or SalesV2
• month (of SalesV1) and week (of SalesV2) are too coarse to recover days

## A Lattice of Views

• how can we systematize our creation of materialized views wrt the queries we'd like to be able to answer?
• use graphical depictions (called Hasse diagrams) of lattices of materialized views
• what is a lattice?
• it is a partial order with a least upper bound (lub) and a greatest lower bound (glb)
• what is a partial order?
• a reflexive, antisymmetric, and transitive relation
• what is a relation?
• this you should know... :-)
• a hierarchy is a partial order with only a glb
• P1 <= P2 means that each group in partition 1 (i.e., P1) is contained in some group of partition 2 (P2)
• lattice of partitions for time intervals

(courtesy [DBCB] Fig. 20.20, p. 1085)

• lattice of partitions for dealers

(courtesy [DBCB] Fig. 20.21, p. 1086)

• notice that this represents a total order
• all total orders are also partial orders, but not vice versa

## Higher-Order Lattices of Views (and Queries)

• V1 <= V2 means that in each dimension, the partition P1 used in V1 is at least as fine as the partition P2 used in V2 (i.e., P1 <= P2)
• we can answer a query Q using view V iff V <= Q
• higher-order lattice involving Sales, the two materialized views (SalesV1 and SalesV2), and three previous queries (Q1, Q2, Q3)

(courtesy [DBCB] Fig. 20.22, p. 1087)

• such higher-order lattices can be used to help design data-cube DBs
• and in this sense, can be used as a basis for a data-cube database design tool

## References

[DBCB] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 2002.