CPS 432/562 Homework #3

Coverage: [DBCB] Chapter 16, § 16.4, pp. 821-834; and § 16.6, pp. 847-858
Assigned: February 8
Due: February 15, 4:30p, in class

  1. (5+5+5=15 points) In addition to indices, commercial DBMSs provide table scan operations -- these are operations that are meant to read many blocks (pages) in one stroke, provided the table is stored in contiguous locations. In fact, one of the tricks employed by DBMS vendors is to store data pertaining to one single relation on the same cylinder. Thus, if you are attempting to read all data from a single relation, then you just have to position the arm once for all reads from all the platters (the first and one of the expensive parts of the access time --- the seek time --- thus needs to be accounted for only once, in the beginning). For example, Ingres can read nearly 10 pages in a scan and this is sometimes beneficial (up to 10 times faster) to using a secondary index, even (and particularly) if the index returns all the pages of the table. Assume that a table has 2 million records and a query returns 75,000 records. Assume, further, that a block contains 22 records. Is it beneficial to have a secondary index? Why/why not? What if there are only 2 records per block?

    What guidelines can you provide for when to use secondary indices?

  2. (1+2+2=5 points) Exercise 15.3.3 (parts a, b, c) on p. 737 from [DBCB]: Suppose B(R) = B(S) = 10,000. What value of M would be necessary to compute R S using the nested-loop algorithm with no more than a) 100,000, b) 25,000, and 15,000 disk I/O's?

  3. (10+10+10=30 points) Exercise 13.3.1 (parts b, c, and d) on p. 646 from [DBCB]: Suppose that blocks can hold either 10 records or 99 keys and 100 pointers. Also assume that the average B-tree node is 70% full; i.e., it will have 69 keys and 70 pointers. We can use B-trees as part of several different structures. For each structure defined below, determine (i) the total number of blocks needed for a 1,000,000-record file, and (ii) the average number of disk accesses to retrieve a record given its search key. You may assume that nothing is in main memory initially, and that the search key is the primary key for all the records.

    1. The data file is a sequential, unsorted file, with 10 records per block. The B-tree is a dense index.

    2. The data file is a sequential file, sorted on the search key, with 10 records per block. The B-tree is a sparse index.

    3. Instead of the B-tree leaves having pointers to records, the B-tree leaves hold the records themselves. A block can hold 10 records, but on average, a leaf block is 70% full. i.e., there are 7 records per leaf block.

  4. (10 points) Required only for CPS 562 students
    We mentioned in class that the choice of an index, to a certain extent, depends on (and influences) the physical organization of data on secondary storage. Give an example of a physical organization of data (i.e., something like sorted files, B-trees, or hash tables) where a secondary index is needed even on the primary key! Explain why.


Return Home