Data must be stored onto secondary memory because of size and persistence. A page is unit of storage in main memory, blocks are units of storage in secondary memory.

Assumption: the size of a page equals the size of a block.

Access structures

Data is organized in a primary and a secondary data structure.

Each table is stored in a single primary data structure, which contains all the tuples of a table. Multiple secondary data structures can be used to index those values.

Sequential structures are usually the choice for the primary one, whilst tree-based ones for the secondary. Hash-based structures are also frequently used for both.

An index in a database is a separate data structure built on one or more attributes of a table. It speeds up query performance whenever those indexed attributes appear in conditions. It may also dictate how the table itself is stored.

Sequencial structures

Blocks for a sequential structures generally use a double stack implementation.

image.png

The organization of blocks can be made in two ways: entry-sequenced or sequentially-ordered.

Entry-sequenced (heap) structures organize tuples based on their order of entry. They are efficient for insertions and sequential reading and writing, but are non-optimal for searching specific data.

Sequentially-ordered structures are sorted based on the value of a key field. They are effective for range queries, but are slow for insertions since reordering may be needed.

Hash-based structures

In an hash structure, the key values are “folded” (transformed) to become a positive integers, then, via the module operation, an hash is computed in order to identify the correct bucket in which to store the datum. If there’s a collision, thus the bucket has already been assigned, then DBMS usually adopt an open hashing solution (separate chaining, instead of a closed hashing / open addressing alternative), for which a new bucket is linked in the one in which the collision happens.

Indices and keys

Indexes map search keys to records (or record pointers).

In a dense indexes approach every search-key value in the data file has a corresponding index entry. If instead sparse indexes are used, index entries exist only for some search-key values (e.g., the first key in each block).

Based on the physical storage organization, primary indexes are built on a field that defines the physical ordering of the file (the primary key or clustering key), a clustering index is built on a non-primary but order-determining field (not unique), while secondary indexes are built on any field that is not used for file ordering.

Index keys role is to organize data logically, on different levels, for faster retrieval. Primary keys are the unique identifier for a tuple, clustering keys allow the ordering or records by a non-primary attribute, search keys are the set of attributes used to look up data, ordering keys are either the primary or clustering keys which define the ordering of data.