Just like it cannot assume that a table fits entirely in memory, a disk-oriented DBMS cannot assume that query results fit in memory.

1. External Merge Sort


The standard algorithm for sorting data which is too large to fit in memory is external merge sort. It is a divide-and-conquer sorting algorithm that splits the data set into separate runs and then sorts them individually. It can spill runs to disk as needed then read them back in one at a time. The algorithm is comprised of two phases:

<aside> πŸ’‘ Phase #1 – Sorting: First, the algorithm sorts small chunks of data that fit in main memory, and then writes the sorted pages back to disk.

Phase #2 – Merge: Then, the algorithm combines the sorted sub-files into a larger single file.

</aside>

<aside> πŸ’‘ Sorted Run

Run β†’ list of key/value pairs.

1.1 2-Way External Merge Sort

Data is broken up into N pages.

The DBMS has a finite number of B buffer pool pages to hold input and output data.

Only requires three buffer pool pages to perform the sorting (B=3).

β†’ Two input pages, one output page

Untitled

<aside> πŸ’‘ Double Buffering Optimization

Prefetch the next run in the background and store it in a second buffer while the system is processing the current run.

Reduces the wait time for I/O requests at each step by continuously utilizing the disk.

</aside>

1.2 General External Merge Sort

1.3 Using B+Trees for Sorting

Clustered

Clustered

Unclustered

Unclustered

2. Aggregations


An aggregation operator in a query plan collapses the values of one or more tuples into a single scalar value