Just as it cannot assume that a table fits entirely in memory, a disk-oriented DBMS cannot assume that query results fit in memory.
The standard algorithm for sorting data that is too large to fit in memory is external merge sort. It is a divide-and-conquer sorting algorithm that splits the data set into separate runs and sorts them individually. It can spill runs to disk as needed and then read them back in one at a time. The algorithm consists of two phases (sketched below):
<aside> 💡 Phase #1 - Sorting: First, the algorithm sorts small chunks of data that fit in main memory, and then writes the sorted pages back to disk.
Phase #2 - Merge: Then, the algorithm combines the sorted sub-files into a single, larger file.
</aside>
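A minimal sketch of the two phases, assuming the input is a plain Python list of keys and that `run_capacity` records fit in memory at once. The helper names (`create_sorted_runs`, `merge_runs`) and the use of pickled temporary files for runs are illustrative, not any particular DBMS's implementation:

```python
import heapq
import os
import pickle
import tempfile

def create_sorted_runs(records, run_capacity):
    """Phase 1: sort memory-sized chunks and spill each one to disk as a sorted run."""
    run_paths = []
    for start in range(0, len(records), run_capacity):
        run = sorted(records[start:start + run_capacity])
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(run, f)
        run_paths.append(path)
    return run_paths

def merge_runs(run_paths):
    """Phase 2: read the sorted runs back and merge them into one sorted output."""
    def read_run(path):
        # Simplification: loads a whole run at once; a real system streams it page by page.
        with open(path, "rb") as f:
            yield from pickle.load(f)
    return list(heapq.merge(*(read_run(p) for p in run_paths)))

# Usage: sort 10 keys with runs of at most 4 records each.
data = [7, 3, 9, 1, 8, 2, 6, 5, 4, 0]
runs = create_sorted_runs(data, run_capacity=4)
print(merge_runs(runs))   # [0, 1, 2, ..., 9]
```

Each sorted run lands in its own temporary file during Phase 1; Phase 2 then merges the runs into the final sorted result.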
<aside> 💡 Sorted Run
Run: a list of key/value pairs.
</aside>
Data is broken up into N pages.
The DBMS has a finite number B of buffer pool pages to hold input and output data.
In its simplest form, the sort only requires three buffer pool pages (B = 3): two input pages and one output page. A single merge pass under this constraint is sketched below.
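A minimal sketch of one merge pass under the B = 3 constraint, assuming each run is represented as a list of already-sorted pages and that `page_size` keys fit on a page. The function name and the in-memory lists standing in for disk pages are illustrative:

```python
def two_way_merge_pass(run_a, run_b, page_size=2):
    """Merge two sorted runs (lists of sorted pages) while holding at most
    three pages in memory: one input page per run plus one output page."""
    merged_pages = []          # stands in for pages written back to disk
    out_page = []              # the single output buffer page
    ia = ib = 0                # which page of each run is currently buffered
    pa = list(run_a[0]) if run_a else []
    pb = list(run_b[0]) if run_b else []

    def refill(run, idx):
        # Bring in the next page of a run, or an empty buffer if the run is done.
        idx += 1
        return (list(run[idx]) if idx < len(run) else []), idx

    while pa or pb:
        # Move the smaller head key from the two input pages into the output page.
        if pa and (not pb or pa[0] <= pb[0]):
            out_page.append(pa.pop(0))
            if not pa:
                pa, ia = refill(run_a, ia)
        else:
            out_page.append(pb.pop(0))
            if not pb:
                pb, ib = refill(run_b, ib)
        # Flush the output page once it is full.
        if len(out_page) == page_size:
            merged_pages.append(out_page)
            out_page = []
    if out_page:
        merged_pages.append(out_page)
    return merged_pages

# Usage: merge two runs of two pages each.
print(two_way_merge_pass([[1, 4], [6, 9]], [[2, 3], [5, 8]]))
# [[1, 2], [3, 4], [5, 6], [8, 9]]
```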
<aside> 💡 Double Buffering Optimization
Prefetch the next run in the background and store it in a second buffer while the system processes the current run.
This reduces the wait time for I/O requests at each step by continuously utilizing the disk (see the sketch below).
</aside>
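A minimal sketch of the idea using a background thread, assuming `read_run` stands in for a disk read of one sorted run and `process_run` for whatever the operator does with it; both names are hypothetical:

```python
import threading

def read_run(runs, idx):
    """Stand-in for reading one sorted run from disk (hypothetical helper)."""
    return list(runs[idx]) if idx < len(runs) else None

def process_with_double_buffering(runs, process_run):
    """Process runs one at a time while a background thread prefetches the
    next run into a second buffer, overlapping processing with the I/O."""
    buffers = [read_run(runs, 0), None]   # buffer 0 holds the current run
    current, idx = 0, 0
    while buffers[current] is not None:
        other = 1 - current

        def prefetch(i=idx + 1, slot=other):
            # Fill the idle buffer with the next run (None when no runs remain).
            buffers[slot] = read_run(runs, i)

        t = threading.Thread(target=prefetch)
        t.start()
        process_run(buffers[current])     # processing overlaps with the prefetch
        t.join()                          # wait for the prefetch to finish
        current, idx = other, idx + 1

# Usage: print each of three sorted runs in order.
process_with_double_buffering([[1, 4], [2, 3], [5, 6]], print)
```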
Clustered
Unclustered
An aggregation operator in a query plan collapses the values of one or more tuples into a single scalar value.
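As a toy illustration (not tied to any specific operator implementation), an aggregation such as SUM or COUNT reduces many input tuples to one scalar:

```python
# Toy example: collapse the second attribute of several tuples into one scalar.
tuples = [("a", 10), ("b", 25), ("a", 5)]
total = sum(value for _, value in tuples)   # SUM(value) -> 40
count = len(tuples)                         # COUNT(*)   -> 3
print(total, count)
```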