<aside> 💡 Statistics
The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog.
Different systems update them at different times.
</aside>
For each relation R, the DBMS maintains the following information:
Selection cardinality $SC(A,R)$ → the average number of records with a value for an attribute $A$ given $N_R / V(A,R)$
The selectivity (sel) of a predicate P is the fraction of tuples that qualify.
<aside> 💡 Assumptions are often not satisfied by real data.
Assumption #1: Uniform Data
Assumption #2: Independent Predicates
Assumption #3: Inclusion Principle
Real data is often skewed and is tricky to make assumptions about. However, storing every single value of a data set is expensive.
One way to reduce the amount of memory used by storing data in a histogram to group together values.
Approach #1: Equi-Width Histogram
Combines together the counts for adjacent keys to reduce the storage overhead.
Approach #2: Equi-Depth Histogram (Quantiles)
To ensure that each bucket has roughly the same number of counts, the histogram varies the range of each bucket
DBMS’s can use sampling to apply predicates to a smaller copy of the table with a similar distribution.
The DBMS updates the sample whenever the amount of changes to the underlying table exceeds some threshold