1. Joins


1.1 Operator Output

There are multiple approaches to what the join operator output contains:

  1. Data → Early Materialization
    1. Advantage: future operators in the query plan never need to go back to the base tables to get more data
    2. Disadvantage: requires more memory to materialize the entire tuple
  2. Record Ids → Late Materialization
    1. Ideal for column stores, because the DBMS does not copy data that the query does not need
    2. Problematic when the data is stored in a distributed system
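
The difference between the two output styles can be illustrated with a minimal sketch. The toy tables `R` and `S`, the dict-based record ids, and the function names `join_early` and `join_late` are all assumptions made for illustration; they do not come from the lecture.

```python
# Hypothetical toy tables: each maps a record id to a full tuple.
R = {0: {"id": 1, "name": "a"}, 1: {"id": 2, "name": "b"}}
S = {0: {"id": 2, "value": 10}, 1: {"id": 3, "value": 20}}


def join_early(outer, inner, key):
    """Early materialization: the output holds copies of the full tuples."""
    out = []
    for r in outer.values():
        for s in inner.values():
            if r[key] == s[key]:
                # Merge both tuples into one wide output tuple.
                out.append({**r, **s})
    return out


def join_late(outer, inner, key):
    """Late materialization: the output holds only matching record ids."""
    out = []
    for r_id, r in outer.items():
        for s_id, s in inner.items():
            if r[key] == s[key]:
                out.append((r_id, s_id))
    return out


early = join_early(R, S, "id")
late = join_late(R, S, "id")

# With late materialization, a later operator has to go back to the
# base tables to fetch any column it needs.
names = [R[r_id]["name"] for r_id, _ in late]
```

In this sketch the early output carries every column forward, while the late output stays small but forces the extra lookup into `R` at the end.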

1.2 Cost Analysis

Only the I/Os incurred while computing the join are considered. The output is the same for every join algorithm, so the output cost does not differ between algorithms and is left out of the analysis.

Variables used in this lecture:


At a high level, a nested loop join consists of two nested for loops that iterate over the tuples in both tables and compare each unique pair of tuples. If a pair matches the join predicate, it is output.

2.1 Simple Nested Loop Join

For each tuple in the outer table, compare it with each tuple in the inner table.

This is the worst-case scenario, because the inner table is scanned in full once for every tuple in the outer table.
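
The loop structure described above can be sketched in a few lines. This is only an in-memory illustration: the function name `simple_nested_loop_join`, the list-of-tuples tables, and the lambda predicate are assumptions, and a real DBMS would read tuples from pages on disk rather than from Python lists.

```python
def simple_nested_loop_join(outer, inner, predicate):
    """Yield every (r, s) pair that satisfies the join predicate."""
    for r in outer:        # one pass over the outer table
        for s in inner:    # full scan of the inner table per outer tuple
            if predicate(r, s):
                yield (r, s)


# Hypothetical example tables; the first field is the join key.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z")]

matches = list(simple_nested_loop_join(R, S, lambda r, s: r[0] == s[0]))
```

Here every one of the `len(R) * len(S)` tuple pairs is compared, regardless of how few of them match.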