What is a Nested-loops Join?
Commonly, a SQL developer writes a query using a FROM...JOIN clause whenever they need data from more than one table in a single result set. They write joins without any regard to how the database query engine will retrieve the data. After all, one of the great things about relational database are their declarative nature. You just tell them what you want causing them to go figure out how best to answer your request. But behind the scenes, the database query engine has a handful of ways to answer your query. The database query engine evaluates factors such as how many I/Os are needed, how useful are the indexes in the two joined tables, and the value of the index statistics in each of the queried tables.
Among the many different join algorithms, probably the most common join algorithm is known as the nested-loop join. Nested-loop joins are frequently chosen by the query engine because they offer a good balance between high-speed performance and low IO requirements. This algorithm is usually the best strategy for queries against small tables with simple inner joins. Nested-loop joins work exceptionally well where one table has relatively few records compared to subsequent table(s) possessing a fairly large number of records, and where all tables are indexed on the joined columns.
How does it work? A nested loop iterates through each record in the outer table once. As it steps through each record of the outer table, it then searches the entire inner table for matches of the joined column to produce the output for the result set. Once the query engine has traversed the inner table, the query engine steps through the next record of the outer table, again causing a search of the inner table, and so on until all of the records in the outer table have been traversed. Suppose we write a two table inner join on tables INNER_TABLE and OUTER_TABLE. You can expect that the database query engine will scan INNER_TABLE one time for every record that matches the join condition in OUTER_TABLE. If OUTER_TABLE lacks an index on the join column, the operation will can be very expensive since every row in OUTER_TABLE must be traversed.
Depending on the database platform and the options available to the query processor on that database platform, the query engine may designate the table with the fewest records as the inner table, so that the query generates fewer I/Os per iteration of the outer table. Because of their straight-forward method for matching result sets in two or more tables, nested loop joins frequently require the least I/O and the fewest comparisons compared to other ways to complete a join, such as using the hash join algorithm.
Within the broader category of nested loops, there are even more subcategories that define very specific elements and alternative ways to handle a nested-loop join. For example, a naive nested loop join occurs when an entire table or index is searched. In a block nested loop, the query optimizer attempts to load the entire (smaller) inner table into memory, so that the I/O load is further lightened. Other examples include an index nested loop join or a temporary index nested loop join when the query engine chooses to traverse a regular index or temporary index, respectively, instead of the actual rows of the table.