The Nested Loops
join uses one join input as the outer input and one as the inner input. The
outer loop consumes the outer input table row by row. The inner loop, executed
for each outer row, searches for matching rows in the inner input table. A
Nested Loops join is particularly effective if the outer input is small and the
inner input is preindexed and large. In many small transactions, such as those
affecting only a small set of rows, index Nested Loops joins are superior to
both Merge joins and Hash joins. In large queries however, Nested Loops joins
are often not the optimal choice.
An
index nested loops join performs better than a merge join or hash join if a
small set of rows are involved. Whereas, if a large set of rows are involved
the Nested Loops join might not be an optimal choice. Nested Loops support
almost all join types except right and full outer joins, right semi-join and
right anti-semi join.
Merge Join
The first thing that you need to know about a Merge join is
that it requires both inputs to be sorted on join keys/merge columns (or both
input tables have clustered indexes on the column that joins the tables) and it
also requires at least one equijoin (equals to) expression/predicate.
Because the rows are pre-sorted, a Merge join immediately
begins the matching process. It reads a row from one input and compares it with
the row of another input. If the rows match, that matched row is considered in
the result-set (then it reads the next row from the input table, does the same
comparison/match and so on) or else the lesser of the two rows is ignored and
the process continues this way until all rows have been processed..
A Merge join performs better when joining large input tables
(pre-indexed / sorted) as the cost is the summation of rows in both input tables
as opposed to the Nested Loops where it is a product of rows of both input
tables. Sometimes the optimizer decides to use a Merge join when the input
tables are not sorted and therefore it uses an explicit sort physical operator,
but it might be slower than using an index (pre-sorted input table).
Hash Join
A Hash join is normally used when input tables are quite
large and no adequate indexes exist on them. A Hash join is performed in two
phases; the Build phase and the Probe phase and hence the hash join has two
inputs i.e. build input and probe input. The smaller of the inputs is
considered as the build input (to minimize the memory requirement to store hash
table discussed later) and obviously the other one is the probe input.
During the build phase, joining keys of all the rows of the
build table are scanned. Hashes are generated and placed in an in-memory hash
table. Unlike the Merge join, it's blocking (no rows are returned) until this
point.
During the probe phase, joining keys of each row of the probe
table are scanned. Again hashes are generated (using the same hash function as
above) and compared against the corresponding hash table for a match.
There is no “ideal join,” it all depends on the data being
joined.
From a resource perspective, a nested loop join generally
uses fewer resources, and seeing one generally is often an indicator of good
overall performance. Usually best for smaller joins.
- Merge joins are often used for moderate to large data sets, and are most efficient if the joined columns are pre-sorted, otherwise they have to be sorted before the merge join can occur.
- Hash joins are often used when very large data sets are used. Hash joins parallelize and scale better than other joins and are good at minimizing response times for OLAP queries. Seeing one may be a hint that too much data is being returned, especially in OLTP applications.
No comments:
Post a Comment