MySQL filesort algorithms

By:    Updated: March 1,2017

MySQL sorts results can be a costly operation, so we can often improve performance by avoiding sorts or by performing them on fewer rows. When MySQL can't use an index to produce a sorted result, it must sort the rows itself through filesort which sorts in memory or on disk. MySQL has two filesort algorithms (the original filesort algorithm, and the modified filesort algorithm) for sorting and retrieving results. The MySQL optimizer selects which filesort algorithm to use. It normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

The original filesort algorithm (Reading the rows twice)

Reads row pointers (row ID) and ORDER BY columns, sorts the row ID by the ORDER BY columns, and then scans the row ID sorted list and rereads the rows from table for output.

 

One problem with this algorithm is that it reads the rows twice: One time during WHERE clause evaluation, and again after sorting the value pairs. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.) This algorithm can be quite expensive, because the second time causes a lot of random I/O. This algorithm works as follows:

  • Read all rows according to key or by table scanning. Skip rows that do not match the WHERE clause.
  • For each row, store a pair of values (the sort key value and the row ID) in the sort buffer.
  • If all pairs fit into the sort buffer, no temporary file is created. Otherwise, when the sort buffer becomes full, run a qsort (quicksort) on it in memory and write it to a temporary file. Save a pointer to the sorted block.
  • Repeat the preceding steps until all rows have been read.
  • Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
  • Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.
  • On the last multi-merge, only the row ID (the last part of the value pair) is written to a result file.
  • Read the rows in sorted order using the row IDs in the result file. To optimize this, read in a large block of row IDs, sort them, and use them to read the rows in sorted order into a row buffer.

The modified filesort algorithm (Reading the rows once)

Reads all the columns needed for the query, sorts them by the ORDER BY columns, and then scans the sorted list and outputs the specified columns (In other words, it records the sort key value, but instead of the row ID, it records the columns referenced by the query).

 

This algorithm can be much more efficient, especially on large I/O-bound datasets, because it avoids reading the rows from the table twice and trades random I/O for more sequential I/O. However, it has the potential to use a lot more space, because it holds all the desired columns from each row, not just the ORDER BY columns. This means fewer tuples will fit into the sort buffer, and the filesort will have to perform more sort merge passes. In other words, this algorithm is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the MySQL optimizer uses this algorithm only if the total size of the extra columns (all the columns needed for the query + the ORDER BY columns) in the sort tuple does not exceed the value of the max_length_for_sort_data system variable:

if ( (query_needed_columns + order_by_columns) <= max_length_for_sort_data )
{
      //use the modified algorithm
}

This algorithm works as follows:

  • Read the rows that match the WHERE clause.
  • For each row, record a tuple of values consisting of the sort key value and the columns referenced by the query.
  • When the sort buffer becomes full, sort the tuples by sort key value in memory and write it to a temporary file.
  • After merge-sorting the temporary file, retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.
More in Development Center
New on Valinv
Related Articles
Sponsored Links