Monday, January 22, 2018

Faster Sorting of Arrays of Primitives Coming to Java?

It appears that sorting arrays of primitives in Java may experience a performance improvement in the not-so-far future. Vladimir Yaroslavskiy has posted a message to the core-libs-dev mailing list titled "The new optimized version of Dual-Pivot Quicksort" in which Yaroslavskiy writes of an "optimized and faster version of Dual-Pivot Quicksort" that he has "been working on ... for the last 5 years."

The "The new optimized version of Dual-Pivot Quicksort" message includes some historical background on the Dual-Pivot Quicksort; highlights relative performance of the new version for random data, "nearly structured arrays," and "period inputs"; provides a comprehensive summary of the changes involved; and provides a link for open code review of the changes.

The Dual-Pivot Quicksort algorithm was introduced to Java in 2009. In another core-libs-dev mailing list post written in September 2009 and called "Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort", Yaroslavskiy wrote, "I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one." That post described the "classical Quicksort algorithm" scheme and some modifications to that scheme before describing how "the new Dual-Pivot Quicksort uses *two* pivots elements" instead of the single pivot element used by all earlier schemes.

The original message "Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort" features some other interesting historical details as well that are highlighted here.

  • An e-mail message pasted into this message from Jon Bentley states, "I think that Vladimir's contributions to Quicksort go way beyond anything that I've ever done, and rank up there with Hoare's original design and Sedgewick's analysis." That message also provides brief but interesting historical background on the development of quicksort. That message says much about Yaroslavskiy's contributions, but I think it also says much about Jon Bentley's character.
  • An e-mail message pasted into this message from Josh Bloch states, "I believe it's not unlikely that this code may end up getting ported to many languages and widely deployed in much the manner of Bentley and McIlroy's fine sort (which is nearing 20 successful years in the field)." This has turned out to be the case as other languages (or libraries for languages) have adopted this algorithm in some measure with examples including JavaScript, Python, and Ruby.

The likely performance improvements from the new and improved version of the Dual-Pivot Quicksort will be seen in use of the overloaded versions of Arrays.sort() methods on primitive array types. The search term "Dual-Pivot Quicksort" occurs 14 times in the Javadoc-generated HTML output associated with the JDK 9 version of the Arrays class:

Because the quicksort is only used for sorting primitives, these performance enhancements to the dual-pivot quicksort only affect methods on primitives and don't affect methods such as Arrays.sort(Object[]) that tend to use the merge sort instead.

As far as I can tell, there is no specific release of Java for which these performance improvements are targeted, but they seem to have had extensive review and testing, so the improvement of performance related to sorting of arrays of primitives may be coming soon to a version of Java near you.

References

No comments: