Quicksort
The quicksort is an algorithm for sorting in order a list of items. This is another divide-and-conquer algorithm, somewhat like the mergesort, but with some big differences. For speed, the mergesort is O(n log(n)) for best case and O(n^2) for worst case. So, why choose a quicksort over a mergesort? Apparently in real world applications it is often two to three times faster than a mergesort.
There are three fundamental things you have to know to understand a quicksort:
- It uses a binary break down of a list. You keep dividing the list by half, until you get down to single elements, at which point the list is in sorted order.
- Each pass through a list, s, of n elements you place all elements less than the nth element, s[p], to the left of it (call it the left group) and all elements greater or equal to it to the right of it (call it the right group). This is called partitioning. These two groups are now in order, so that each and every element in the left group is less than each and every element in the right group. You follow the same process for the left and right groups, dividing each of them into two groups, left and right. And you continue dividing and partitioning until all elements are ordered.
- The list partitioning is done without creating any new buffers, so it's important to understand the method by which the elements are grouped in the same buffer. This requires understand just two basic cases, what you do when an element is less than the partition element, s[p], and what you do when it is greater than or equal to s[p].
Quick example to start. If you have the word "jupiter", the first pass would result in this:
"jpiertu"
The letter 'r' is the partition character, s[p]. It is the right most character of the "jupiter", which is what is selected in the algorithm. Every character to the left of 'r' is smaller than 'r', and every character to the right of 'r' is greater than or equal to 'r'. To continue sorting, you would then sort the smaller < s[p] group, "jpie", and then > s[p] group, "tu".
So, after each pass you get [ values < s[p] ][ s[p] ][ values >= s[p] ]. In the best case there will be log(n) cases. In the worst case, there will be n passes.
Note, what s[p] ends up being is random, which is the reason why the algorithm is not guaranteed to be log(n).
I'll give a more full blown example after presenting the pseudocode, here:
The Algorithm
Quicksort Example:
This next example will step you through the partition() function (see pseud-code above) using the word "randomization", so that you can see how it works. The two cases to focus on are when an element is > s[p] and when an element is <= s[p].
1> Partition with the last character, 'N'. 'N' is the partition character, s[p], in the pseudocode. All characters to the left of 'N' start out unknown. The box below designates the unknown characters as the are at the start:
2> We start with the letter 'R' and compare it to the partition character 'N'. Since 'R' > 'N', it goes into the >= N region, which is shown with the red box below:
3> Next we compare 'A' to 'N'. Since 'A' < 'N' it goes in the < 'N' region, which is designated by the green box below. Also, notice that we swap the first > 'N' value, 'R', in order to place 'A' in the proper region. The double arrowed line below shows the swap.
4> 'N' is now compared to 'N'. Since 'N' = 'N', this means it goes in the >= 'N' region, and the red box is expanded below to indicate that. Notice that no swapping is done when a >= 'N' value is found.
5> 'D' is < 'N', so it is placed in the < 'N' region. To do this we swap 'R' with 'D', and the result is this:
6> All, but the last step, follow in the same way:
7> The final step of the partition() algorithm is to swap the partition character with the first position large value. The arrow showing the swap and final state after the first pass are shown below:
8> That partitioning was done in the same buffer, so no new buffers have to be created for a quicksort. In the next steps, the letters in the green box are partitioned using the right most character 'I', and the letters in the red box are partitioned using the right most character 'R'. Below is a a graphic showing the steps.
Notice, the blue box gives the partition character, s[p]. The arrows indicate the buffer's state just before it enters the partition algorithm. Recall, the black box contains the characters as yet unknown, the red box the characters >= s[p] and the green box the characters < s[p]. The process continues from step 7 above, and the logic is the same as given above.