Monday, August 16, 2010

Marriage Sort

Several months ago, someone introduced a sorting algorithm called "Marriage Sort". The inspiration for it came from an article analyzing how to (mathematically) select the best wife/husband.

The "conclusion" drawn from the article is that, given N candidates, the strategy with the best expected value is to skip past the first sqrt(N) - 1 candidates and then choose the next "best so far".

Translated loosely into a sorting algorithm, it goes something like this:

  1. Given N candidates, calculate the number to skip.
  2. Find the "best" candidate within the skip distance.
  3. Move all the better candidates beyond the skip distance to the end.
  4. Reduce N by the number of candidates moved.
  5. Repeat from Step 1 until we run out of candidates.
  6. Perform insertion sort.

The marriage sort algorithm is not particularly fast, with a runtime of O(n1.5), but sorting algorithms are fundamental to computing, so I thought it would be fun to implement in Factor.

Note: Factor comes with some sorting algorithms. The sorting vocabulary implements merge sort and the sorting.insertion vocabulary implements an in-place insertion sort.

First, some vocabularies and a namespace (we will be using locals to implement a couple of the words):

USING: kernel locals math math.functions sequences
sorting.insertion ;

IN: sorting.marriage

We can take the loose algorithm and structure the marriage-sort word, leaving the bulk of the work for the (marriage-sort) inner loop:

: marriage-sort ( seq -- )
    dup length
    [ dup sqrt 1 - >fixnum dup 0 > ]
    [ (marriage-sort) ] while 2drop
    [ ] insertion-sort ;

We'll need to find the index of the maximum element in a range:

:: find-max ( from to seq -- i )
    from to >= [ f ] [
        from from 1 + [ dup to < ] [
            2dup [ seq nth ] bi@ < [ nip dup ] when 1 +
        ] while drop
    ] if ;

That leaves the (marriage-sort) word (probably more complex than necessary, but it works):

:: (marriage-sort) ( seq end skip -- seq end' )
    0 skip seq find-max
    skip end [ 2dup < ] [
        2over [ seq nth ] bi@ <=
        [ 1 - [ seq exchange ] 2keep ]
        [ [ 1 + ] dip ] if
    ] while nip 1 - [ seq exchange seq ] keep ;

Some performance numbers (given a 10,000 element random array):

( scratchpad ) 10000 [ random-32 ] replicate

( scratchpad ) dup clone [ natural-sort drop ] time
Running time: 0.004123694 seconds

( scratchpad ) dup clone [ marriage-sort ] time
Running time: 0.063077446 seconds

( scratchpad ) dup clone [ [ ] insertion-sort ] time
Running time: 10.972027614 seconds

As you can see, slower than natural-sort (which uses merge sort), but much faster than insertion-sort, with similar in-place semantics. It's worth noting that the code for insertion-sort seems a little slow and could probably be sped up quite a bit.

The code and some tests for this is on my Github.

No comments: