Perl's sort function

A common task when working with any type of data is to sort that data into a consistent ordering. A list of people may be sorted by name, a list of files may be sorted by size, and a list of phone calls may be sorted by time or duration.

Perl makes it easy to sort lists of values. If I have a list of names, I can produce a sorted list of names with one simple line:

    @sorted_names = sort @names;

By itself, Perl's sort function places elements into "asciibetical" order, that is, the order in which their characters appear in the ASCII table. For the most part this is the same as alphabetical order, except that case matters: every string starting with an upper-case letter is listed before any string starting with a lower-case letter. That's very fast, but it's not always what we want.
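To see the effect of case on the default ordering, here is a minimal sketch. The names in the list are hypothetical sample data, not part of the original tip.

```perl
use strict;
use warnings;

# Hypothetical sample data that mixes upper- and lower-case names.
my @names = ('bob', 'Carol', 'alice', 'Dave');

# With no comparison block, sort compares elements as strings,
# so 'C' and 'D' (upper case) come before 'a' and 'b' (lower case).
my @sorted_names = sort @names;

print "@sorted_names\n";   # prints "Carol Dave alice bob"
```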

What if we want to sort our list in true alphabetical order, or in numerical order instead? To do that, we pass a comparison block to Perl's sort function, ahead of the list:

    @sorted_names = sort {lc $a cmp lc $b} @names;

The argument to sort is a block that Perl calls each time it needs to decide how to order two elements. The rules here are simple. Perl sets two special package variables, $a and $b, to the values being compared. It then runs our code and expects a number back: negative if $a should come before $b, zero if the two are equal, and positive if $a should come after $b. Returning -1, 0, or +1 is the usual convention.

In our example, the 'cmp' operator returns -1, 0, or +1 for these three conditions, and compares both of its arguments as strings. Since we wanted to sort our values alphabetically, regardless of case, we also call 'lc' on each of the values, to provide us with a lower-cased version of each string.
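The following sketch shows what 'cmp' returns on its own, and then the case-insensitive sort applied to a small list. The fruit and name strings are hypothetical sample data.

```perl
use strict;
use warnings;

# cmp compares its arguments as strings and returns -1, 0, or 1.
my $less    = 'apple' cmp 'banana';   # -1: 'apple' sorts first
my $equal   = 'pear'  cmp 'pear';     #  0: the strings are identical
my $greater = 'zebra' cmp 'apple';    #  1: 'zebra' sorts last

# Hypothetical sample data that mixes upper- and lower-case names.
my @names = ('bob', 'Carol', 'alice', 'Dave');

# Lower-casing both sides makes the comparison ignore case.
# The original strings in @names are left unchanged.
my @sorted_names = sort { lc $a cmp lc $b } @names;

print "@sorted_names\n";   # prints "alice bob Carol Dave"
```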

If we want to perform a numerical sort, we simply need to pass in a different comparison block:

    @sorted_numbers = sort {$a <=> $b} @numbers;

Here we're using the '<=>' operator, popularly known as the 'spaceship operator'. It also returns values of -1, 0, or +1 as appropriate, but compares its arguments as numbers, not strings.
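The difference between string and numeric comparison is easiest to see side by side. This is a small sketch using hypothetical sample numbers:

```perl
use strict;
use warnings;

# Hypothetical sample data with numbers of different lengths.
my @numbers = (10, 9, 100, 2);

# The default sort compares the numbers as strings, so "100"
# comes before "2" because the character '1' precedes '2'.
my @as_strings = sort @numbers;

# A numeric comparison block gives the order we usually expect.
my @as_numbers = sort { $a <=> $b } @numbers;

print "@as_strings\n";   # prints "10 100 2 9"
print "@as_numbers\n";   # prints "2 9 10 100"
```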

If we want to have a list in descending order (largest to smallest) rather than the other way around, we only need to swap the positions of $a and $b:

    @descending_numbers = sort {$b <=> $a} @numbers;
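As a quick check of the swap, the sketch below sorts the same hypothetical list in both directions. It also shows that reversing an ascending sort gives the same result for plain numbers.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @numbers = (10, 9, 100, 2);

# Ascending: $a on the left of the comparison.
my @ascending_numbers = sort { $a <=> $b } @numbers;

# Descending: $b on the left of the comparison.
my @descending_numbers = sort { $b <=> $a } @numbers;

# For plain numbers, reversing the ascending result is equivalent.
my @reversed_numbers = reverse sort { $a <=> $b } @numbers;

print "@ascending_numbers\n";    # prints "2 9 10 100"
print "@descending_numbers\n";   # prints "100 10 9 2"
```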

As you can see, Perl's sort is very flexible.

It's also possible to pass the name of a subroutine to Perl's sort function in place of a block. This lets us use more complex comparison functions, and it can also improve readability. Suppose we have declared two subroutines using our sort blocks from above:

    sub numerically    { $a <=> $b }
    sub alphabetically { lc $a cmp lc $b }

Now we can write:

    @sorted_numbers = sort numerically    @numbers;
    @sorted_names   = sort alphabetically @names;
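Putting the pieces together, here is a self-contained sketch of sorting with named subroutines. The sample data and the by_length_then_name subroutine are hypothetical additions, included only to illustrate a comparison with more than one criterion.

```perl
use strict;
use warnings;

# Named comparison subroutines. $a and $b are set by sort,
# so they are not declared or passed as arguments here.
sub numerically    { $a <=> $b }
sub alphabetically { lc $a cmp lc $b }

# Hypothetical example of a two-step comparison: shorter names
# first, with ties broken alphabetically. When the lengths are
# equal, <=> returns 0 (false) and the 'or' falls through to cmp.
sub by_length_then_name {
    length($a) <=> length($b) or lc $a cmp lc $b
}

# Hypothetical sample data.
my @numbers = (10, 9, 100, 2);
my @names   = ('bob', 'Carol', 'alice', 'Dave');

my @sorted_numbers = sort numerically         @numbers;
my @sorted_names   = sort alphabetically      @names;
my @by_length      = sort by_length_then_name @names;

print "@sorted_numbers\n";   # prints "2 9 10 100"
print "@sorted_names\n";     # prints "alice bob Carol Dave"
print "@by_length\n";        # prints "bob Dave alice Carol"
```

The subroutine name goes directly after sort with no comma before the list, just as a block would.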

NEXT WEEK: Using the Schwartzian Transform when working with expensive comparison algorithms

This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.

If you have any questions please don't hesitate to contact us:

Phone: 03 9354 6001 (Australia)
International: +61 3 9354 6001