Subroutine and regular expression references


Last week we discussed variable references. These allow us to keep the identity of arrays and hashes that we pass to subroutines. References are also used to build complex data structures, which are important in many problem domains.

This week we'll look at two other useful kinds of references: subroutines and regular expressions.

References to subroutines

As well as taking references to variables, we can take references to subroutines. This is useful as it allows us to build dispatch tables, pass around processing functions and much more.

To obtain a reference to a subroutine, we use a backslash, the same operator that we use to obtain a reference to a variable. However, to tell Perl that we want a reference to the subroutine itself, we need to prefix the subroutine name with an ampersand:

        my $sub_ref = \&my_subroutine;

Leaving off the ampersand, or supplying parentheses after the subroutine name, will call the subroutine and give you a reference to its return value instead. This is an easy mistake to make when taking a subroutine reference.
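A quick way to see the difference is to compare the two forms with the ref function (described later in this tip). This is a minimal sketch; get_greeting is a hypothetical subroutine used only for illustration.

```perl
use strict;
use warnings;

# A hypothetical subroutine, used only for this illustration.
sub get_greeting { return "Hello World!" }

# Ampersand, no parentheses: a reference to the subroutine itself.
my $code_ref = \&get_greeting;

# Parentheses: the subroutine is called, and we get a reference
# to the string it returned.
my $value_ref = \get_greeting();

print ref($code_ref), "\n";     # prints CODE
print ref($value_ref), "\n";    # prints SCALAR
print $$value_ref, "\n";        # prints Hello World!
```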

It's also possible to take a reference to an anonymous subroutine by using Perl's sub keyword without providing a subroutine name:

        my $sub_ref = sub { print "Hello World!\n"; };

To invoke a subroutine to which we hold a reference, we simply use Perl's arrow notation:

        $sub_ref->();
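The sketch below shows a few ways of calling through a code reference, with and without arguments. The subroutine add and the greeting subroutine are illustrative only.

```perl
use strict;
use warnings;

# A named subroutine and a reference to it.
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}
my $add_ref = \&add;

# sub { ... } without a name already returns a code reference.
my $greet_ref = sub {
    my ($name) = @_;
    return "Hello $name!\n";
};

# Arrow notation calls the subroutine; arguments go in the parentheses.
my $sum = $add_ref->(2, 3);
print "$sum\n";                   # prints 5
print $greet_ref->("World");      # prints Hello World!

# The older &$ref(...) form does the same thing.
my $same = &$add_ref(2, 3);
print "$same\n";                  # prints 5
```

The arrow form is usually the easiest to read, especially when the reference is fetched out of a hash or array, as happens with a dispatch table.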

Dispatch tables

When we deal with input from a user, we often end up writing code to handle a number of different cases. Such code frequently looks a lot like this:

        if    ( $action eq "this" )           { this(); }
        elsif ( $action eq "that" )           { that(); }
        elsif ( $action eq "something else" ) { something_else(); }
        else                                  { die "Unknown action: $action\n"; }

While this is certainly one way to do it, it can seem unwieldy and space-consuming. With subroutine references, however, we can create a very elegant and fast alternative:

        # Dispatch table (hash of subroutine references)
        my %dispatch = (
                this => \&this,
                that => \&that,
                "something else" => \&something_else,
        );

        # Check that the action exists in our table
        if ( exists $dispatch{$action} ) {
                $dispatch{$action}->();
        } else {
                die "Unknown action: $action\n";
        }

This allows us to add new cases, or change existing ones, easily and in a single place, while simplifying our code. If your subroutines are designed to accept the same parameter list, you can pass in parameters when you invoke the subroutine:

        $dispatch{$action}->($dbh, $cgi, $status);
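Putting the pieces together, here is a minimal self-contained dispatch table. The action names, the handler subroutines and the single $count parameter are all hypothetical stand-ins for the $dbh, $cgi and $status arguments above; the point is that every handler accepts the same parameter list.

```perl
use strict;
use warnings;

# Hypothetical handlers; each accepts the same parameter list ($count).
sub start_job { my ($count) = @_; return "starting $count job(s)" }
sub stop_job  { my ($count) = @_; return "stopping $count job(s)" }

# Dispatch table: action name => subroutine reference.
my %dispatch = (
    start  => \&start_job,
    stop   => \&stop_job,
    # An anonymous subroutine works just as well as a named one.
    status => sub { my ($count) = @_; return "$count job(s) known" },
);

# Look up the action and call its handler, or report an unknown action.
sub run_action {
    my ($action, @args) = @_;
    return "Unknown action: $action" unless exists $dispatch{$action};
    return $dispatch{$action}->(@args);
}

print run_action('start', 2), "\n";    # prints starting 2 job(s)
print run_action('reboot', 1), "\n";   # prints Unknown action: reboot
```

Adding a new action is now a one-line change to %dispatch; run_action itself never needs editing.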

Passing around processing functions

A good example of passing a subroutine reference to another subroutine is when using the File::Find module. File::Find's find function takes a subroutine reference and a list of directories to search. The subroutine referred to is then used to process each file and directory. For example, to print the paths and file names of all the Perl scripts (assuming they end in .pl) in a directory tree we might write:

        use File::Find;

        find(\&wanted, '.');

        sub wanted {
                print "$File::Find::name\n" if /\.pl$/;
        }

File::Find's find takes a subroutine reference so that the programmer can specify what to do with each file and directory. Because the subroutine reference is just an argument to find, you can call find from different places in your code, passing a different subroutine reference each time.
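As a sketch of that flexibility, the two hypothetical helpers below call find with different anonymous subroutines. Because each anonymous subroutine is a closure, it can collect its results into a lexical variable of the enclosing subroutine rather than printing them directly.

```perl
use strict;
use warnings;
use File::Find;

# Collect the paths of all .pl files under $dir.
# find() sets $_ to the current file's base name and
# $File::Find::name to its full path.
sub find_perl_scripts {
    my ($dir) = @_;
    my @found;
    find(sub { push @found, $File::Find::name if /\.pl$/ }, $dir);
    return sort @found;
}

# A different subroutine reference gives find() a different job:
# here, totalling the size in bytes of every plain file.
sub total_size {
    my ($dir) = @_;
    my $bytes = 0;
    find(sub { $bytes += (-s $_ || 0) if -f $_ }, $dir);
    return $bytes;
}

my @scripts = find_perl_scripts('.');
print "$_\n" for @scripts;
print "Total bytes: ", total_size('.'), "\n";
```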

Ref and subroutine references

To check whether something is a reference to a subroutine, we can use the ref function:

        my $sub_ref = sub { print "I'm a subroutine!\n" };
        print ref $sub_ref;             # prints CODE
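A common use of this check is to make sure a value really is a subroutine reference before calling it. The helper call_if_code below is a hypothetical name used for illustration.

```perl
use strict;
use warnings;

# Call $maybe_code if it is a code reference; otherwise return undef.
# ref() returns 'CODE' for subroutine references and the empty string
# for plain (non-reference) scalars.
sub call_if_code {
    my ($maybe_code, @args) = @_;
    return ref($maybe_code) eq 'CODE' ? $maybe_code->(@args) : undef;
}

my $double = sub { return 2 * $_[0] };
print call_if_code($double, 21), "\n";    # prints 42
print defined call_if_code("not code", 21) ? "called\n" : "skipped\n";   # prints skipped
```

Note that an object built on a code reference reports its class name from ref rather than CODE; the reftype function from Scalar::Util returns CODE in that case.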

Data::Dumper and subroutine references

If you are using Data::Dumper version 2.121 (or later), you can see the source code of a subroutine reference by setting the $Data::Dumper::Deparse variable to a true value.

        use Data::Dumper;
        $Data::Dumper::Deparse = 1;

        # Here's a reference to a subroutine...
        my $subref = sub {
                my ($x, $y) = @_;

                # Return the larger of two arguments.
                if ($x > $y) {
                        return $x;
                }
                return $y;
        };

        # This prints the subroutine's code (comments are not preserved).
        print Dumper $subref;

        # output:
        $VAR1 = sub {
            my($x, $y) = @_;
            if ($x > $y) {
                return $x;
            }
            return $y;
        };

This will only work if your subroutines can be properly reconstructed using B::Deparse.
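Data::Dumper relies on B::Deparse for this, and you can also use that core module directly. This sketch uses its coderef2text method, which returns the subroutine body as Perl source text; the exact layout of that text varies between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

my $subref = sub {
    my ($x, $y) = @_;
    # Return the larger of two arguments.
    return $x > $y ? $x : $y;
};

# coderef2text returns the subroutine's body as Perl source text.
# Comments are not preserved, and the layout may differ from the original.
my $deparse = B::Deparse->new;
my $source  = $deparse->coderef2text($subref);
print "sub $source\n";
```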

Regular expression references

Once you get used to the idea of passing around subroutine references for special processing needs, you might start thinking about whether you can do the same with regular expressions. Well, you can, and it's also easy.

To create a regular expression for use later, we use qr//:

        my $regexp = qr/^Perl$/;        # a line containing only Perl

This compiles the regular expression for use later. If there's a problem with our regular expression, we'll hear about it immediately. To use this pre-compiled regular expression we can use any of the following:

        # See if we have a match
        $string =~ $regexp;

        # A simple substitution
        $string =~ s/$regexp/Camel/;

        # Comparing against $_
        /$regexp/;
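Here are the three forms as a small runnable script. The sample strings are made up for illustration; note that $ also matches just before a trailing newline, so "Perl\n" counts as a match.

```perl
use strict;
use warnings;

my $regexp = qr/^Perl$/;     # a line containing only Perl
my $string = "Perl";

# See if we have a match (binding the string to the compiled pattern)
if ($string =~ $regexp) {
    print "Matched\n";
}

# A simple substitution, applied to a copy of $string
(my $copy = $string) =~ s/$regexp/Camel/;
print "$copy\n";             # prints Camel

# Matching $_ implicitly, here inside a grep
my @lines   = ("Perl", "Python", "Perl\n", "perl");
my @matches = grep { /$regexp/ } @lines;
print scalar(@matches), "\n";   # prints 2
```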

Regular expression references can save you time and effort and reduce the number of places where bugs can slip into your programs. For example, you might use the same regular expression in several places to validate a filename:

        my $file = qr/\w+\.\w+/;

If you check filename validity in more than one script or in more than one place in a script, then should your definition of a filename change, you'll have to update the regular expression in multiple places. However, by using a regular expression reference, you can use the expression many times, but only need to update it in a single location. By giving your references appropriate names, you can also significantly improve the readability of your regular expressions:

        # Check our input refers to a valid filename:
        /^Filename: "($file)"\s*$/;
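A complete version of this filename check might look like the following sketch. The variable names $filename_line and @valid and the sample inputs are hypothetical; the simplified $file definition is the one from the text.

```perl
use strict;
use warnings;

# The shared definition of a filename (simplified, as in the text).
my $file = qr/\w+\.\w+/;

# A larger pattern can interpolate the qr// object as a named building block.
my $filename_line = qr/^Filename: "($file)"\s*$/;

my @valid;
for my $input ('Filename: "report.txt"', 'Filename: "no extension"') {
    if ($input =~ $filename_line) {
        push @valid, $1;
        print "Valid filename: $1\n";
    } else {
        print "Invalid: $input\n";
    }
}
```

If the definition of a filename changes, only the qr// assigned to $file needs updating.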

An alternative to using qr// is to create your regular expression as a string and pass that around. However, Perl may then need to recompile the pattern each time it is used, rather than compiling it once. You also miss the advantage of compile-time errors: any syntax errors in your expression will only be reported when a match is first attempted.
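If a pattern has to start life as a string (read from a configuration file, for example), one option is to turn it into a qr// object once, inside an eval, so that syntax errors are reported straight away and the compiled pattern can then be reused. compile_pattern is a hypothetical helper name.

```perl
use strict;
use warnings;

# Compile a pattern held in a string once, reporting syntax errors early.
sub compile_pattern {
    my ($pattern_string) = @_;
    my $re = eval { qr/$pattern_string/ };
    die "Bad pattern '$pattern_string': $@" unless defined $re;
    return $re;
}

my $file = compile_pattern('\w+\.\w+');
print "ok\n" if "notes.txt" =~ $file;

# A broken pattern is reported here, not at the first match attempt.
my $bad   = eval { compile_pattern('(unclosed') };
my $error = $@;
print "Caught: $error" unless $bad;
```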

Ref and regular expression references

To check whether something is a reference to a regular expression, we can use the ref function:

        my $regexp = qr/^Perl$/;

        print ref $regexp;      # prints 'Regexp'

Something very special happens when you use a regular expression reference as a string: it turns back into a human-readable representation of the original regular expression.

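        print "$regexp\n";      # prints '(?-xism:^Perl$)'; Perl 5.14+ prints '(?^:^Perl$)'

The (?-xism:...) notation indicates which regular expression switches are enabled for this fragment: switches listed after the minus sign are turned off, so in this example all four switches ('x', 'i', 's', and 'm') are disabled. Perl 5.14 and later use the shorter (?^:...) form instead, where the caret stands for the default settings, so again none of these switches is enabled.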
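Because the switches travel with the stringified form, a fragment keeps its own settings when it is interpolated into a larger pattern. This sketch uses only core Perl; the exact text printed for $word depends on your Perl version.

```perl
use strict;
use warnings;

my $regexp = qr/^Perl$/;
print ref($regexp), "\n";     # prints Regexp

# A case-insensitive fragment stays case-insensitive when it is
# interpolated into a larger, case-sensitive pattern.
my $word = qr/perl/i;
print "$word\n";              # (?i-xsm:perl) on older Perls, (?^i:perl) on 5.14+

my $sentence = qr/^I like $word!$/;
print "match\n"    if "I like PERL!" =~ $sentence;       # prints match
print "no match\n" unless "i like PERL!" =~ $sentence;   # prints no match
```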

References and objects

Perl allows you to take references to almost anything you can access in Perl. Furthermore, anything you can get a reference to can be 'blessed' and turned into an object. Should you ever need a class based on a regular expression, Perl is very happy to oblige.
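As a minimal sketch, the hypothetical Validator class below is built directly on a qr// object. qr// returns an object already blessed into the Regexp class, and bless simply moves it into our own class; the object can still be used as a pattern.

```perl
use strict;
use warnings;

# A hypothetical class whose objects are qr// regular expressions.
package Validator;

sub new {
    my ($class, $pattern) = @_;
    # qr// objects start out blessed into Regexp; re-bless into our class.
    return bless qr/$pattern/, $class;
}

# The object can still be used directly as a pattern.
sub check {
    my ($self, $string) = @_;
    return $string =~ $self ? 1 : 0;
}

package main;

my $filename = Validator->new('^\w+\.\w+$');
print ref($filename), "\n";                 # prints Validator
print $filename->check("notes.txt"), "\n";  # prints 1
print $filename->check("notes"), "\n";      # prints 0
```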

In summary

Perl's variable references allow us to pass multiple lists into subroutines and to create complex data structures and objects. With subroutine references we can pass processing subroutines to other code and build elegant and powerful dispatch tables. Regular expression references let us create a regular expression in one place and use it in many, improving speed, readability, and maintainability.

Furthermore, all of these can be used as a basis for Perl objects, although the hash reference remains the most common choice.


This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.

If you have any questions please don't hesitate to contact us:

Phone: 03 9354 6001 (Australia)
International: +61 3 9354 6001
