Perl Training Australia - Whirlwind web development

Paul Fenwick <pjf@perltraining.com.au>, March 2005

Slides for this paper can be found at http://perltraining.com.au/talks/whirlwind/

Introduction

Websites have rapidly become one of the defining features of the Internet, and many lay-persons use the terms 'web' and 'Internet' interchangeably. Websites have become the first point of contact between many businesses and organisations and the public. It should come as no surprise that most technical professionals will become involved in website development at some stage in their career.

One thing which is clear in many people's minds is that not all websites are created equal. Sites with good layout, easy navigation, and adherence to standards are significantly more usable than sites with poor layout, confusing navigation, and a lack of standards support. However, while one can rapidly gain a feel for whether a website is good or bad, designing an effective website is a much more challenging task.

This paper is not about effective website design in the traditional sense. There'll be no lecture about the joys of style-sheets, the strong foundation of XHTML, or how best to lay out your navigational controls. Instead, this paper will focus on maintaining your website without all the hard work. It is about being lazy: doing the least amount of work overall to achieve your desired result and keep it that way.

Assumed knowledge

This paper assumes that the reader is familiar with HTML. A knowledge of the Perl programming language is beneficial, but not essential.

A common design problem

One of the hallmarks of good website design is consistency. A good website has consistent look-and-feel, consistent navigation, and consistent layout. A visitor can quickly find what they're after because there are no surprises. However, developing and maintaining this consistency is a matter that requires some thought.

Most websites contain invariant sections (things that are always the same) near the top and bottom of each page. The top contains the site's meta-data, including machine-readable style-sheets, search directives, related page information, copyrights, and author information. In the main body of the page, but still near the top in most cases, are directives for page layout, navigation, logos, and titles.

The bottom of a page finishes anything opened at the top. The bottom is also a traditional place to add disclaimers, copyright notices, conditions of use, and other points mainly of interest to the legal profession.

On most of our pages we want the invariant sections to be, well, invariant! However, problems arise when we wish to change these sections. What happens when we have a new page to add to the navigation, or a new disclaimer from our legal department? How does one go about making these changes on a site-wide basis?

One option would be to regenerate every page. This may involve the tedious task of editing each page by hand, or our website design tools may provide us with a way to regenerate the whole site with new content. In either case, every single page in the site needs to be replaced.

A more popular option is to use server-side includes or similar technologies when appropriate support exists in the server. Here's an example of how a page may look using Apache-style includes:

    <!--#include virtual="/top.html" -->
    <title>Ascidian Central</title>
    <!--#include virtual="/middle.html" -->
    Your one-stop shop for all your ascidian needs.
    <!--#include virtual="/footer.html" -->

We've had to use three includes in this example, because we want one variable section (the title) in what is otherwise the invariant header.

The use of includes is a step in the right direction, but still has issues associated with it. We still need to ensure that these lines appear on every page, and also in the correct locations and order. Should we want to change which includes we use -- either because of a site redesign, or because certain sections of the site require extra components -- we still need to edit each and every one of our pages to include the new material.

HTML::Mason

Our ideal situation is to simply write content, and have layout and navigation handled for us automatically. This allows us to spend time getting our real job done, rather than worrying about important but secondary issues. In order to achieve these goals, we'll introduce a technology called HTML::Mason, or simply Mason.

Mason is a web development environment, written in and supported by the Perl programming language. While a developer does not need to know Perl to benefit from Mason's features, a little Perl knowledge can go a long way in the efficient development of your website.

Mason's preferred environment is running under mod_perl inside the Apache web-server, and when in this environment it makes use of numerous optimisations to enhance performance. Mason can also be configured to run under other environments (including IIS), operate as a standard Common Gateway Interface (CGI) program, or work in a non-web environment entirely.

Autohandlers

Returning to our original problem, how can Mason help us with our layout and navigation problems? How can we just write content and have everything else ``just work''?

Mason has the concept of an autohandler, which is executed automatically for every request. Autohandlers can be used for a wide range of tasks, but their most common use is to 'wrap' content with a standard layout, navigation, and other invariant page features.

A typical autohandler, which we conventionally store in a file named autohandler, looks like this:

    <html>
    <head>
    <title>Ascidian Central</title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    </head>
    <body>
    <h1>Ascidian Central</h1>
    % $m->call_next;
    </body>
    </html>

As can readily be seen, the contents of the autohandler are mostly plain HTML. However, there is one line in particular that is different:

    % $m->call_next;

In Mason, any line starting with a percent sign is interpreted as Perl (the percent sign must be the first character on the line; the examples here are indented only for display). In this particular case, $m is a special object (the Mason request object), and we have requested it to call the next component in our hierarchy. In our particular case, that will be our content. Let's assume that our request was for ascidian.html, which contains the following text:

    <p>
    Ascidians are marine filter-feeders.  They have a
    tough outer 'tunic' made from polysaccharides, and
    are immobile in their adult form.  Examples include
    <i>Sea Tulips</i> and <i>Sea Squirts</i>, both of
    which can be found in Melbourne's waters.
    </p><p>
    Ascidians are unique in that they collect and store the
    mineral vanadium in their blood.  It was suggested this
    was used to carry oxygen, however modern evidence does not
    support this theory.
    </p>

When Mason processes our request, the autohandler formats the page and supplies all the invariant sections, and then inserts our content into the appropriate place. Perfect! We're now free just to write our content, and any changes to our layout and navigation only need to be done in the autohandler.

Methods

Our autohandler allows us to easily change the HTML above and below our content, for our whole site, in one simple place. However, we still have a problem. Our title and headings on each page are identical and static, regardless of the content being shown. This is a bad thing: every page on the site will bookmark as 'Ascidian Central', rather than a more useful title, such as 'Ascidian Biology'.

Clearly we need our autohandler to be a little smarter. Rather than blindly pasting the same title and headings onto every page, we need it to dynamically determine which title should be used. In order to do this, we need to introduce a new concept called a method.

Methods are associated with Mason components, such as the content we have above, and can contain code or text to perform certain operations or supply required information. The most common use of methods is to provide information such as titles, sections and access restrictions.

In our example site we're going to set a method on each block of content called title. Our autohandler can then use this whenever it requires a title to be displayed. Let's first see our content with the new title method.

    <%method title>
    Ascidian Biology
    </%method>
    <p>
    Ascidians are marine filter-feeders.  They have a
    tough outer 'tunic' made from polysaccharides, and
    are immobile in their adult form.  Examples include
    Sea Tulips and Sea Squirts, both of which can be found
    in Melbourne's waters.
    </p><p>
    Ascidians are unique in that they collect and store the
    mineral vanadium in their blood.  It was suggested this
    was used to carry oxygen, however modern evidence does not
    support this theory.
    </p>

As you can see, writing a method is very straightforward. Methods can appear anywhere inside a Mason component, although by convention they only appear at the start or end. We choose to put our title at the top of our file since most people are used to seeing titles at the top of their text.

Now that we have our method, we need to tell our autohandler to use it. This is a straightforward process:

    <html>
    <head>
    <title><& REQUEST:title &></title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    </head>
    <body>
    <h1><& REQUEST:title &></h1>
    % $m->call_next;
    </body>
    </html>

We've introduced a new syntax here, <& ... &> tags. These are a request to call a component or method, and to replace the tags with its output. In our particular example, we will insert the title of our content into our HTML header, and also between h1 tags.

Default Methods

Now our site is coming along well. Content is placed into files with a .html extension, and is automatically wrapped by the autohandler. Methods built into our content communicate our page title. However, what happens if we forget to include a title method in one of our pages?

In our current site, if Mason can't find the title it will display an error. That may be a good thing for our development system, as we can immediately see when important methods are missing, but it may not be what we want for production. Luckily, Mason makes it easy to set defaults as well.

Whenever Mason fails to find a method on a component, it will search that component's autohandler, and its autohandler's autohandler, and so on, until it either finds the method or fails. This means that we can simply set a method on our autohandler that will get called if the request component is lacking:

    <html>
    <head>
    <title><& REQUEST:title &></title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    </head>
    <body>
    <h1><& REQUEST:title &></h1>
    % $m->call_next;
    </body>
    </html>
    <%method title>
    Ascidian Central
    </%method>

Now any page that's missing a title will receive one of 'Ascidian Central'. Alternatively we could write a method that derives the title from the URL requested, or generates a title in some other way.
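
As a rough sketch of the URL-derived alternative, the default title method in the autohandler could build a title from the file name of the requested component. This assumes that pages are named after their subject (ascidian_biology.html is a hypothetical file name used for illustration); $m->request_comp->name returns the short name of the requested component.

```mason
<%method title>
<%perl>
# Sketch only: assumes file names such as ascidian_biology.html
my $title = $m->request_comp->name;   # e.g. "ascidian_biology.html"
$title =~ s/\.html$//;                # drop the extension
$title =~ tr/_-/  /;                  # underscores and hyphens become spaces
$title = join ' ', map { ucfirst($_) } split ' ', $title;
</%perl>
<% $title |h %>
</%method>
```

Under these assumptions, a page stored as ascidian_biology.html with no title method of its own would be titled 'Ascidian Biology'. Pages that do define a title method are unaffected, since their own method is found first.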

Other components

Using our new <& ... &> syntax we now have a powerful way to centrally manage content and features on our website. Let's pretend that we want to display a list of our current specials on certain pages in our website. We can create a file called specials.mhtml. The mhtml suffix is a convention used to indicate this is a Mason component that returns HTML, rather than a component which a user can request directly.

    <div class="special">
    <b>Special membership offer!</b> &mdash; Purchase a two-year
    subscription and get a bonus six months, free!
    </div>

If we wish to include the content from this component we can do so simply by using <& specials.mhtml &> in our HTML. As an example, we could include this in our autohandler before the main title:

    <html>
    <head>
    <title><& REQUEST:title &></title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    </head>
    <body>
    % # Display our daily special 
    <& specials.mhtml &>
    <h1><& REQUEST:title &></h1>
    % $m->call_next;
    </body>
    </html>

Mason components can be quite advanced. They can contain Perl code, take arguments, and perform a variety of functions. A component could be used to display the current information about a user, a list of popular areas or searches, display the weather, or generate other dynamic content.
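
To illustrate a component that takes arguments, here is a sketch of a hypothetical greeting.mhtml (the file name and the name argument are invented for this example). Arguments are declared in an <%args> block, optionally with a default value, and the |h flag escapes the value for safe inclusion in HTML.

```mason
<%args>
$name => "visitor"
</%args>
<p>Welcome to Ascidian Central, <% $name |h %>!</p>
```

Such a component could then be called with <& greeting.mhtml, name => "Alice" &>, or with plain <& greeting.mhtml &> to fall back on the default value.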

Attributes

Our hypothetical website is now looking quite good, but it's far from complete. Let's suppose that we wish to have 'members only' areas, that are only accessible to registered and logged-in users. For these to work correctly, we need some way to indicate that a page is for our membership only.

Mason provides a concept of attributes, which are arbitrary labels that can be placed on a page or component. What these labels are used for is entirely up to the designer. They could indicate which section of the site a page belongs to, what access restrictions should apply, the last time the content was reviewed, or any number of other things.

In our scenario, we're going to define a simple attribute, called members_only. This will be set to a true (non-zero, non-empty) value if we want to restrict access to our page, or a false or absent value if our page is open for all to see.

Let's pretend we have a page of 'member specials' that should be restricted. Here's the contents of that page:

    <%method title>
    Member Specials
    </%method>
    <%attr>
    members_only => 1
    </%attr>
    <p>
    We have great specials for our members this week.  Purchase
    an ascidian starter pack and receive a free 200g pack of
    vanadium-laced sea-monkeys, free!
    </p>

As can be seen, we set attributes inside a <%attr> block. We can set multiple attributes if desired, although each must appear on a separate line. Like methods, the <%attr> block can appear anywhere in our component, but conventionally goes at either the top or bottom. Attributes can be set to any Perl scalar value, including numbers, strings, and references to other data structures.

    <%attr>
    members_only => 1
    author => "Alice J Webmistress"
    keywords => ["specials", "members-only", "sea-monkeys"]
    </%attr>
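
As one possible use of such attributes, an autohandler could turn the keywords list into a meta tag inside its head section. This is only a sketch: it assumes that keywords, when present, is always an array reference as in the example above.

```mason
% # Sketch: emit a keywords meta tag only for pages that define the attribute.
% my $keywords = $m->request_comp->attr_if_exists("keywords");
% if ( $keywords ) {
<meta name="keywords" content="<% join(", ", @$keywords) |h %>" />
% }
```

Pages without a keywords attribute simply produce no meta tag, since attr_if_exists returns an undefined value rather than raising an error.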

Now that we have attributes defined alongside our content, how can we use this to restrict access to our pages automatically? The key is to change our autohandler to check for the new attributes.

    <html>
    <head>
    <title><& REQUEST:title &></title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    </head>
    <body>
    % # Only allow authenticated users to view members-only content.
    % # The code for authenticated_user() will depend upon how
    % # authentication is performed on this system.
    % if ( $members_only and not authenticated_user() ) {
        <& login_restricted.mhtml &>
    % } else {
        <h1><& REQUEST:title &></h1>
    %   $m->call_next;
    % }
    </body>
    </html>
    <%init>
    # This init block is executed before any other code.  We're using
    # it to set the $members_only variable, which will have a true
    # value if a members-only part of the site has been requested.
    my $members_only = $m->request_comp->attr_if_exists("members_only");
    </%init>
    <%method title>
    Ascidian Central
    </%method>

In our example we've used some Perl code to check whether the request was for an unrestricted page (where the members_only attribute does not exist or is false) or whether our user is authenticated. If either is true, we supply the page requested.

If our authentication conditions are not met we will instead call <& login_restricted.mhtml &>. This component can display a message indicating that the user has attempted to access a restricted page, and needs to log in first.

The writing of an authentication system for our website is beyond the scope of this discussion, and requires a greater knowledge of Perl than is assumed in this paper.

With these simple changes to our autohandler, we can restrict access to members-only sections of the site by adding a simple attribute to the needed pages.

It is also possible for us to restrict access to entire directories. If an attribute does not exist on a given component, it will inherit that attribute from its autohandler, in much the same way as methods are inherited. We can create a sub-directory in our site with the following autohandler:

    % $m->call_next();
    <%attr>
    members_only => 1
    </%attr>

This simply provides a default attribute for all other components in the directory, but makes no other changes. This autohandler will inherit from the one above it (in the top-level directory) which provides the authentication check. Hence it is simple for us to apply access restrictions and other attributes on a directory-wide, or even site-wide basis.

Filters

Our website now has a standard look-and-feel, consistent navigation, members-only areas, daily specials, and even some meaningful content. However, our quest for an easy-to-maintain website does not stop here. HTML::Mason gives us another useful feature: filters.

Mason's filters can be used to alter the output of a component in arbitrary ways. Filters can be used to convert text into HTML on the fly, perform mark-up of code, censor profanity, or convert old HTML 4 code into new XHTML.
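
The mechanics are simple: a <%filter> block receives the complete output of its component in $_, and whatever is left in $_ afterwards is what gets sent on. A minimal sketch of the profanity case, using a made-up word list:

```mason
<%filter>
# $_ holds everything this component produced.
# The word list here is purely illustrative.
s/\b(?:drat|blast)\b/[censored]/gi;
</%filter>
```

Placed in an autohandler, a filter like this would apply to every page wrapped by that autohandler; placed in a single page, it affects only that page.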

One of the commonly seen tasks for a dynamic website is pre-filling forms. We may wish a membership or address form to already begin with a member's details. Alternatively, a registration form may need to retain the contents of the previous submission until all sections are complete. While this is an important job in good website design, it is usually a dull and tiresome task.

Let's consider the following snippet from a survey that may appear on our site:

    <table>
    <tr>
    <th>Favourite Ascidiacea:</th>
    <td>
        <input type="radio" name="favourite" value="tulip" />
                Sea Tulip<br/>
        <input type="radio" name="favourite" value="solitary" />
                Solitary Ascidian<br/>
        <input type="radio" name="favourite" value="squirt" />
                Sea Squirt
    </td>
    </tr>

One method to generate this code would be using Perl's CGI module, which will automatically fill in values from a submitted form:

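    <%perl>
    use CGI;
    # A <%perl> block discards return values, so the generated HTML
    # must be printed explicitly with $m->print.
    $m->print(
        CGI->radio_group(-name      => 'favourite',
                         -values    => [qw/tulip solitary squirt/],
                         -linebreak => 'true',
                         -labels    => {
                                         tulip    => "Sea Tulip",
                                         solitary => "Solitary Ascidian",
                                         squirt   => "Sea Squirt",
                                       },
        )
    );
    </%perl>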

Alternatively, we could manually check which entries should be selected, like this:

    <tr>
    <td>Favourite Ascidiacea:</td>
    <td>
        <input type="radio" name="favourite" value="tulip" 
    %   if( $favourite eq "tulip" ) {
                checked
    %   }
        />
                Sea Tulip
        <input type="radio" name="favourite" value="solitary"
    %   if( $favourite eq "solitary" ) {
                checked
    %   }
        />
                Solitary Ascidian
        <input type="radio" name="favourite" value="squirt" 
    %   if( $favourite eq "squirt" ) {
                checked
    %   }
        />
                Sea Squirt
    </td>
    </tr>

At this point we really should be questioning the wisdom of adding all this code to our HTML. We would like our website to be as easy for the web-designers as possible. Having all this code embedded inside, or even replacing, the otherwise simple HTML makes it difficult for anyone without Mason knowledge to edit the page.

Fortunately, we can use Mason's filtering mechanism to do all the hard work of filling in the form values for us. By using Perl's HTML::FillInForm module (available from the Comprehensive Perl Archive Network http://www.cpan.org/), we can write the following at the bottom of our component:

    <%filter>
    use HTML::FillInForm;
    $_ = HTML::FillInForm->new->fill(scalarref => \$_, fdat => \%ARGS);
    </%filter>

HTML::FillInForm automatically inserts data into HTML input fields (including radio buttons and checkboxes), textarea elements, and select elements. Data can either be retrieved from a previous page submission (as we have shown here), or from a variety of Perl objects and constructs, such as information from a database table.
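
For the pre-filling case, the same filter can be handed any hash of field names and values via fdat. The sketch below uses a hard-coded %member hash as a stand-in for a database lookup; the hash and its field names are assumptions made for illustration only.

```mason
<%filter>
use HTML::FillInForm;

# Hypothetical member record; a real site would fetch this from its database.
my %member = (
    name      => "Alice J Webmistress",
    favourite => "squirt",
);

# Values the user has just submitted take precedence over stored ones.
my %values = ( %member, %ARGS );
$_ = HTML::FillInForm->new->fill(scalarref => \$_, fdat => \%values);
</%filter>
```

Merging the two hashes in this order means a returning member sees their stored details on first visit, while a partially completed submission keeps whatever was just typed.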

Using this simple filter keeps our HTML clean for the web-designers and significantly reduces the time required for maintenance of our code.

Conclusion

We've seen an overview of how we can use a free, open-source, and cross-platform environment called HTML::Mason to improve the design and maintenance of our websites. However, this paper only touches upon the basic points of Mason's capabilities. Mason includes a flexible caching strategy, easy support for sessions, and a rich debugging environment.

One of the few downsides to working with Mason is that it's not as readily available from 'off-the-shelf' hosting businesses as other web-development technologies, such as PHP or ASP. However, as the demand for Mason and other richly featured development environments grows, so does the number of businesses that are willing to host and support Mason-enabled websites.

In addition to the Mason environment being free and open, there is also a large body of freely available and well-indexed documentation, available on-line for developers, designers, and administrators alike.

To use Mason's full potential requires a working knowledge of the Perl programming language; this coupling with Perl is one of Mason's greatest benefits. Existing Perl modules, of which more than 7,600 are available on the Comprehensive Perl Archive Network (CPAN), can be easily used in a Mason environment. Existing Perl programs can be modified to operate with Mason, or Perl can be used to 'glue' Mason to other languages such as C or Java.

I hope that the methods examined in this paper will make your next web development experience a smooth and low-maintenance one.

Further Resources

Slides for this paper
This paper was originally presented at the SAGE-VIC 2005 symposium. A revised copy of the slides used can be found at http://perltraining.com.au/talks/whirlwind/
Mason HQ
http://www.masonhq.com/
Embedding Perl in HTML with Mason
Dave Rolsky and Ken Williams, O'Reilly and Associates, 2002. ISBN: 0-596-00225-4
The Mason Book
http://www.masonbook.com/
Web Development with Perl, Training Course
http://perltraining.com.au/webdev.html (commercial training)
