Syndicated content has become an expected part of any regular news source. Users don't want the bother of checking sites every day, and mailing lists don't allow the user enough flexibility when dealing with a large amount of news. Regular publishing formats are also notoriously unfriendly to robots, providing barriers to indexing and searching.
XML-based feed formats such as RSS and Atom have become popular ways to present news in a consistent, machine-readable fashion. Feeds can be read by in-browser plugins such as Sage for Firefox (http://sage.mozdev.org/) or by external syndication sites such as my.yahoo.com. Even popular websites such as Facebook and jaiku.com allow blogs and other feeds to be imported via RSS and Atom.
Feeds are great if you're a consumer of content, but if you're a
producer of content they can represent quite a challenge. That's
where XML::Atom::SimpleFeed comes in.
There's nothing like an example to explain how things can be done, and since we've just enabled an Atom view of the Perl Tips mailing list, we can't think of a better example than our tips themselves.
Our tips start their life in Plain Old Documentation format (see perlpod for details), and are then rendered into plain text for e-mail and into HTML for display on the web. Our goal was to turn these static HTML pages into an automatically generated Atom feed.
Our first step is to find all the Perl Tips we've written. We
already have these stored by date on our webserver, and getting
the list is as easy as using Perl's in-built glob function
with a little code to weed out pages such as index.html.
We're not going to show you the exact code, but at the end of this we
have an array of pathnames, relative to our webserver root. Because
glob returns filenames in sorted order, and our filenames are dates,
these are already in chronological order.
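A rough sketch of the idea might look like the following. It assumes the tips live in a tips/ directory under the document root and are named YYYY-MM-DD.html, as in the list shown next; find_tips() and the $docroot argument are hypothetical, for illustration only.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# find_tips() is a hypothetical helper: given the web server's document
# root, return tip paths relative to that root, oldest first.
sub find_tips {
    my ($docroot) = @_;

    # glob sorts its results, so date-named files (YYYY-MM-DD.html)
    # come back in chronological order.
    return
        map  { '/tips/' . basename($_) }                       # relative to web root
        grep { basename($_) =~ /^\d{4}-\d{2}-\d{2}\.html\z/ }   # skip index.html etc.
        glob("$docroot/tips/*.html");
}
```

Calling find_tips('/var/www/perltraining'), with whatever your real document root is, would produce a list like the one below.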
The resulting list looks like this:
```perl
my @tips = (
    # ...
    '/tips/2007-06-18.html',
    '/tips/2007-07-04.html',
    '/tips/2007-07-30.html',
);
```
To create our feed, we need the date of the most recently published tip. We don't want to use the current date and time, as our feed only changes when a new tip is released:
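```perl
# Take the date from the newest tip's filename, then turn it
# into a timestamp at midnight UTC.
my ($updated) = ($tips[-1] =~ m{(\d{4}-\d{2}-\d{2})});
$updated .= "T00:00:00Z";
```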
Note that our $updated string has the form
2007-09-26T00:00:00Z. Atom is strict about date formats: every
timestamp must be an RFC 3339 date-time.
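The same date-to-timestamp conversion is repeated for every entry later on, so it may be worth factoring out. A minimal sketch, assuming every tip path contains a YYYY-MM-DD date; tip_timestamp() is a hypothetical name, not part of XML::Atom::SimpleFeed. Unlike the inline version, it dies instead of producing a malformed timestamp when no date is found.

```perl
use strict;
use warnings;

# tip_timestamp() is a hypothetical helper: pull the YYYY-MM-DD date
# out of a tip's path and return an RFC 3339 timestamp at midnight UTC.
sub tip_timestamp {
    my ($path) = @_;

    # A failed match assigns an empty list, which is false, so we die.
    my ($date) = $path =~ m{(\d{4}-\d{2}-\d{2})}
        or die "No date found in '$path'\n";

    return "${date}T00:00:00Z";
}
```

With this in place, the feed's date becomes my $updated = tip_timestamp($tips[-1]);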
Now, to create our feed object:
```perl
use XML::Atom::SimpleFeed;

my $feed = XML::Atom::SimpleFeed->new(
    title    => "Perl Tips",
    subtitle => "From Perl Training Australia",
    logo     => "http://perltraining.com.au/images/logo.png",
    link     => "http://perltraining.com.au/tips/",
    link     => {
        rel  => 'self',
        href => 'http://perltraining.com.au/tips/index.atom',
    },
    id       => "http://perltraining.com.au/tips/",
    author   => "Perl Training Australia",
    updated  => $updated,
);
```
It's worth making a few notes about some of the attributes we're using at this point.
We're supplying two link attributes. The first (which provides only a
URL) is considered to be an alternate link; in other words, a URL that
provides a different view of the same data.
The second link has a relationship (rel) of self. The Atom
specification (RFC 4287) says that every feed should provide a link
to the location from which the feed itself can be fetched, and the
href of this link does exactly that.
id is a required field providing a unique, permanent identifier for
this feed. It should not change even if the Perl Tips feed moves to a
new location. This allows systems to keep following a feed even when
it is moved between different hosts and locations.
updated is the time the feed was last modified, and must be an RFC 3339 timestamp such as 2007-09-27T12:34:56Z.
We're not being very strict about our date and time, recording it only to the day. For news items published more frequently, you'll want to be more precise.
If this element is omitted, a timestamp with the current date and time is used.
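If you need more precision than a whole day, Perl's core POSIX module can format any epoch time as an RFC 3339 timestamp in UTC. A small sketch; rfc3339_utc() is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# rfc3339_utc() is a hypothetical helper: format an epoch time as an
# RFC 3339 timestamp in UTC (hence the literal trailing "Z").
sub rfc3339_utc {
    my ($epoch) = @_;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}
```

For example, rfc3339_utc((stat $file)[9]) would stamp an item with the modification time of $file, whichever page you happen to be publishing.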
Now that we've made our feed object, let's start populating it with data:
```perl
use constant MAX_ENTRIES => 5;

foreach (1 .. MAX_ENTRIES) {
    last if not @tips;         # Stop if our list is empty
    my $tip = pop(@tips);      # Take our next most recent tip

    # Extract our tip's date from its name, and format
    # it into an RFC 3339 timestamp.
    my ($date) = ($tip =~ m{(\d{4}-\d{2}-\d{2})});
    $date .= "T00:00:00Z";

    # Load our tip as an HTML::Mason component ($m is Mason's
    # request object), and (using its name) generate a URL to
    # the tip on our website.
    my $comp = $m->fetch_comp($tip);
    my $link = "http://perltraining.com.au$tip";

    # Render the tip's content, and if we can find an
    # "END_SUMMARY" comment, then snip everything past
    # that point and replace it with a 'Read more...' link.
    my $summary = $m->scomp($tip);
    $summary =~ s{<!--\s*END_SUMMARY\s*-->.*}
                 {<p><b><a href="$link">Read more...</a></b></p>}s;

    # Add our entry to the feed.
    $feed->add_entry(
        title   => $comp->scall_method("title"),
        link    => $link,
        id      => $link,
        summary => $summary,
        updated => $date,
    );
}
```
We're using HTML::Mason for our website, which is why we can
fetch pages as components, and query them for their title and content.
Rather than publishing full tips via Atom, we publish summaries
of our most recent tips, using a simple HTML comment in each tip
to mark where its summary section ends.
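The summary trick in the loop above doesn't depend on Mason, so it's easy to try in isolation. Here's a sketch of the same substitution, wrapped in a hypothetical truncate_summary() helper.

```perl
use strict;
use warnings;

# truncate_summary() is a hypothetical standalone version of the
# substitution used in the feed loop: everything from an END_SUMMARY
# comment onward is replaced by a "Read more..." link.
sub truncate_summary {
    my ($html, $link) = @_;

    # The /s modifier lets .* match newlines, so the whole tail goes.
    $html =~ s{<!--\s*END_SUMMARY\s*-->.*}
              {<p><b><a href="$link">Read more...</a></b></p>}s;

    return $html;
}
```

If a tip has no END_SUMMARY marker, the substitution doesn't match and the whole tip is published unchanged.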
Most of the fields have the same meaning as they do when creating
a feed, except that they apply to a single entry. Note that you can
use content if you're supplying an entry's full text, and summary
if you're supplying just a summary. You should try to provide at
least one of the two.
Printing our feed is the easy part. For our tips, we just set the content type appropriately and print it:
```perl
$r->content_type("text/xml; charset=us-ascii");
$feed->print;
```
Since we're using HTML::Mason under mod_perl, we set the content
type using the Apache request object ($r). How you set your
content type will depend upon the framework employed.
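For comparison, here's a minimal sketch of the same step in a plain CGI script, with no mod_perl or Mason involved. It assumes $feed is the XML::Atom::SimpleFeed object built earlier and that its print method accepts an optional filehandle; print_feed_as_cgi() is a hypothetical helper name.

```perl
use strict;
use warnings;

# print_feed_as_cgi() is a hypothetical helper. Under plain CGI the
# header is just text: a Content-Type line, a blank line, then the body.
sub print_feed_as_cgi {
    my ($feed, $fh) = @_;
    $fh //= \*STDOUT;    # CGI responses normally go to STDOUT

    print {$fh} "Content-Type: text/xml; charset=us-ascii\r\n\r\n";

    # Assumes the feed object's print method takes an output handle.
    $feed->print($fh);
}
```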
Most feeds don't change all that often, so rather than rebuild them for every request, it's usually a good idea to cache your content.
In HTML::Mason this is as simple as adding the following to the top of our code:
```perl
return if $m->cache_self(expire_in => '1 hour', busy_lock => '30 sec');
```
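This caches the content for an hour. The busy_lock option means that when the cache expires, one request regenerates the feed while other requests continue to receive the old copy for up to 30 seconds. While we don't show it here, our actual code sets the content type before the cache check; otherwise a cached request would return early, before the content type is set, and our data could be served with the wrong content type.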
If you're using a system other than HTML::Mason, you may wish to
consider a module such as Cache::Cache to implement caching
efficiently. This is particularly important if your feed becomes
popular, since feed readers poll regularly and you may end up with
a large volume of requests.
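The basic idea behind such caching can be sketched with nothing more than a timestamped variable. This is a swapped-in, deliberately simple technique rather than Cache::Cache: the cache lives in a closure, so it only helps within a single long-running process (such as one mod_perl child), whereas Cache::Cache can share cached data between processes. make_cached() is a hypothetical name.

```perl
use strict;
use warnings;

# make_cached() is a hypothetical helper, not part of Cache::Cache.
# It wraps a feed-building sub so its output is reused for $ttl seconds.
# The cached copy lives in a closure, so it is per-process only.
sub make_cached {
    my ($ttl, $build) = @_;
    my ($content, $built_at);

    return sub {
        my $now = time;
        if (!defined $content || $now - $built_at >= $ttl) {
            $content  = $build->();    # regenerate the feed
            $built_at = $now;
        }
        return $content;
    };
}
```

For example, make_cached(3600, sub { $feed->as_string }) would rebuild the feed text at most once an hour per process.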
See also: XML::Atom::SimpleFeed, HTML::Mason
| Location | Course | Course Date | Duration | Early Bird Date |
|---|---|---|---|---|
| Melbourne | Programming Perl | Tue 2 Sep 2008 | 4 days | Mon 4 Aug 2008 |
| Sydney | Programming Perl | Tue 7 Oct 2008 | 4 days | Mon 8 Sep 2008 |
| Canberra | Programming Perl | Mon 24 Nov 2008 | 4 days | Mon 27 Oct 2008 |
For future dates, please see our training calendar.
This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.
If you have any questions please don't hesitate to contact us:
| Email: | contact@perltraining.com.au |
| Phone: | 03 9354 6001 (Australia) |
| International: | +61 3 9354 6001 |
Copyright 2001-2008 Perl Training Australia. Contact us at contact@perltraining.com.au