The de facto choice for Perl objects has been to use blessed hashes. A hash makes it easy to both add new attributes and access existing ones. Unfortunately, this ease of access is also one of the greatest problems with a hash-based structure. In this tip we'll cover an alternative Perl object structure known as an inside-out object.
In a perfect world everyone would obey the rules and only use the documented
interface for each class. Unfortunately, the world isn't always perfect.
Maybe a developer will bypass the interface to try and squeeze some extra
performance out of their code. Maybe they used Data::Dumper to inspect
your object and wrote their code to extract attributes without ever reading
your documentation.
When you change your object's implementation, any code that bypasses your documented interface will break. Of course, the miscreant developer who wrote the bad code was fired years ago for not conforming to coding guidelines, but it's your changes that just caused the system to break. Even if you can convince your boss that it isn't your fault, it will still be your job to make things work.
The other problem with using hashes comes down to simple typographical
errors. Let's pretend that one of your attributes is address,
but somewhere in your code you make an accidental typo, forgetting
a 'd': adress:
sub get_address {
    my ($this) = @_;
    return $this->{address};
}
sub set_address {
    my ($this, $value) = @_;
    $this->{adress} = $value; # Oops!
}
The above code doesn't result in a warning. Perl is perfectly happy to add a new element to our hash, but since nothing else refers to the misspelt key it will never be read, resulting in a difficult-to-find bug.
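The silent failure is easy to reproduce. The following minimal sketch wraps the two accessors above in a hypothetical Person class (the class name, the constructor and the sample address are assumptions made purely for illustration) and shows that the typo slips past both strict and warnings:

```perl
use strict;
use warnings;

package Person;    # hypothetical class name, for illustration only

sub new { return bless {}, shift }

sub get_address {
    my ($this) = @_;
    return $this->{address};
}

sub set_address {
    my ($this, $value) = @_;
    $this->{adress} = $value;    # the typo: a new key is silently created
}

package main;

my $person = Person->new;
$person->set_address("42 Example Street");

# The getter reads the correctly spelt key, which was never set.
my $address = $person->get_address;
print defined $address ? "address: $address\n" : "address is undefined\n";

# The stray key is sitting in the hash for anyone who looks.
print "keys in the object: ", join(", ", sort keys %$person), "\n";
```

Nothing is reported at compile time or at run time; the only symptom is a getter that returns undef.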
Wouldn't it be great if we could have compile-time checking of attributes, rather than relying upon run-time checks and the correctness of developers? With inside-out objects, we can.
Inside-out objects are known by many names, including flyweight objects and inverted indices. Rather than storing all of our attributes inside our single object, we instead have a single hash for each attribute, and our object has an entry in each hash. The following example demonstrates the differences in structure:
# Traditional hash-based objects.
$person1 = { firstname => "Paul",    surname => "Fenwick" };    # Object 1
$person2 = { firstname => "Jacinta", surname => "Richardson" }; # Object 2
$person3 = { firstname => "Damian",  surname => "Conway" };     # Object 3
# Inside-out objects.
#              # Object 1          # Object 2             # Object 3
%firstname = ( 12345 => "Paul",    23456 => "Jacinta",    34567 => "Damian" );
%surname   = ( 12345 => "Fenwick", 23456 => "Richardson", 34567 => "Conway" );
Inside-out objects provide excellent error checking: with use strict in effect, a misspelt attribute name refers to an undeclared hash, so we receive an error at compile time:
use strict;
use Class::Std;
my %address;
# ...
sub set_address {
    my ($this, $value) = @_;
    $adress{ident $this} = $value; # Oops!
}
# Trying to compile the above code results in an error:
# Global symbol "%adress" requires explicit package name at ...
Automatic attribute checking is a big improvement in preventing what is otherwise a very common and frustrating bug. However, the benefits don't stop there. Inside-out objects provide much better encapsulation than regular hash-based objects.
An inside-out object contains none of its own data; instead this has
been moved into a series of hashes that are stored inside the class. By
ensuring these are declared lexically (using my %attribute) we can be sure
that nothing outside of the class is able to access these attributes.
Strong encapsulation means that a misguided developer can't bypass our interface and access attributes directly. There's simply no way that external code can access those attributes. They're simply not in scope.
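As a rough illustration of this, the sketch below defines a hypothetical Account class (the class, its balance attribute and its accessor are invented for this example) and then tries the usual trick of reaching into the object from outside. It borrows refaddr and the blessed anonymous scalar, both of which are explained later in this tip, and leaves out attribute clean-up for brevity:

```perl
use strict;
use warnings;

package Account;    # hypothetical class, for illustration only
use Scalar::Util qw(refaddr);
{
    my %balance_of;    # lexical: only code inside this block can see it

    sub new {
        my ($class, $balance) = @_;
        my $this = bless \do { my $anon_scalar }, $class;
        $balance_of{ refaddr $this } = $balance;
        return $this;
    }

    sub get_balance {
        my ($this) = @_;
        return $balance_of{ refaddr $this };
    }
}

package main;

my $account = Account->new(100);
print "Balance via the interface: ", $account->get_balance, "\n";

# The object is a blessed scalar reference, not a hash, so treating
# it as a hash dies at run time instead of exposing the data.
my $peek  = eval { $account->{balance} };
my $error = $@;
print "Direct access failed: $error" if $error;
```

The only route to the balance is through get_balance, which is exactly the documented interface.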
We connect our attributes to our object by using a unique key. Since we're
trying to ensure object integrity, our ideal key would be fixed and
unchangeable for each object. The simplest solution would be to give each
object a sequential number upon generation and mark it as read-only.
Unfortunately this would make it very easy for external code to guess
possible key values and break encapsulation. Ideally we want our
key to be hard to fake. One solution is to use a module such as Data::UUID
which generates globally unique identifiers. Another is to realise that
every Perl variable already comes with something unique and verifiable --
its memory address.
The Scalar::Util module provides us with the refaddr function,
which returns the memory address pointed to by a given reference.
Alternatively, the Class::Std module provides exactly the same function
named ident (since the memory address is used as an identifier for the
object).
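The short sketch below shows the properties of refaddr that make it suitable as a key. The class name Example is made up, and plain blessed hashes are used only because any kind of reference will do:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $object = bless {}, 'Example';    # 'Example' is a made-up class name
my $alias  = $object;                # a second reference to the same thing
my $other  = bless {}, 'Example';    # a different object of the same class

# Two references to the same thing share one address...
print "object vs alias: ",
    (refaddr($object) == refaddr($alias) ? "same" : "different"), "\n";

# ...while two objects that are alive at the same time never do.
print "object vs other: ",
    (refaddr($object) == refaddr($other) ? "same" : "different"), "\n";

# refaddr returns a plain number, with no class name attached,
# and undef when given something that is not a reference.
print "plain number\n"              if refaddr($object) =~ /^\d+$/;
print "undef for a non-reference\n" if !defined refaddr("just a string");

# It is the same address Perl shows when a reference is stringified.
print "matches stringified form\n"
    if "$object" eq sprintf("Example=HASH(0x%x)", refaddr($object));
```

Because the address is a property of the thing being referred to, it cannot be changed by external code in the way that a stored ID number could.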
We now have enough information to build ourselves our very own inside-out object. Imagine a playing card as an object: it would have a suit and a face value (rank).
package PlayingCard;
use strict;
use warnings;
use Scalar::Util qw/refaddr/;
# Using an enclosing block ensures that the attributes declared
# are *only* accessible inside the same block. This is only really
# necessary for files with more than one class defined in them.
{
    my %suit_of;
    my %rank_of;
    sub new {
        my ($class, $rank, $suit) = @_;
        # This strange looking line produces an
        # anonymous blessed scalar.
        my $this = bless \do{my $anon_scalar}, $class;
        # Attributes are stored in their respective
        # hashes. We should also be checking that
        # $suit and $rank contain acceptable values for
        # our class.
        $suit_of{refaddr $this} = $suit;
        $rank_of{refaddr $this} = $rank;
        return $this;
    }
    sub get_suit {
        my ($this) = @_;
        return $suit_of{refaddr $this};
    }
    sub get_rank {
        my ($this) = @_;
        return $rank_of{refaddr $this};
    }
}
1;
One of the strangest lines in our code contains \do{my $anon_scalar}.
This odd construct simply declares a lexical variable using my. The
name of our scalar is irrelevant, since it immediately goes out of
scope at the end of the block. Normally this would seem fruitless,
but the enclosing do {} block returns the value of its last statement,
in our case the freshly created scalar. By taking a reference to this
scalar (using the backslash operator) our scalar avoids destruction
and lives on without a name.
Note that our scalar itself is completely empty: it doesn't contain anything, and we never use its contents. It exists simply to be blessed into the appropriate class, and for our own code to use its memory address for attribute lookups.
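To see the construct in isolation, here is a small sketch. The helper name new_anon_scalar_ref is hypothetical and exists only to make the example easy to call twice:

```perl
use strict;
use warnings;

# Hypothetical helper, named only for this illustration.
sub new_anon_scalar_ref {
    return \do { my $anon_scalar };
}

my $first  = new_anon_scalar_ref();
my $second = new_anon_scalar_ref();

# Each call hands back a reference to a brand new scalar.
print "reference type: ", ref($first), "\n";
print "contents: ", (defined $$first ? "a value" : "nothing (undef)"), "\n";
print "two calls give ",
    ($first == $second ? "the same scalar" : "two distinct scalars"), "\n";
```

Comparing two references with == compares their addresses, so the last line confirms that every object built this way gets its own unique key.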
Inside-out objects compare favourably with regular objects. They scale better in terms of memory usage, and with minor modifications can be tuned to provide even faster performance, albeit with the loss of some integrity benefits. However, you're unlikely to notice these benefits unless your application absolutely must run very fast or in very little memory. So what's the catch?
Think about what happens when an object is destroyed. With a regular
hash-based object the only reference to the object's attributes is
lost with the object itself, and Perl handles the clean-up for us.
When we have an inside-out object, nothing cleans up the attributes
when the object is destroyed. Instead, we have to write our own
DESTROY method. We also need to worry about making sure our parent
and sibling DESTROY methods are called as well. If we don't,
then our objects will leak memory, and that's bad.
For our PlayingCard we would need to add the following, or
code like it:
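use NEXT;
sub DESTROY {
    my ($this) = @_;
    # Call every _destroy method in our inheritance tree.
    $this->EVERY::_destroy;
}
sub _destroy {
    my ($this) = @_;
    # PlayingCard imports refaddr, so we use it here as well.
    delete $suit_of{refaddr $this};
    delete $rank_of{refaddr $this};
}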
All our derived classes will need to write their own _destroy method
to clean up any additional attributes that have been defined.
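The following sketch puts this together for a parent and a derived class. Card and TarotCard, along with the stored_ranks and stored_meanings counting methods, are hypothetical names added only so that we can watch the attribute hashes empty out. It assumes the NEXT module is installed, which provides the EVERY:: dispatch used in DESTROY:

```perl
use strict;
use warnings;

package Card;    # hypothetical base class, for illustration only
use Scalar::Util qw(refaddr);
use NEXT;        # provides the EVERY:: pseudo-class
{
    my %rank_of;

    sub new {
        my ($class, $rank) = @_;
        my $this = bless \do { my $anon_scalar }, $class;
        $rank_of{ refaddr $this } = $rank;
        return $this;
    }

    # Calls every _destroy method found in the inheritance tree.
    sub DESTROY {
        my ($this) = @_;
        $this->EVERY::_destroy;
    }

    sub _destroy {
        my ($this) = @_;
        delete $rank_of{ refaddr $this };
    }

    # Demonstration only: how many ranks are currently stored?
    sub stored_ranks { return scalar keys %rank_of }
}

package TarotCard;    # hypothetical derived class
use Scalar::Util qw(refaddr);
our @ISA = ('Card');
{
    my %meaning_of;

    sub new {
        my ($class, $rank, $meaning) = @_;
        my $this = $class->SUPER::new($rank);
        $meaning_of{ refaddr $this } = $meaning;
        return $this;
    }

    # Cleans up only the attributes this class added.
    sub _destroy {
        my ($this) = @_;
        delete $meaning_of{ refaddr $this };
    }

    # Demonstration only: how many meanings are currently stored?
    sub stored_meanings { return scalar keys %meaning_of }
}

package main;

{
    my $card = TarotCard->new(0, "new beginnings");
    print "while alive: ", Card->stored_ranks, " rank(s), ",
        TarotCard->stored_meanings, " meaning(s)\n";
}    # $card goes out of scope here, so DESTROY runs

print "after destruction: ", Card->stored_ranks, " rank(s), ",
    TarotCard->stored_meanings, " meaning(s)\n";
```

Only the base class defines DESTROY; the derived class supplies nothing but its own _destroy, and both hashes are empty once the object has gone.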
An additional advantage of inside-out objects is that each class has its own private area in which to store attributes. This means that derived classes don't need to worry about clashes with parents or siblings, and vice-versa. It also makes it possible, although possibly unwise, for derived classes to have attributes of the same name, but with different values (something which is impossible for standard hash-based objects).
If we do decide to use attributes of the same name in more than one class in our inheritance tree, we need to think about how we will ensure that each class gets the correct value during construction and initialisation. The best way to do this depends on our implementation.
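One possible approach, sketched below, is to give each class its own distinctly named constructor argument. The Employee and Author classes, their accessors and the argument names are all hypothetical, and for brevity the clean-up uses a plain SUPER::DESTROY call rather than the _destroy scheme shown earlier:

```perl
use strict;
use warnings;

package Employee;    # hypothetical classes, for illustration only
use Scalar::Util qw(refaddr);
{
    my %title_of;    # private to Employee: a job title

    sub new {
        my ($class, %args) = @_;
        my $this = bless \do { my $anon_scalar }, $class;
        $title_of{ refaddr $this } = $args{employee_title};
        return $this;
    }

    sub get_employee_title {
        my ($this) = @_;
        return $title_of{ refaddr $this };
    }

    sub DESTROY {
        my ($this) = @_;
        delete $title_of{ refaddr $this };
    }
}

package Author;
use Scalar::Util qw(refaddr);
our @ISA = ('Employee');
{
    my %title_of;    # same name, but an entirely separate hash: a book title

    sub new {
        my ($class, %args) = @_;
        my $this = $class->SUPER::new(%args);
        $title_of{ refaddr $this } = $args{book_title};
        return $this;
    }

    sub get_book_title {
        my ($this) = @_;
        return $title_of{ refaddr $this };
    }

    sub DESTROY {
        my ($this) = @_;
        delete $title_of{ refaddr $this };
        $this->SUPER::DESTROY();
    }
}

package main;

my $author = Author->new(
    employee_title => "Senior Trainer",
    book_title     => "An Example Book",
);

print "employee title: ", $author->get_employee_title, "\n";
print "book title:     ", $author->get_book_title, "\n";
```

Each class sees only its own %title_of, so the two values coexist on one object without either class knowing about the other's storage.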
Inside-out objects do not avoid the problems associated with multiple
methods in the inheritance tree having the same names. Fortunately,
we can use NEXT in such situations, just as we do with standard
hash-based inheritance.
The basic structure of any inside-out object is essentially the same, just as the basic structure for hash-based objects is essentially the same. As such, a number of builder modules have been created to remove the repetitive code and make it quicker for you to start writing the real code. Two particularly good modules for inside-out objects are:
Class::Std
Object::InsideOut
This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this copyright notice attached.
Copyright 2001-2008 Perl Training Australia. Contact us at contact@perltraining.com.au