Monday, May 15, 2006

book: Intermediate Perl


Intermediate Perl

Chapter 1. Introduction

Welcome to the next step in your understanding of Perl. You're probably here either because you want to learn to write programs that are more than 100 lines long or because your boss has told you to do so.

See, our Learning Perl book was great because it introduced the use of Perl for short and medium programs (which is most of the programming done in Perl, we've observed). But, to avoid having "the Llama book" be big and intimidating, we left a lot of information out, deliberately and carefully.

In the pages that follow, you can get "the rest of the story" in the same style as our friendly Llama book. It covers what you need to write programs that are 100 to 10,000 lines long.

For example, you'll learn how to work with multiple programmers on the same project. This is great, because unless you work 35 hours each day, you'll need some help with larger tasks. You'll also need to ensure that all your code fits with the other code as you develop it for the final application.

This book will also show you how to deal with larger and more complex data structures, such as what we might casually call a "hash of hashes" or an "array of arrays of hashes of arrays." Once you know a little about references, you're on your way to arbitrarily complex data structures.

And then there's the buzzworthy notion of object-oriented programming (OOP), which allows parts of your code (or hopefully code from others) to be reused with minor or major variations within the same program. The book will cover that as well, even if you've never seen objects before.

An important aspect of working in teams is having a release cycle and tests for unit and integration testing. You'll learn the basics of packaging your code as a distribution and providing unit tests for that distribution, both for development and for verifying that your code works in the ultimate end environment.

And, just as was promised and delivered in Learning Perl, we'll entertain you along the way with interesting examples and bad puns. (We've sent Fred, Barney, Betty, and Wilma home, though. A new cast of characters will take the starring roles.)


1.1. What Should You Know Already?

We'll presume that you've already read Learning Perl, or at least pretend you have, and that you've played enough with Perl to already have those basics down. For example, you won't see an explanation in this book that shows how to access the elements of an array or return a value from a subroutine.

Make sure you know the following things:

* How to run a Perl program on your system
* The three basic Perl variable types: scalars, arrays, and hashes
* Control structures such as while, if, for, and foreach
* Subroutines
* Perl operators such as grep, map, sort, and print
* File manipulation such as open, file reading, and -X (file tests)

You might pick up deeper insight into these topics in this book, but we're going to presume you know the basics.


1.2. What About All Those Footnotes?

Like Learning Perl, this book relegates some of the more esoteric items out of the way for the first reading and places those items in footnotes.[*] You should skip those the first time through and pick them up on a rereading. You will not find anything in a footnote that you'll need to understand any of the material we present later.


1.3. What's with the Exercises?

Hands-on training gets the job done better. The best way to provide this training is with a series of exercises after every half-hour to hour of presentation. Of course, if you're a speed reader, the end of the chapter may come a bit sooner than a half hour. Slow down, take a breather, and do the exercises!

Each exercise has a "minutes to complete" rating. We intend for this rating to hit the midpoint of the bell curve, but don't feel bad if you take significantly longer or shorter. Sometimes it's just a matter of how many times you've faced similar programming tasks in your studies or jobs. Use the numbers merely as a guideline.

Every exercise has its answer in the Appendix. Again, try not to peek; you'll ruin the value of the exercise.


1.4. What If I'm a Perl Course Instructor?

If you're a Perl instructor who has decided to use this as your textbook, you should know that each set of exercises is short enough for most students to complete in 45 minutes to an hour, with a little time left over for a break. Some chapters' exercises should be quicker, and some may take longer. That's because once all those little numbers in square brackets were written, we discovered that we didn't know how to add.

So let's get started. Class begins after you turn the page . . . .



Chapter 2. Intermediate Foundations

Before we get started on the meat of the book, we want to introduce some intermediate-level Perl idioms that we use throughout the book. These are the things that typically set apart the beginning and intermediate Perl programmers. Along the way, we'll also introduce you to the cast of characters that we'll use in the examples throughout the book.





2.4. Exercises

You can find the answers to these exercises in "Answers for Chapter 2" in the Appendix.

2.4.1. Exercise 1 [15 min]

Write a program that takes a list of filenames on the command line and uses grep to select the ones whose size in bytes is less than 1000. Use map to transform the strings in this list, putting four space characters in front of each and a newline character after. Print the resulting list.

2.4.2. Exercise 2 [25 min]

Write a program that asks the user to enter a pattern (regular expression). Read this as data from the keyboard; don't get it from the command-line arguments. Report a list of files in some hardcoded directory (such as "/etc" or 'C:\\Windows') whose names match the pattern. Repeat this until the user enters an empty string instead of a pattern. The user should not type the forward slashes that are traditionally used to delimit pattern matches in Perl; the input pattern is delimited by the trailing newline. Ensure that a faulty pattern, such as one with unbalanced parentheses, doesn't crash the program.


Chapter 3. Using Modules

Modules are the building blocks for our programs. They provide reusable subroutines, variables, and even object-oriented classes. On our way to building our own modules, we'll show you some of those you might be interested in. We'll also look at the basics of using modules that others have already written.


3.1. The Standard Distribution

Perl comes with many of the popular modules already. Indeed, most of the 50+ MB of the most recent distribution are from modules. In October 1996, Perl 5.003_07 had 98 modules. Today, at the beginning of 2006, Perl 5.8.8 has 359.[*] Indeed, this is one of the advantages of Perl: it already comes with a lot of stuff that you need to make useful and complex programs without doing a lot of work yourself.

[*] After you make it through this book, you should be able to use Module::CoreList to discover that count for yourself. That's what we did to get those numbers, after all.

Throughout this book, we'll try to identify which modules come with Perl (and in most cases, the version in which they first came with Perl). We'll call these "core modules" or note that they're in "the standard distribution." If you have Perl, you should have these modules. Since we're using Perl 5.8.7 as we write this, we'll assume that's the current version of Perl.

As you develop your code, you may want to consider whether to use only core modules, so that you can be sure that anyone with Perl will have that module as long as they have at least the same version as you.[†] We'll avoid that debate here, mostly because we love CPAN too much to do without it.

[†] Although we don't go into it here, the Module::CoreList module has the lists of which modules came with which versions of Perl, along with other historical data.
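
As a rough sketch of how such a lookup might go, assuming Module::CoreList is installed (it is bundled with more recent Perl releases), we can count the modules recorded for a release and ask when a module first shipped. The variable names are ours, and we make no promise about the exact numbers printed, since they depend on the data in the installed copy of the module.

```perl
use strict;
use warnings;
use Module::CoreList;

# Modules recorded as shipping with Perl 5.8.8.
# Version keys are numeric, so 5.8.8 is written 5.008008.
my $modules_in_588 = $Module::CoreList::version{5.008008};
my $count          = keys %$modules_in_588;
print "Perl 5.8.8 is recorded with $count modules\n";

# The first Perl release that included a given module.
my $first = Module::CoreList->first_release('File::Basename');
print "File::Basename first appeared in Perl $first\n";
```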





3.2. Using Modules

Almost every Perl module comes with documentation, and even though we might not know how all of the behind-the-scenes magic works, we really don't have to worry about that stuff if we know how to use the interface. That's why the interface is there, after all: to hide the details.

On our local machine, we can read the module documentation with the perldoc command. We give it the module name we're interested in, and it prints out its documentation.

$ perldoc File::Basename

NAME
fileparse - split a pathname into pieces

basename - extract just the filename from a path

dirname - extract just the directory from a path

SYNOPSIS
use File::Basename;

($name,$path,$suffix) = fileparse($fullname,@suffixlist)
fileparse_set_fstype($os_string);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);


We've included the top portion of the documentation to show you the most important section (at least, the most important when you're starting). Module documentation typically follows the old Unix manpage format, which starts with a NAME and SYNOPSIS section.

The synopsis gives us examples of the module's use, and if we can suspend understanding for a bit and follow the example, we can use the module. That is to say, it may be that you're not yet familiar with some of the Perl techniques and syntax in the synopsis, but you can generally just follow the example and make everything work.

Now, since Perl is a mix of procedural, functional, object-oriented, and other sorts of language types, Perl modules come in a variety of different interfaces. We'll employ these modules in slightly different fashions, but as long as we can check the documentation, we shouldn't have a problem.


3.3. Functional Interfaces

To load a module, we use the Perl built-in use. We're not going to go into all of the details here, but we'll get to those in Chapters 10 and 15. At the moment, we just want to use the module. Let's start with File::Basename, that same module from the core distribution. To load it into our script, we say:

use File::Basename;


When we do this, File::Basename introduces three subroutines, fileparse, basename, and dirname,[*] into our script.[†] From this point forward, we can say:

[*] As well as a utility routine, fileparse_set_fstype.

[†] Actually, it imports them into the current package, but we haven't told you about those yet.

my $basename = basename( $some_full_path );
my $dirname = dirname( $some_full_path );


as if we had written the basename and dirname subroutines ourselves, or (nearly) as if they were built-in Perl functions. These routines pick out the filename and the directory parts of a pathname. For example, if $some_full_path were D:\Projects\Island Rescue\plan7.rtf (presumably, the program is running on a Windows machine), then $basename would be plan7.rtf and $dirname would be D:\Projects\Island Rescue.

The File::Basename module knows what sort of system it's on, and thus its functions figure out how to correctly parse the strings for the different delimiters we might encounter.
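
To try the Windows example on any machine, one option is the fileparse_set_fstype routine listed in the synopsis. The following is only a sketch: forcing the MSWin32 rules is our assumption for the demonstration, and normally we would let the module pick the rules for the system it's running on.

```perl
use strict;
use warnings;
use File::Basename;

# Assumption for the demo: force Windows parsing rules so the result is
# the same everywhere. Normally the module chooses based on the host OS.
fileparse_set_fstype('MSWin32');

# Single quotes keep the backslashes literal.
my $some_full_path = 'D:\Projects\Island Rescue\plan7.rtf';

my $basename = basename($some_full_path);
my $dirname  = dirname($some_full_path);

print "basename: $basename\n";    # plan7.rtf
print "dirname:  $dirname\n";     # D:\Projects\Island Rescue
```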

However, suppose we already had a dirname subroutine. We've now overwritten it with the definition provided by File::Basename! If we had turned on warnings, we would have seen a message stating that; but otherwise, Perl really doesn't care.




3.4. Selecting What to Import

Fortunately, we can tell the use operation to limit its actions by specifying a list of subroutine names following the module name, called the import list:

use File::Basename ('fileparse', 'basename');


Now the module only gives us those two subroutines and leaves our own dirname alone. Of course, this is awkward to type, so more often we'll see this written with the quotewords operator:

use File::Basename qw( fileparse basename );


In fact, even if there's only one item, we tend to write it with a qw( ) list for consistency and maintenance; often we'll go back to say "give me another one from here," and it's simpler if it's already a qw( ) list.

We've protected the local dirname routine, but what if we still want the functionality provided by File::Basename's dirname? No problem. We just spell it out with its full package specification:

my $dirname = File::Basename::dirname($some_path);


The list of names following use doesn't change which subroutines are defined in the module's package (in this case, File::Basename). We can always use the full name regardless of the import list, as in:[*]

[*] You don't need the ampersand in front of any of these subroutine invocations, because the subroutine name is already known to the compiler following use.

my $basename = File::Basename::basename($some_path);


In an extreme (but extremely useful) case, we can specify an empty list for the import list, as in:

use File::Basename ( ); # no import
my $base = File::Basename::basename($some_path);


An empty list is different from an absent list. An empty list says "don't give me anything," while an absent list says "give me the defaults." If the module's author has done her job well, the default will probably be exactly what we want.
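
Putting these pieces together, here is a small sketch of an import list protecting a local subroutine. The local dirname and the sample path are hypothetical, made up only for illustration.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse basename );    # dirname is deliberately left out

# Hypothetical local routine that shares a name with the module's.
sub dirname { return 'our own dirname' }

my $some_path = '/home/gilligan/notes.txt';

my $ours   = dirname($some_path);                    # our local subroutine
my $theirs = File::Basename::dirname($some_path);    # the module's, by full name
my $base   = basename($some_path);                   # imported, so the short name works

print "$ours\n$theirs\n$base\n";
```

Because dirname is not in the import list, the short name keeps pointing at our own routine, while the full package name still reaches the module's version.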





3.5. Object-Oriented Interfaces

Contrast the subroutines imported by File::Basename with what another core module has by looking at File::Spec. The File::Spec module is designed to support operations commonly performed on file specifications. (A file specification is usually a file or directory name, but it may be a name of a file that doesn't exist—in which case, it's not really a filename, is it?)

Unlike the File::Basename module, the File::Spec module has a primarily object-oriented interface. We load the module with use, as we did before.

use File::Spec;


However, since this module has an object-oriented interface,[†] it doesn't import any subroutines. Instead, the interface tells us to access the functionality of the module using its class methods. The catfile method joins a list of strings with the appropriate directory separator:

[†] We can use File::Spec::Functions if we want a functional interface.

my $filespec = File::Spec->catfile( $homedir{gilligan},
'web_docs', 'photos', 'USS_Minnow.gif' );


This calls the class method catfile of the File::Spec class, which builds a path appropriate for the local operating system and returns a single string.[‡] This is similar in syntax to the nearly two dozen other operations provided by File::Spec.

[‡] That string might be something like /home/gilligan/web_docs/photos/USS_Minnow.gif on a Unix system. On a Windows system, it would typically use backslashes as directory separators. This module lets us write portable code easily, at least where file specs are concerned.

The File::Spec module provides several other methods for dealing with file paths in a portable manner. You can read more about portability issues in the perlport documentation.
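
A self-contained sketch of the catfile call might look like the following. The %homedir table is hypothetical (we build Gilligan's home directory from the root directory just to have something to join), and the File::Spec::Functions line shows the functional interface for comparison.

```perl
use strict;
use warnings;
use File::Spec;
use File::Spec::Functions qw( catfile );    # optional functional interface

# Hypothetical home directory table, built portably from the root directory.
my %homedir = (
    gilligan => File::Spec->catdir( File::Spec->rootdir, 'home', 'gilligan' ),
);

my @parts = ( $homedir{gilligan}, 'web_docs', 'photos', 'USS_Minnow.gif' );

my $via_method   = File::Spec->catfile(@parts);    # class method call
my $via_function = catfile(@parts);                # same operation as a plain function

print "$via_method\n";
print "$via_function\n";
```

Both calls should produce the same string; which separators appear depends on the system running the program.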



3.6. A More Typical Object-Oriented Module: Math::BigInt

So as not to get dismayed about how "un-OO" the File::Spec module seems since it doesn't have objects, let's look at yet another core module, Math::BigInt, which can handle integers beyond Perl's native reach.[*]

[*] Behind the scenes, Perl is limited by the architecture it's on. It's one of the few places where the hardware shows through.

use Math::BigInt;

my $value = Math::BigInt->new(2); # start with 2

$value->bpow(1000); # take 2**1000

print $value->bstr( ), "\n"; # print it out


As before, this module imports nothing. Its entire interface uses class methods, such as new, against the class name to create instances, and then calls instance methods, such as bpow and bstr, against those instances.
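
One detail worth seeing in action is that bpow changes the object it's called on. The sketch below (the variable names are ours) uses the copy method first, so the original instance stays untouched, and then counts the digits in the result.

```perl
use strict;
use warnings;
use Math::BigInt;

my $two = Math::BigInt->new(2);        # class method: builds an instance

# bpow modifies its invocant, so work on a copy to keep $two intact.
my $big = $two->copy->bpow(1000);      # instance methods: copy, then raise to 1000

my $digits = length( $big->bstr );     # bstr returns the value as a plain string

print "2**1000 has $digits digits\n";              # 302 digits
print "the original is still ", $two->bstr, "\n";  # 2
```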




3.7. The Comprehensive Perl Archive Network

CPAN is the result of many volunteers working together, many of whom were originally operating their own little (or big) Perl FTP sites back before that Web thing came along. They coordinated their efforts on the perl-packrats mailing list in late 1993 and decided that disk space was getting cheap enough that the same information should be replicated on all sites rather than having specialization on each site. The idea took about a year to ferment, and Jarkko Hietaniemi established the Finnish FTP site as the CPAN mothership from which all other mirrors could draw their daily or hourly updates.

Part of the work involved rearranging and organizing the separate archives. Places were established for Perl binaries for non-Unix architectures, scripts, and Perl's source code itself. However, the modules portion has come to be the largest and most interesting part of the CPAN.

The modules in CPAN are organized as a symbolic-link tree in hierarchical functional categories, pointing to author directories where the actual files are located. The modules area also contains indices that are generally in easy-to-parse-with-Perl formats, such as the Data::Dumper output for the detailed module index. Of course, these indices are all derived automatically from databases at the master server using other Perl programs. Often, the mirroring of the CPAN from one server to another is done with a now-ancient Perl program called mirror.pl.

From its small start of a few mirror machines, CPAN has now grown to over 200 public archives in all corners of the Net, all churning away, updating at least daily, sometimes as frequently as hourly. No matter where we are in the world, we can find a nearby CPAN mirror from which to pull the latest goodies.

The incredibly useful CPAN Search (http://search.cpan.org) will probably become your favorite interface. From that web site, you can search for modules, look at their documentation, browse through their distributions, inspect their CPAN Testers reports, and do many other things.



3.8. Installing Modules from CPAN

Installing a simple module from CPAN can be straightforward: we download the module distribution archive, unpack it, and change into its directory. We use wget here, but it doesn't matter which tool you use.

$ wget http://www.cpan.org/.../HTTP-Cookies-Safari-1.10.tar.gz
$ tar -xzf HTTP-Cookies-Safari-1.10.tar.gz
$ cd HTTP-Cookies-Safari-1.10


From there we go one of two ways (which we'll explain in detail in Chapter 16). If we find a file named Makefile.PL, we run this series of commands to build, test, and finally install the source:

$ perl Makefile.PL
$ make
$ make test
$ make install


If we don't have permission to install modules in the system-wide directories,[*] we can tell Perl to install them under another path by using the PREFIX argument:

[*] These directories were set when the administrator installed Perl, and we can see them with perl -V.

$ perl Makefile.PL PREFIX=/Users/home/Ginger


To make Perl look in that directory for modules, we can set the PERL5LIB environment variable. Perl adds those directories to its module directory search list.

$ export PERL5LIB=/Users/home/Ginger


We can also use the lib pragma to add to the module search path, although this is not as friendly: we have to change the code, and the directory might not be the same on other machines where we want to run the code.

#!/usr/bin/perl
use lib qw(/Users/home/Ginger);
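
To check what Perl will actually search, we can print the @INC array, which holds the module search list. This is just a quick sketch that reuses the directory from the example above. The directory doesn't have to exist for the entry to show up.

```perl
use strict;
use warnings;
use lib qw(/Users/home/Ginger);    # path reused from the example above

# use lib puts its directories at the front of @INC.
# Directories named in PERL5LIB appear in this list as well.
print "$_\n" for @INC;
```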


Backing up for a minute, if we found a Build.PL file instead of a Makefile.PL, the process is the same. These distributions use Module::Build to build and install code. Since Module::Build is not a core Perl module,[*] we have to install it before we can install the distribution that needs it.

[*] At least not yet. It should be part of Perl 5.10, though.

$ perl Build.PL
$ perl Build
$ perl Build test
$ perl Build install


To install into our private directories using Module::Build, we add the --install_base parameter. We tell Perl how to find modules the same way we did before.

$ perl Build.PL --install_base /Users/home/Ginger


Sometimes we find both Makefile.PL and Build.PL in a distribution. What do we do then? We can use either one. Play favorites, if you like.





3.10. Exercises

You can find the answers to these exercises in "Answers for Chapter 3" in the Appendix.

3.10.1. Exercise 1 [25 min]

Read the list of files in the current directory and convert the names to their full path specification. Don't use the shell or an external program to get the current directory. The File::Spec and Cwd modules, both of which come with Perl, should help. Print each path with four spaces before it and a newline after it, just like you did for Exercise 1 of Chapter 2. Can you reuse part of that answer for this problem?

3.10.2. Exercise 2 [35 min]

Parse the International Standard Book Number from the back of this book (0596102062). Install the Business::ISBN module from CPAN and use it to extract the country code and the publisher code from the number.
