Tech Blog: perl DBI

Chapter 2
delimited or fixed width flat files
non DBI modules: AnyDBM_File, Storable, Data::Dumper

Chapter 7
discusses the differences in design between DBI and ODBC, the other portable database API.

Chapter 8
This chapter covers two topics that aren't exactly part of the core DBI, per se, but are extremely useful to know about
DBI shell
proxy architecture: you can connect a script running on a Unix box to a Microsoft Access database running on a Microsoft Windows box

Appendix B
contains useful extra information on each of the commonly used DBDs and their corresponding databases.

Resources
http://dbi.perl.org
http://www.perl.com/CPAN
Learning Perl, by Randal Schwartz and Tom Christiansen
Programming Perl, by Larry Wall, Tom Christiansen, and Randal Schwartz : The authoritative guide to Perl version 5
The Perl Cookbook, by Tom Christiansen and Nathan Torkington : long-awaited companion volume to Programming Perl
Writing Apache Modules with Perl and C, by Lincoln Stein and Doug MacEachern : Apache::DBI module, which provides advanced DBI functionality in relation to web services such as persistent connection pooling optimized for serving databases over the Web.

http://www.oreilly.com/catalog/perldbi/ # examples, errata

Chapter 1: Introduction

each development team choosing their preferred development environment (such as Visual Basic, PowerBuilder, Oracle Pro*C, Informix E/SQL, C++ code with ODBC—the list is almost endless). Database-independent programming interfaces.

Another aspect of today's corporate database lifestyle revolves around the idea of data warehousing , that is, creating and building vast repositories of archived information that can be scanned, or mined, for information separately from online databases

Perl takes the pain of manipulating strings out of programming, unlike C
"Swiss army knife of programming languages."

1.3. DBI in the Real World
We developed and support a large scale telephone call logging and analysis system for a major client of ours. The system collects ~1 GB of call data per day from over 1,200,000 monitored phone numbers. ~424 GB has been processed so far (over 6,200,000,000 calls). Data is processed and loaded into Oracle using DBI and DBD::Oracle. The database holds rolling data for around 20 million calls. The system generates over 44,000 PostScript very high quality reports per month (~five pages with eleven color graphs and five tables) generated by using Perl to manipulate FrameMaker templates. [Values correct as of July 1999, and rising steadily.] The whole system runs on three dual processor Sun SPARC Ultra 2 machines—one for data acquisition and processing, one for Oracle and the third does most of the report production (which is also distributed across the other two machines).

In 1997 I built a system for NASA's Langley Research Center in Virginia that puts a searchable web front end on a database of about 100,000 NASA-owned equipment items. I used Apache, DBI, Informix, WDB, and mod_perl on a Sparc 20

They have 30,000 employees world-wide who needed a secure system for getting to web-based resources. This first iteration of the Intranet is specified to handle up to forty requests for web objects per second (approximately 200 concurrent users), and runs on a single processor Intel Pentium-Pro with 512 megs of RAM. , we have developed a large reusable library of Perl code. . One of our most useful modules builds an Object-Relational wrapper around DBI to allow our application developers to talk to the database using O-O methods to access or change properties of the record

using Perl with DBI and DBD-Oracle as part of web-based reporting for significant portions of the manufacturing and distribution organizations. The use of DBI/DBD-Oracle is part of a movement away from Oracle Forms based reporting to a pure web-based reporting platform.

Chapter 2. Basic Non-DBI Databases

DBI supports relational, not Object Oriented

2.1. Storage Managers and Layers
flat-file databases are composed of pools of data with very few layers of abstraction.
DBM file libraries, like Berkeley DB, are an example of a storage manager layer that sits on top of the raw data files
You could potentially implement a more powerful database system on top of DBM files.

2.2. Query Languages and Data Functions
C.R.U.D. (Create, Read, Update, Delete).

2.3. Standing Stones and the Sample Database
Name: Stonehenge
Location: Wiltshire, England
Map Reference: SU 123 400
Type: Stone Circle and Henge
Description: The most famous megalithic site in the world, comprised of an earthen bank, or henge, and several concentric rings of massive standing stones formed into trilithons.

`tell me of all the megalithic sites in Wiltshire,'' or ``tell me about all the standing stones in Orkney,'' and so on.

the simplest form of database that you might wish to use: the flat-file database.

2.4. Flat-File Databases
Modifications to the data are usually done by updating an in-memory copy of the data held in the file, or files, then writing the entire set of data back out to disk.

two main types of flat-file database: files that separate fields with a delimiter character, and files that allocate a fixed length to each field.

2.4.1. Querying Data
Usage: scanmegadata

open MEGADATA, "<$megalithFile" or die "Can't open $megalithFile: $!\n"; while ( ) {
( $name, $location, $mapref, $type, $description ) = split( /:/, $_ );
if ( $name eq $siteName ) { $found = $. ; last; } # hold current line number
if ($found)
print

Using fixed-width fields is similar to the way in which data is organized in more powerful database systems such as an RDBMS.
eliminiate hassle of parse char in data
instead of split, use unpack
( $name, $location, $mapref, $type, $description ) = unpack( "A64 A64 A16 A32 A256", $_ );

2.4.2. Inserting Data
Inserting data into a flat-file database is very straightforward and usually amounts to simply tacking the new data onto the end of the data file.
Usage: insertmegadata" " \n"

open MEGADATA, ">>$megalithFile" or die
my $record = join( ":", $siteName, $siteLocation, $siteMapRef,
print MEGADATA "$record\n"

does not detect duplicates
does not handle concurency

Inserting new records into a fixed-length data file is also simple. Instead of printing each field to the Perl filehandle separated by the delimiting character, we can use the pack() function to create a fixed-length record out of the data.

2.4.3. Updating Data
The main problem with updating data is that we need to be able to read in data from the data file, temporarily mess about with it, and write the database back out to the file without losing any records.

One approach is to slurp the entire database into memory, make any updates to the in-memory copy, and dump it all back out again. A second approach is to read the database in record by record, make any alterations to each individual record, and write the record immediately back out to a temporary file. Once all the records have been processed, the temporary file can replace the original data file. Both techniques are viable, but we prefer the latter for performance reasons.

"Usage: updatemegadata \n"
open MEGADATA
open TMPMEGADATA
while () {
next unless m/^\Q$siteName:/;
my ($name, ...) = split(
$mapref = $siteMapRef;
$_ = join( ":", $name,
continue {
print TMPMEGADATA
} # not sure about this script, where does it write out unmodified stuff?
unlink $megalithFile
rename $tempFile, $megalithFile

I tested the code above. The while ... continue is special function
http://www.tizag.com/perlT/perlwhile.php
the continue block is hit every time
fucking confusing

An equivalent program that can be applied to a fixed-length file is very similar
while ( ) {
next unless unpack( "A64", $_ ) eq $siteName;
substr( $_, 64+64, 16) = pack( "A16", $siteMapRef );

2.4.4. Deleting Data
We shall process the file a record at a time by passing the data through a temporary file, just as we did for updating, rather than slurping all the data into memory and dumping it at the end. With this technique, the action of removing a record from the database is more an act of omission than any actual deletion.

Usage: deletemegadata \n"
while ( ) {
if ( $siteName eq $name ) {
next
print TMPMEGADATA $

The code to remove records from a fixed-length data file is almost identical. The only change is in the code to extract the field value, as you'd expect:

Tech Blog

Tuesday, February 20, 2007

perl DBI

No comments:

Digg / Technology

Blog Archive

About Me