Languages - DevSource
Five Things You Didn't Know You Could Do with Perl
By Martin Brown



Even after all these years, Perl still strikes many people as being a language primarily for processing and parsing data and textual information, whether that's offline or online through a Web site.

However, despite its history and its more common uses, Perl is a very capable general-purpose language with which we can perform a wide range of tasks. In this article, I show you some of the less obvious uses of Perl, some of which may surprise you.

Cataloging PDFs

I use Acrobat documents a lot. As a writer, I exchange vast quantities of information with companies and organizations in PDF format, and I also create my own PDFs of other documents and Web sites. I do this because PDFs are easy to use across a range of different machines and, more importantly, easier to search using the Acrobat catalog extension. Every Acrobat document also has properties that let you describe its content: the title, author, subject matter, and a number of keywords. You can use this information to help catalog your documents, and it also becomes a handy way of identifying files when using the built-in search system.

For additional convenience, I also put my Acrobat documents on a Web server, so that I can access them from any machine without having to worry about mounting the volume. Rather than using the file name information, which is not always helpful, I use a small Perl CGI script which extracts the Acrobat property information and uses this to list the files to make them easy to locate.

You can see a simplified version of the script below. It uses the PDF::API2 module to load each PDF document and extract its property information. We then collect this information into a single hash, keyed by file name, and print it out as an HTML page containing a table.

use strict;
use warnings;
use PDF::API2;
use CGI qw/:standard/;


my $info = {}; 

foreach my $file (glob("*.pdf"))
{
    my $pdf = PDF::API2->open($file);

    my %infohash = $pdf->info();

    $info->{$file} = \%infohash;
}

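my $sortby = param('sortby');

# Accept only the three known column names; anything else
# (including a missing parameter) falls back to Title.
$sortby = 'Title' unless (defined($sortby) && $sortby =~ m/^(?:Title|Author|Subject)$/);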

print header('text/html');

print <<EOF;
<html>
<head>
<title>PDF Library</title>
</head>
<body bgcolor="white" text="black">
<h1>PDF Library</h1>
<table cellpadding=5 cellspacing=5 border=0>
<tr>
<td><b><a href="/dumppdf.cgi?sortby=Title">Title</a></b></td>
<td><b><a href="/dumppdf.cgi?sortby=Author">Author</a></b></td>
<td><b><a href="/dumppdf.cgi?sortby=Subject">Subject</a></b></td>
</tr>
EOF

# Sort on the chosen property; a missing property is treated as an empty
# string so that it sorts first without triggering warnings (// needs Perl 5.10+).
foreach my $file (sort { ($info->{$a}->{$sortby} // '') cmp ($info->{$b}->{$sortby} // '') } keys %{$info})
{
    printf('<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td></tr>' . "\n",
           $file,
           (defined($info->{$file}->{Title}) ? $info->{$file}->{Title} : 'No Title'),
           (defined($info->{$file}->{Author}) ? $info->{$file}->{Author} : 'No Author'),
           (defined($info->{$file}->{Subject}) ? $info->{$file}->{Subject} : 'No Subject'),
          );
}

print <<EOF;
</table>
</body>
</html>
EOF

You'll see a sample of this in action in the figure. For convenience, the script also provides a direct link to the PDF so that if necessary I can view the PDF within my browser just by clicking on it. The headings of the table are also clickable; click on the header to order the files by that property.
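
The ordering behind those clickable headings can be tried out on its own, without a Web server or any PDF files. What follows is only a minimal sketch: the sample data and the sorted_files helper are hypothetical stand-ins, not part of the original script, and the // operator assumes Perl 5.10 or later. It shows one way to restrict the sort key to the three known properties and to cope with documents that have no value for the chosen property.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the properties read from real PDFs.
my $info = {
    'a.pdf' => { Title => 'Zebra Guide', Author => 'Smith' },
    'b.pdf' => { Title => 'Apple Notes', Author => 'Jones', Subject => 'Fruit' },
    'c.pdf' => { Author => 'Young' },    # no Title property set
};

# Only these property names are accepted as sort keys.
my %allowed = map { $_ => 1 } qw(Title Author Subject);

# sorted_files (hypothetical helper): returns the file names ordered by the
# requested property, falling back to Title for a missing or unknown key.
# Files without the property sort first; ties are broken by file name.
sub sorted_files {
    my ($info, $sortby) = @_;
    $sortby = 'Title' unless defined $sortby && $allowed{$sortby};
    return sort {
        ($info->{$a}{$sortby} // '') cmp ($info->{$b}{$sortby} // '')
            or $a cmp $b
    } keys %$info;
}

print join(', ', sorted_files($info, 'Author')), "\n";   # b.pdf, a.pdf, c.pdf
print join(', ', sorted_files($info, 'Bogus')),  "\n";   # c.pdf, b.pdf, a.pdf
```

Checking the key against a fixed list, rather than matching it loosely, means a request such as sortby=Bogus simply falls back to ordering by title.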



 
 