Five Things You Didn't Know You Could Do with Perl - 'Cataloging PDFs'
Even after all these years, Perl still strikes many people as being a language primarily for processing and parsing data and textual information, whether that's offline or online through a Web site.
However, despite its history and its most common uses, Perl is a very capable general-purpose language with which we can perform a wide range of tasks. In this article, I show you some of the less obvious uses of Perl, some of which may surprise you.
Cataloging PDFs
I use Acrobat documents a lot. As a writer, I exchange vast quantities of information with companies and organizations in PDF format, and I also create my own PDFs of other documents and Web sites. I do this because PDFs are easy to use across a range of different machines and, more importantly, easier to search using the Acrobat catalog extension. All Acrobat documents also have properties that let you describe the content, author, and subject matter, and record a number of keywords. You can use this information to help catalog your documents, and it also becomes a handy way of identifying files when using the built-in search system.
For additional convenience, I also put my Acrobat documents on a Web server, so that I can access them from any machine without having to worry about mounting the volume. Rather than relying on the file names, which are not always helpful, I use a small Perl CGI script that extracts the Acrobat property information and uses it to list the files, making them easy to locate.
You can see a simpler version of the script below. It uses the PDF::API2 module to load the PDF document and extract the property information. We then combine this information into a single hash and print out the information in the form of an HTML page and table.
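As a rough illustration of the extraction step only, here is a minimal sketch rather than the script itself. It assumes the `open` and `info` methods of PDF::API2 (newer releases of the module also document `info_metadata`); the subroutine names `pdf_properties` and `build_catalog`, and the choice of four properties, are hypothetical names picked for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Properties to pull from each document (an assumed selection).
my @FIELDS = qw(Title Author Subject Keywords);

# Read the document properties of one PDF and return them as a hash ref.
# PDF::API2 is loaded on first use, so the rest of the file compiles
# even on a machine where the module is not installed.
sub pdf_properties {
    my ($path) = @_;
    require PDF::API2;
    my $pdf  = PDF::API2->open($path);
    my %info = $pdf->info();
    return { map { $_ => defined $info{$_} ? $info{$_} : '' } @FIELDS };
}

# Combine the properties of every PDF in a directory into a single
# hash keyed by file name. The optional $reader argument lets you
# substitute another extraction routine (hypothetical hook).
sub build_catalog {
    my ($dir, $reader) = @_;
    $reader ||= \&pdf_properties;
    my %catalog;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    for my $file (grep { /\.pdf$/i } readdir $dh) {
        my $props = eval { $reader->("$dir/$file") };
        next unless $props;    # skip files that cannot be parsed
        $catalog{$file} = $props;
    }
    closedir $dh;
    return \%catalog;
}

# When a directory is given on the command line, print a plain listing.
if (@ARGV) {
    my $catalog = build_catalog($ARGV[0]);
    for my $file (sort keys %$catalog) {
        print join("\t", $file, map { $catalog->{$file}{$_} } @FIELDS), "\n";
    }
}
```

Keeping the result in one hash of hashes means the display code never has to touch the PDF files again; it only needs the file name and the stored properties.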
You'll see a sample of this in action in the figure. For convenience, the script also provides a direct link to the PDF so that if necessary I can view the PDF within my browser just by clicking on it. The headings of the table are also clickable; click on the header to order the files by that property.
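The clickable headings and direct links could be produced along the following lines. This is a sketch under stated assumptions, not the author's code: it expects a catalog hash keyed by file name, and the subroutine names, the `sort` query parameter, and the URLs passed in are all hypothetical.

```perl
use strict;
use warnings;

# Table columns; 'File' is the hash key, the rest are stored properties.
my @COLUMNS = qw(File Title Author Subject Keywords);

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Return the file names ordered by one column, ignoring case and
# falling back to the file name when two values are equal.
sub sorted_files {
    my ($catalog, $column) = @_;
    my $value = sub {
        my ($file) = @_;
        return lc($column eq 'File' ? $file : ($catalog->{$file}{$column} // ''));
    };
    return sort { $value->($a) cmp $value->($b) or $a cmp $b } keys %$catalog;
}

# Build the HTML table: each heading links back to the script with a
# sort parameter, and each file name links to the PDF itself.
# Simplification: file names are HTML-escaped but not URL-encoded.
sub render_table {
    my ($catalog, $column, $script_url, $pdf_base) = @_;
    $column = 'File'
        unless defined $column && grep { $_ eq $column } @COLUMNS;
    my $html = "<table>\n<tr>";
    for my $col (@COLUMNS) {
        $html .= sprintf '<th><a href="%s?sort=%s">%s</a></th>',
            escape_html($script_url), $col, $col;
    }
    $html .= "</tr>\n";
    for my $file (sorted_files($catalog, $column)) {
        my $props = $catalog->{$file};
        $html .= sprintf '<tr><td><a href="%s/%s">%s</a></td>',
            escape_html($pdf_base), escape_html($file), escape_html($file);
        $html .= '<td>' . escape_html($props->{$_}) . '</td>'
            for @COLUMNS[1 .. $#COLUMNS];
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}
```

In a CGI setting, the value of the `sort` parameter would be read from the query string and passed in as `$column`; anything that is not a known column name falls back to ordering by file name.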