my solution for manipulating PDF
For the projects we do, we frequently need to generate a PDF file that consists of PDF files (submitted to us by proposal/paper authors) and dynamic data (from other Zope objects). Before we started The Great Zope Migration, we used LaTeX to generate the dynamic pages and then glued them together with pjscript.
	http://www.etymon.com/pj/pjscript.html

I've been trying to find a good way to do this under Zope.

I have looked at PDFlib.
	http://www.pdflib.com/pdflib/
I like that it has Python bindings, but for manipulation of existing PDF, we would have to purchase its sister product, PDI.

More recently, I was pointed at ReportLab.
	http://www.reportlab.com/
This is even better; it's written in Python. (I see that it's being used by other Zopers, too.) Unfortunately, it too requires another product, PageCatcher
	http://www.reportlab.com/pageCatcher/
to do the things I want to do. It would be a slick integrated solution, but I might end up using it without PageCatcher.

We have money to spend, but I'm dead set against using Closed solutions right now, so neither of these appealed to me enough.

Instead, I decided to fall back on pjscript. Cameron Laird helped me get started with a simple PJ document class. I made it work as a Zope External Method for trivial purposes, but then I rewrote most of it today so that I can do most everything that pjscript offers. (I'll include PJ.py, the Zope extension, at the end of this message.)

I was surprised at how easy it was to do this. Here's how I use it from a Python Script:

	document = container.WRRC.Ztools.PJdoc()

	# Sandwich the NYT fax between two simple pages.
	document.readpdf(container['simple.pdf'].data)
	document.appendpdf(container['nytfax.pdf'].data)
	document.appendpdf(container['simple.pdf'].data)

	# Remove the first page of the NYT fax.
	document.deletepage(2)

	# Make an 'X'.
	document.setpage(2)
	document.drawline((0,0), (500,500), 5)
	document.drawline((0,500), (500,0), 5)

	# Throw text around.
	document.setpage(1)
	document.initxy()
	document.drawtext(text='upper left', font='Courier-BoldOblique', fontsize=8)
	document.drawtext(text='PJdoc test', font='Helvetica-Bold', fontsize=16, pos=(50, 600))
	document.drawtext('Howdy!', fontsize=90)

	# Write lines of text.
	document.setinit((200,400))
	document.initxy()
	document.drawtext('one')
	document.nextxy()
	document.drawtext('two')
	document.nextxy()
	document.drawtext('three')

	# Set the resulting document's info.
	document.setinfo('Author', 'Kyler Laird')
	document.setinfo('Keywords', 'foo blah test NYTimes')

	context.REQUEST.RESPONSE.setHeader('Content-type', 'application/pdf')
	return document.writepdf()

Although I really balked at using system() calls into pjscript for this (I wanted to go straight to the PJ API through some Python/Java wizardry), it's hardly noticeable at this level. I like the solution. I'm almost confident that I can safely encourage its wide use here.

Next I'm going to work on extensions for Ghostscript (for PS->PDF conversion), LaTeX and html2ps. I'll probably revisit ReportLab, too.

I only offer this because I'm guessing someone else might travel down this road someday and it could help a bit. I'd put it somewhere more permanent, but I'm not willing to commit to its correctness nor to its maintenance right now.

Please smack me if I'm out of line posting such things here.

Thank you.

--kyler

===================================================
PJ.py
===================================================
import tempfile
import os

# Return a PJdoc object to caller.
def PJdoc():
    return _PJdoc()

class _PJdoc:
    # Allow Zope users to access PJdoc's methods.
    __allow_access_to_unprotected_subobjects__ = 1

    # CL intends to re-sort these def-s into utilities and publics.

    def __init__(self):
        # the PJ script I'm building
        self._script = ''
        # Keep track of temporary files.
        self.tmpfiles = []

    def __del__(self):
        # Clean up my temporary files.
        for file in self.tmpfiles:
            os.unlink(file)

    # Put a string in a temporary file.
    # Keep track of it so we can delete it when
    # this object is destroyed.
    def _write_string_to_tmpfile(self, text):
        filename = tempfile.mktemp("pjs")
        # Add to list of temporary files.
        self.tmpfiles.append(filename)
        # Binary mode, because the text may be PDF data.
        file = open(filename, "wb")
        # I had problems just writing everything
        # at once, so now I write in 1K chunks.
        start = 0
        textlen = len(text)
        bufsiz = 1024
        while 1:
            end = start + bufsiz
            if (end >= textlen):
                file.write(text[start:])
                break
            else:
                file.write(text[start:end])
                start = end
        file.close()
        # Tell caller the name of the file used.
        return filename

    # Display our accumulated PJ script.
    def show_script(self):
        return self._script

    # Run the PJ script.
    def _run(self):
        tmpfile = self._write_string_to_tmpfile(self.show_script())
        command_string = "/usr/bin/env pjscript %s" % tmpfile
        result = os.system(command_string)
        if result != 0:
            report = "Failure with '%s'." % command_string
            # Raise a real exception instead of a bare string.
            raise RuntimeError(report)

    # Add a command to the PJ script.
    def _do(self, string):
        self._script = self._script + string + '\n'

    # Add a command that reads from a file to the PJ script.
    def _read_file_command(self, command, string):
        tmpfile = self._write_string_to_tmpfile(string)
        self._do('$file %s' % tmpfile)
        self._do('%s' % command)

    # Add a command that writes to a file to the PJ script.
    def _write_file_command(self, command):
        # I will handle destroying this.
        tmpfile = tempfile.mktemp("pdf")
        # Set "file" to point at the temporary file.
        self._do('$file %s\n' % tmpfile)
        # Do the command.
        self._do('%s\n' % command)
        # Run pjscript with the current script.
        # There are better ways to handle this?
        self._run()
        # Read the result.
        file = open(tmpfile, "rb")
        string = file.read()
        # The original had "file.close" without parentheses,
        # which never actually closed the file.
        file.close()
        # Destroy the output file, as promised above.
        os.unlink(tmpfile)
        # Return the text of the output file to
        # the caller.
        return string

    # Set a PJ script variable.
    def _set_variable(self, var, val):
        if val is None:
            return
        # Everything is a string to pjscript.
        val = str(val)
        # Make sure val isn't screwy.
        # If someone got a newline in, an arbitrary command
        # could be executed.
        if '\n' in val or '\r' in val:
            report = "Invalid value: '%s'." % val
            raise ValueError(report)
        # To store "data" in "x", do "$x data".
        self._do('$%s %s\n' % (var, val))

    # Kyler added this.
    def setpage(self, page=None):
        self._set_variable('page', page)

    # Kyler added this.
    def setinit(self, pos):
        (x, y) = pos
        self._set_variable('xinit', x)
        self._set_variable('yinit', y)

    # For commands below, see
    #     http://www.etymon.com/pj/pjscript.html
    # Note that I've handled x, y pairs as position tuples.

    def appendpdf(self, pdfstring):
        self._read_file_command(command='appendpdf', string=pdfstring)

    def deletepage(self, page=None):
        self._set_variable('page', page)
        self._do('deletepage')

    def drawline(self, start, end, linewidth=None):
        (x0, y0) = start
        (x1, y1) = end
        self._set_variable('x0', x0)
        self._set_variable('y0', y0)
        self._set_variable('x1', x1)
        self._set_variable('y1', y1)
        self._set_variable('linewidth', linewidth)
        self._do('drawline')

    def drawtext(self, text, font=None, fontsize=None, page=None, pos=None):
        self._set_variable('text', text)
        self._set_variable('font', font)
        self._set_variable('fontsize', fontsize)
        if pos is not None:
            (x, y) = pos
            self._set_variable('x', x)
            self._set_variable('y', y)
        self._set_variable('page', page)
        self._do('drawtext')

    def initxy(self):
        self._do('initxy')

    def newpdf(self):
        self._do('newpdf')

    def nextxy(self):
        self._do('nextxy')

    def readpdf(self, pdfstring):
        self._read_file_command(command='readpdf', string=pdfstring)

    def setinfo(self, key, text):
        self._set_variable('key', key)
        self._set_variable('text', text)
        self._do('setinfo')

    def writepdf(self):
        return self._write_file_command(command='writepdf')
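
For anyone trying to follow what PJ.py actually hands to pjscript, here is a minimal sketch of just the script-building part, with no temporary files and no call to pjscript. ScriptBuilder and its method names are hypothetical stand-ins for _PJdoc, written for a current Python. It only shows the "$variable value" lines followed by a command name, and the newline check that keeps a value from smuggling in an extra command.

```python
class ScriptBuilder:
    """Hypothetical stand-in for _PJdoc that only accumulates script text."""

    def __init__(self):
        self.lines = []

    def do(self, command):
        # One pjscript command (or variable assignment) per line.
        self.lines.append(command)

    def set_variable(self, var, val):
        if val is None:
            return
        # Everything is a string to pjscript.
        val = str(val)
        # A newline inside a value would start a new script line,
        # so reject it rather than let it through.
        if '\n' in val or '\r' in val:
            raise ValueError("Invalid value: %r" % val)
        # To store "data" in "x", the script line is "$x data".
        self.do('$%s %s' % (var, val))

    def drawline(self, start, end, linewidth=None):
        (x0, y0) = start
        (x1, y1) = end
        for var, val in (('x0', x0), ('y0', y0), ('x1', x1),
                         ('y1', y1), ('linewidth', linewidth)):
            self.set_variable(var, val)
        self.do('drawline')

    def script(self):
        return '\n'.join(self.lines) + '\n'


builder = ScriptBuilder()
builder.set_variable('page', 2)
builder.drawline((0, 0), (500, 500), 5)
print(builder.script())
```

Printing the script this way (PJ.py's own show_script() does the same job) is a cheap way to check what a Python Script is about to run before pjscript gets involved.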
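
On the "There are better ways to handle this?" comment in _write_file_command: one possible alternative to os.system(), sketched here for a current Python rather than the one Zope runs on. It assumes only what PJ.py already assumes, namely that a pjscript command on the PATH takes a script file name as its argument. run_script is a hypothetical helper, not part of PJ.py; the demonstration at the bottom uses the Python interpreter itself as a stand-in command so that it runs without pjscript installed.

```python
import os
import subprocess
import sys
import tempfile


def run_script(script_text, command=("pjscript",)):
    """Write script_text to a temporary file and run `command <file>`.

    Returns the command's standard output; raises RuntimeError if the
    command exits with a non-zero status.
    """
    # mkstemp creates the file itself, so the name cannot be grabbed
    # by someone else between choosing it and opening it.
    fd, path = tempfile.mkstemp(suffix=".pjs")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script_text)
        # Passing an argument list (no shell) means the file name is
        # never re-parsed by /bin/sh.
        result = subprocess.run(list(command) + [path],
                                capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError("Failure with %r: %s"
                               % (command, result.stderr))
        return result.stdout
    finally:
        # Clean up whether or not the command succeeded.
        os.unlink(path)


# Stand-in demonstration: the "script" is a one-line Python program
# and the "command" is the running Python interpreter.
print(run_script("print('ok')", command=(sys.executable,)))
```

Unlike __del__-based cleanup, the try/finally removes the temporary file as soon as the command finishes, even when it fails.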
Kyler B. Laird