20 Dec 2012

Building a VM to convert documents to PDF


The goal: a machine I can throw ODT files at and get PDF documents back.

(It should work with every document type OpenOffice supports: Word, RTF, PPT, …).

Quick start

Following these steps should get the conversion working on a clean machine. I used a minimal server install of Ubuntu 12.10.

    sudo apt-get install openoffice.org
    sudo apt-get install unoconv
    unoconv --listener &
    unoconv -f pdf test.odt

Once we get the converter working, let’s wrap this functionality in a rack application using Sinatra.

You can follow this guide to install RVM, Ruby, Passenger and nginx.

Minimal Sinatra app to convert the files to PDF

The following code expects you to POST a file. It saves it in a temp folder, converts it using unoconv and sends you the resulting PDF file.

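    require 'sinatra'

    TMP_PATH = 'tmp/'

    # Build a unique path inside TMP_PATH from the uploaded file's name,
    # dropping any directory part and everything after the first dot.
    def get_source_filename(base)
      filename = File.basename(base).split('.')[0]
      filename = filename + '_' + Time.now.nsec.to_s
      File.join(TMP_PATH, filename)
    end

    post '/' do
      source_file_contents = params[:file][:tempfile].read
      source_filename = get_source_filename(params[:file][:filename])
      output_filename = source_filename + '.pdf'

      # Save the upload in binary mode so the document is not altered.
      File.open(source_filename, 'wb') { |f| f.write(source_file_contents) }

      # Passing the arguments separately keeps the filename away from the shell.
      system('unoconv', '-f', 'pdf', source_filename)

      send_file output_filename
    end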
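
The naming helper relies on `Time.now.nsec` for uniqueness and uses the uploaded name almost as-is. Here is a minimal sketch of a more defensive variant, assuming Ruby's standard `securerandom` library. `unique_tmp_path` is a hypothetical name and is not part of the app above:

```ruby
require 'securerandom'

# Hypothetical helper: builds a path in tmp_dir from the uploaded name.
# Keeps only the part before the first dot, replaces anything that is not
# a letter, digit, underscore or hyphen, and appends a random suffix.
def unique_tmp_path(uploaded_name, tmp_dir = 'tmp')
  stem = File.basename(uploaded_name.to_s).split('.').first.to_s
  stem = stem.gsub(/[^A-Za-z0-9_-]/, '_')
  stem = 'upload' if stem.empty?
  File.join(tmp_dir, "#{stem}_#{SecureRandom.hex(8)}")
end
```

The result still has no extension, so appending `.pdf` to get the output name keeps working as before.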

You will also need a config file for the rack application:

    require 'sinatra'
    require './converter'

    run Sinatra::Application

Now we just need to configure nginx to serve our Sinatra app.

    server {
      listen              80;
      server_name         localhost;
      passenger_enabled   on;
      root                /home/jose/converter/public;
    }

This is the project folder structure:

    |-- config.ru
    |-- converter.rb
    |-- public
    `-- tmp

Testing from the command line

Now, from another computer, you can test the converter using:

    curl -X POST -F file=@test.odt http://<vm-address>/ > tmp.pdf

Change the parameters to suit your environment, but that should convert test.odt to PDF and write the result to tmp.pdf.
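
The same request can also be made from Ruby. This is a rough sketch using only the standard library's `Net::HTTP`. The helper names and the URL `http://converter.example/` are placeholders I made up for illustration, and the code does not check the response status:

```ruby
require 'net/http'
require 'uri'

# Build a multipart POST carrying the document in the "file" field,
# the same field name the Sinatra app reads from params[:file].
def build_conversion_request(url, io, filename)
  request = Net::HTTP::Post.new(URI(url).request_uri)
  request.set_form([['file', io, { filename: filename }]], 'multipart/form-data')
  request
end

# Send the document at source_path to the converter and save the reply body.
def convert_remote(url, source_path, output_path)
  uri = URI(url)
  File.open(source_path, 'rb') do |io|
    request = build_conversion_request(url, io, File.basename(source_path))
    response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
    File.binwrite(output_path, response.body)
  end
end
```

Usage would be something like `convert_remote('http://converter.example/', 'test.odt', 'tmp.pdf')`, with the URL replaced by the address of your VM.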

What’s next

This is just a proof of concept. The Sinatra app does the absolute minimum needed to get the conversion working. It does not clean up temporary files, for example.
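
One possible way to handle the temporary files is to wrap the work in a block that deletes them afterwards. This is a minimal sketch, not what the app currently does, and `with_cleanup` is a hypothetical helper name:

```ruby
require 'fileutils'

# Hypothetical helper: runs the block, then deletes the given paths
# whether or not the block raised. Returns whatever the block returned.
def with_cleanup(*paths)
  yield
ensure
  FileUtils.rm_f(paths)
end
```

In the route this could look like `with_cleanup(source_filename, output_filename) { File.binread(output_filename) }` after the conversion, returning the PDF bytes as the response body (with `content_type 'application/pdf'`), because the files no longer exist once the block returns.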

Also, the VM is not configured to launch the unoconv listener automatically, monitor its status, etc.
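
As a starting point for monitoring, a small script can check whether something is accepting connections on the listener's port. This is a sketch under assumptions: I'm assuming the listener runs on `127.0.0.1` port `2002` (which I believe is unoconv's default, but check your setup), and `listener_up?` is a hypothetical name:

```ruby
require 'socket'

UNOCONV_HOST = '127.0.0.1' # assumption: listener bound to localhost
UNOCONV_PORT = 2002        # assumption: unoconv's default port

# Returns true if a TCP connection to host:port succeeds, false otherwise.
# It only checks that the port is open, not that conversions actually work.
def listener_up?(host = UNOCONV_HOST, port = UNOCONV_PORT)
  TCPSocket.new(host, port).close
  true
rescue SystemCallError
  false
end
```

A cron job or a process monitor could call `listener_up?` and relaunch `unoconv --listener` when it returns false.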