Paru—Pandoc wrapped around in Ruby


Introduction

Paru is a simple Ruby wrapper around pandoc, the great multi-format document converter. Paru supports automating pandoc by writing Ruby programs that use pandoc (see Chapter 2 in the manual). Paru also supports writing pandoc filters in Ruby (see Chapter 3 in the manual). Paru’s manual explains the use of paru in detail, from installing and using paru and creating and using filters to putting it all together in a real-world use case: generating the manual!

See also the paru API documentation.

Note: If you’re using pandoc 3, use paru version 1.1.x or higher; paru 1.0.x doesn’t work with pandoc 3. If you’re still using pandoc 2, use paru version 1.0.x instead.
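If you manage dependencies with Bundler, the rule in this note translates into a version constraint for your Gemfile. The helper below is hypothetical (it is not part of paru); it reads the pandoc version from the first line of pandoc --version output, which looks like “pandoc 3.1.2”, and returns a matching constraint:

```ruby
# Hypothetical helper (not part of paru): choose a paru version constraint
# from the first line of `pandoc --version` output, following the note above.
def paru_requirement(pandoc_version_line)
  match = pandoc_version_line.match(/(\d+)\.(\d+)/)
  return nil unless match

  case match[1].to_i
  when 2 then "~> 1.0.0" # pandoc 2: paru 1.0.x
  when 3 then ">= 1.1"   # pandoc 3: paru 1.1.x or higher
  end                    # other major versions: not covered by the note (nil)
end

paru_requirement("pandoc 3.1.2")  # => ">= 1.1"
paru_requirement("pandoc 2.19.2") # => "~> 1.0.0"
```

The result can then be used in a Gemfile line such as gem "paru", paru_requirement(line).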

This README is a brief overview of paru’s features and usages.

Licence

Paru is free software; paru is released under the GPLv3. You can find paru’s source code on GitHub.

Acknowledgements

I would like to thank the following users for their contributions of patches, bug reports, fixes, and suggestions. With your help paru is growing beyond a simple tool for personal use into a useful addition to the pandoc ecosystem.

Installation

Paru is installed through rubygems as follows:

gem install paru

You can also download the latest gem paru-1.2.0.gem and install it as follows:

cd /directory/you/downloaded/the/gem/to
gem install paru-1.2.0.gem

Paru, obviously, requires pandoc. See pandoc.org/installing.html about how to install pandoc on your system and pandoc’s manual on how to use pandoc.

You can generate the API documentation for paru by cloning the repository and running rake yard. The generated documentation is put in documentation/api-doc.

Paru says hello to pandoc

Using paru is straightforward. It is a thin “rubyesque” layer around the pandoc executable. After requiring paru in your Ruby program, you create a new paru pandoc converter as follows:

require "paru/pandoc"

converter = Paru::Pandoc.new

The various command-line options of pandoc map to methods on this newly created instance. When you want to use a pandoc command-line option that contains dashes, replace each dash with an underscore to get the corresponding paru method. For example, the pandoc command-line option --pdf-engine becomes the paru method pdf_engine. Knowing this convention, you can convert from markdown to PDF using the lualatex engine by calling the from, to, and pdf_engine methods to configure the converter, and the output method to name the PDF file. The convenience method configure takes a block, so you can set multiple options at once:

require "paru/pandoc"

converter = Paru::Pandoc.new
converter.configure do
    from "markdown"
    to "latex"
    pdf_engine "lualatex"
    output "my_first_pdf_file.pdf"
end
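The naming convention is mechanical, so it can be stated as a pair of string transformations. The two helpers below are hypothetical (they are not part of paru’s API); they only restate the rule in Ruby:

```ruby
# Hypothetical helpers (not paru's API) restating the naming convention:
# pandoc option "--pdf-engine" <-> paru method "pdf_engine".
def option_to_method(option)
  option.delete_prefix("--").tr("-", "_")
end

def method_to_option(method_name)
  "--" + method_name.to_s.tr("_", "-")
end

option_to_method("--pdf-engine")        # => "pdf_engine"
option_to_method("--table-of-contents") # => "table_of_contents"
method_to_option(:pdf_engine)           # => "--pdf-engine"
```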

As creating and immediately configuring a converter is a common pattern, the constructor takes a configuration block as well. Once the converter is configured, you can use it to convert a string with the convert method, which is aliased by the << operator. You can call convert multiple times and reconfigure the converter in between.
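The constructor block and the << alias are ordinary Ruby idioms. The following minimal sketch shows one way such a wrapper can be built; TinyPandoc is hypothetical and is not how paru is implemented. It records only a handful of options and builds a pandoc command line from them; its convert method assumes a pandoc executable on your PATH:

```ruby
require "open3"

# Hypothetical, minimal block-configured pandoc wrapper. This is NOT
# paru's implementation; it only illustrates the pattern described above.
class TinyPandoc
  # A few pandoc options, as method names (dashes replaced by underscores).
  OPTIONS = %w[from to output pdf_engine standalone].freeze

  def initialize(&block)
    @options = {}
    configure(&block) if block
  end

  # Evaluate the block with this converter as self, so that bare calls
  # such as `from "markdown"` set options on it.
  def configure(&block)
    instance_eval(&block)
    self
  end

  # One setter per option; calling it without a value records a flag
  # that takes no argument (e.g. --standalone).
  OPTIONS.each do |name|
    define_method(name) do |value = nil|
      @options[name.tr("_", "-")] = value
      self
    end
  end

  # The pandoc command line for the current configuration.
  def command
    ["pandoc"] + @options.map { |opt, val| val.nil? ? "--#{opt}" : "--#{opt}=#{val}" }
  end

  # Pipe the input string through pandoc and return pandoc's output.
  def convert(input)
    output, status = Open3.capture2(*command, stdin_data: input)
    raise "pandoc exited with status #{status.exitstatus}" unless status.success?

    output
  end
  alias_method :<<, :convert
end

converter = TinyPandoc.new do
  from "markdown"
  to "html"
end
converter.command # => ["pandoc", "--from=markdown", "--to=html"]
# converter << "Hello world, from **pandoc**"   # requires pandoc
```

Using instance_eval is what lets the block call from and to without a receiver; the trade-off is that inside the block, self is the converter, so methods and instance variables of the surrounding object are not directly reachable.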

This introductory section ends with the obligatory “hello world” program, paru-style:

#!/usr/bin/env ruby
require "paru/pandoc"

input = "Hello world, from **pandoc**"

output = Paru::Pandoc.new do
    from "markdown"
    to "html"
end << input

puts output

Running the above program results in the following output:

<p>Hello world, from <strong>pandoc</strong></p>

To support converting files that cannot easily be represented by a single string, such as EPUB or docx, paru also has the convert_file method. It takes a path as its argument and tells pandoc to convert the file at that path using the converter’s current configuration.
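Pandoc itself accepts input files as positional command-line arguments and reads them directly, which is why binary formats never need to pass through a Ruby string. The sketch below illustrates that idea with hypothetical helpers (pandoc_file_command and convert_file_with_pandoc are not paru’s API, and this is not necessarily how paru’s convert_file works internally):

```ruby
require "open3"

# Hypothetical helper (not paru's API): build a pandoc command line that
# passes the input file as a positional argument, so binary inputs such
# as .docx or .epub never have to be held in a Ruby string.
def pandoc_file_command(path, **options)
  args = options.map { |opt, val| "--#{opt.to_s.tr('_', '-')}=#{val}" }
  ["pandoc", *args, path]
end

# Run the command and return pandoc's standard output.
def convert_file_with_pandoc(path, **options)
  output, status = Open3.capture2(*pandoc_file_command(path, **options))
  raise "pandoc exited with status #{status.exitstatus}" unless status.success?

  output
end

pandoc_file_command("report.docx", from: "docx", to: "markdown")
# => ["pandoc", "--from=docx", "--to=markdown", "report.docx"]

# Requires pandoc and an existing file:
# markdown = convert_file_with_pandoc("report.docx", from: "docx", to: "markdown")
```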

In paru’s manual, the development of do-pandoc.rb is presented as an example of real-world usage of paru.

Writing and using pandoc filters with paru

One of pandoc’s most interesting capabilities is custom filters. This extremely powerful feature lets you automate tasks such as numbering figures, using other command-line programs to pre- or post-process parts of the input, or changing the structure of the input document before pandoc writes it out. Paru allows you to write pandoc filters in Ruby.

For a collection of paru filters, have a look at the paru-filter-collection.

The simplest paru pandoc filter is the identity filter, which does nothing:

#!/usr/bin/env ruby
# Identity filter
require "paru/filter"

Paru::Filter.run do
    # nothing
end

Nevertheless, it shows the structure of every paru pandoc filter: a filter is an executable script (line 1), it requires the paru/filter module, and it runs the filter by calling Paru::Filter.run with a block. The identity filter is a good starting point for writing your own filters. Next, a simple but useful filter is developed to showcase the use of paru to write pandoc filters in Ruby.
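Under the hood, pandoc runs a JSON filter as a separate program: it writes the document’s abstract syntax tree as JSON to the filter’s standard input, passes the target output format as the first command-line argument, and reads the (possibly modified) JSON back from the filter’s standard output. Paru::Filter.run takes care of this plumbing. For comparison, a minimal identity filter written with only Ruby’s standard library might look like this (a sketch; identity is an illustrative name):

```ruby
#!/usr/bin/env ruby
# Identity filter without paru: read pandoc's JSON AST from stdin and
# write it back unchanged to stdout.
require "json"

# Parse and re-serialize the document; a real filter would modify the
# parsed tree between these two steps.
def identity(json)
  JSON.generate(JSON.parse(json))
end

# pandoc invokes filters with the output format as the first argument.
# This sketch only acts as a filter when called that way, so the file can
# also be loaded (e.g. in tests) without waiting on standard input.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  $stdout.write(identity($stdin.read))
end
```

After making the script executable, it can be used as pandoc --filter ./identity.rb input.md -o output.html.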

A more useful filter numbers figures. In some output formats, such as PDF, HTML + CSS, or ODT, figures can be numbered automatically. In other formats, notably markdown itself, numbering has to be done manually. However, it is easy to create a filter that numbers figures automatically as well:

#!/usr/bin/env ruby
# Number all figures in a document and prefix the caption with "Figure".
require "paru/filter"

figure_counter = 0

Paru::Filter.run do 
    with "Image" do |image|
        figure_counter += 1
        image.inner_markdown = "Figure #{figure_counter}. #{image.inner_markdown}"
    end
end

The filter number_figures.rb keeps track of the last figure’s sequence number in figure_counter. Each time an Image is encountered while processing the input file, that counter is incremented and the image’s caption is prefixed with “Figure N.”, where N is the counter’s value, by overwriting the image node’s inner markdown.
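To see what the with "Image" selector saves you from, here is the same idea sketched against the raw JSON AST using only Ruby’s standard library. It assumes the pandoc-types JSON encoding in which an Image node’s content is [attr, inlines, [url, title]]; number_figures is a hypothetical name, and instead of paru’s inner_markdown it prepends Str and Space nodes to the image’s inline content. Depending on the pandoc version, a standalone image may be wrapped in a Figure block that carries its own caption; this sketch only touches the Image node’s inlines.

```ruby
#!/usr/bin/env ruby
# Number all images by prefixing their inline content with "Figure N.".
# Works on the raw pandoc JSON AST, without paru.
require "json"

# Walk the tree depth-first, in document order; `counter` is shared
# across the recursion so numbering continues from image to image.
def number_figures(node, counter = [0])
  case node
  when Array
    node.each { |child| number_figures(child, counter) }
  when Hash
    if node["t"] == "Image"
      counter[0] += 1
      # Image content: [attr, inlines, [url, title]]
      node["c"][1].unshift(
        { "t" => "Str", "c" => "Figure" },
        { "t" => "Space" },
        { "t" => "Str", "c" => "#{counter[0]}." },
        { "t" => "Space" }
      )
    end
    node.each_value { |child| number_figures(child, counter) }
  end
  node
end

# Only act as a filter when pandoc passes the output format argument.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  doc = JSON.parse($stdin.read)
  number_figures(doc["blocks"]) # number images in the body, not in metadata
  $stdout.write(JSON.generate(doc))
end
```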

For more information about writing filters, please see paru’s manual or the API documentation for the Filter class. Furthermore, example filters can be found in the filters subdirectory of paru’s examples. Feel free to copy and adapt them to your needs.

Documentation

Manual

For more information on automating the use of pandoc with paru or on writing pandoc filters in Ruby, please see paru’s manual.

API documentation

The API documentation covers the whole of paru. Where the manual describes only a couple of scenarios, the API documentation shows all available functionality. It also gives more examples of using paru and writing filters.

Frequently asked questions

Feel free to ask me a question: send me an email or submit a new issue if you’ve found a bug!

  • I get an error like “Erro: JSON parse error: Error in $: Incompatible API versions: encoded with [1,20] but attempted to decode with [1,21].”

The versions of pandoc and paru you are using are incompatible. Please install the latest versions of pandoc and paru.

Why does this happen? Internally, pandoc uses pandoc-types to represent the documents it converts and filters. Documents represented by one version of pandoc-types differ slightly from, and are incompatible with, documents represented by another version. This also means that filters written in paru for one version of pandoc-types are not guaranteed to work on documents represented by another version of pandoc-types. As a result, not all paru versions work with all pandoc versions.
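Every JSON document pandoc hands to a filter records the pandoc-types version it was encoded with in its pandoc-api-version field; you can see it by running pandoc -t json on a small input. The sketch below reads that field from Ruby and compares two versions, assuming (as the error message above suggests) that only the first two components have to match. Both helpers are hypothetical and not part of paru or pandoc:

```ruby
require "json"

# Hypothetical helper: the pandoc-types version a JSON document was encoded with.
def api_version(json)
  JSON.parse(json).fetch("pandoc-api-version")
end

# Hypothetical helper: compare two API versions on their first two components,
# mirroring the "encoded with [1,20] but attempted to decode with [1,21]" message.
def compatible_api?(encoded, decoder)
  encoded.first(2) == decoder.first(2)
end

doc = '{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[]}'
api_version(doc)                     # => [1, 23, 1]
compatible_api?([1, 20], [1, 21])    # => false
compatible_api?([1, 23, 1], [1, 23]) # => true
```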

As a general rule: Use the latest versions of pandoc and paru.

  • I get an error like “‘values_at’: no implicit conversion of String into Integer (TypeError) from lib/paru/filter/document.rb:54:in ‘from_JSON’”

The most likely cause is that you’re using an old version of Pandoc. Paru version 0.2.x only supports pandoc version 1.18 and up. In pandoc version 1.18 there was a breaking API change in the way filters worked. Please upgrade your pandoc installation.
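The error message itself hints at the cause. Before pandoc 1.18, the JSON AST was a two-element array (metadata and blocks); from 1.18 on, it is an object with pandoc-api-version, meta, and blocks keys. Calling values_at with string keys works on a Hash but raises exactly this TypeError on an Array, which is presumably what happens in paru’s from_JSON when it receives the old layout. The sketch below reproduces the error and shows a hypothetical check (ast_layout, not part of paru) for which layout a JSON document uses:

```ruby
require "json"

OLD_STYLE = '[{"unMeta":{}},[]]'                                    # pandoc < 1.18
NEW_STYLE = '{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[]}' # pandoc >= 1.18

# Hypothetical check: which JSON AST layout does this document use?
def ast_layout(json)
  data = JSON.parse(json)
  data.is_a?(Hash) && data.key?("pandoc-api-version") ? :object : :legacy_array
end

# String keys work on the Hash produced by a pandoc >= 1.18 ...
JSON.parse(NEW_STYLE).values_at("meta", "blocks") # => [{}, []]

# ... but the same call on the old Array layout raises a TypeError.
begin
  JSON.parse(OLD_STYLE).values_at("meta", "blocks")
rescue TypeError => e
  puts e.message # no implicit conversion of String into Integer
end
```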