Class: Paru::Pandoc

Inherits:
Object
Defined in:
lib/paru/pandoc.rb

Overview

Pandoc is a wrapper around the pandoc document converter. See <pandoc.org/README.html> for details about pandoc. The Pandoc class translates pandoc's command-line interface fairly directly into Ruby, giving you a Rubyesque API to work with pandoc.

For information about writing pandoc filters in Ruby see Filter.

Creating a Paru pandoc converter in Ruby is quite straightforward: you create a new Paru::Pandoc object with a block that configures that Pandoc object with pandoc options. Each command-line option to pandoc is a method on the Pandoc object. Command-line options with dashes in them, such as “--reference-docx”, can be called by dropping the leading dashes and replacing each remaining dash with an underscore. So, “--reference-docx” becomes the method reference_docx.

Pandoc command-line flags, such as “--parse-raw”, “--chapters”, or “--toc”, have been translated to Paru::Pandoc methods that take an optional Boolean parameter; true is the default value. Therefore, if you want to enable a flag, no parameter is needed.

All other pandoc command-line options are translated to Paru::Pandoc methods that take either one String or Number argument, or a list of String arguments if that command-line option can occur more than once (such as “--include-before-header” or “--filter”).
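The naming rules above can be sketched in plain Ruby. The helper `to_flags` below is hypothetical and not part of Paru; it only illustrates how a method name and its argument could map to command-line text for the three kinds of options (flags, single-value options, repeatable options). The `--name=value` spelling is one of the forms pandoc accepts for long options; how Paru itself renders values is not shown here.

```ruby
# Hypothetical helper, not part of Paru: renders one option as command-line text.
def to_flags(name, value = true)
  flag = "--#{name.to_s.tr('_', '-')}" # reference_docx -> --reference-docx
  case value
  when true  then [flag]                            # enabled flag, e.g. --toc
  when false then []                                # disabled flag is left out
  when Array then value.map { |v| "#{flag}=#{v}" }  # repeatable option
  else            ["#{flag}=#{value}"]              # single String or Number
  end
end

puts to_flags(:toc).inspect
puts to_flags(:reference_docx, 'styled_output.docx').inspect
puts to_flags(:filter, ['a.rb', 'b.rb']).inspect
```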

Once you have configured a Paru::Pandoc converter, you can call convert or << (which is an alias for convert) with a string to convert. You can call convert as often as you like and, if you like, reconfigure the converter in between!

Examples:

Convert the markdown string 'hello *world*' to HTML

Paru::Pandoc.new do
    from 'markdown'
    to 'html'
end << 'hello *world*'

Convert an HTML file to DOCX with a reference file

Paru::Pandoc.new do
    from "html"
    to "docx"
    reference_docx "styled_output.docx"
    output "output.docx"
end.convert File.read("input.html")

Convert a markdown file to HTML, adding references in APA style

Paru::Pandoc.new do
    from "markdown"
    toc
    bibliography "literature.bib"
    to "html"
    csl "apa.csl"
    output "report_with_references.html"
end << File.read("report.md")

Constant Summary

DEFAULT_OPTION_SEP = Gem.win_platform? ? ' ' : " \\\n\t"

Use a readable option separator on Unix-like systems, but fall back to a space on Windows.

PARU_PANDOC_PATH = 'PARU_PANDOC_PATH'.freeze

Path to the pandoc executable to be used by paru.
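The value of PARU_PANDOC_PATH is a name rather than a path, which suggests that it serves as the key of an environment variable. A minimal sketch under that assumption follows; the helper `pandoc_executable` and the fallback `'pandoc'` are hypothetical and not taken from Paru's source.

```ruby
PANDOC_PATH_VAR = 'PARU_PANDOC_PATH'.freeze # same string as the constant's value

# Hypothetical helper: pick the executable named in the environment,
# falling back to whatever `pandoc` resolves to on the PATH.
def pandoc_executable(env = ENV)
  env.fetch(PANDOC_PATH_VAR, 'pandoc')
end

puts pandoc_executable({ 'PARU_PANDOC_PATH' => '/opt/pandoc/bin/pandoc' })
puts pandoc_executable({})
```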

Constructor Details

#initialize(&block) ⇒ Pandoc

Create a new Pandoc converter, optionally configured by a block with pandoc options. See #configure on how to configure a converter.

Parameters:

  • block (Proc)

    an optional configuration block.



# File 'lib/paru/pandoc.rb', line 107

def initialize(&block)
  @options = {}
  configure(&block) if block_given?
end

Class Method Details

.info ⇒ Info

Gather information about the pandoc installation. It runs pandoc --version and extracts pandoc’s version number and default data directory. This method is typically used in scripts that use Paru to automate the use of pandoc.

Returns:

  • (Info)

    Pandoc’s version, such as “[2.10.1]”, and the data directory, such as “/home/huub/.pandoc”.



# File 'lib/paru/pandoc.rb', line 99

def self.info
  @@info
end
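How such information could be extracted from the version text can be sketched as follows. This is not Paru's implementation: `parse_pandoc_info` is a hypothetical helper, the sample text is an abbreviated, assumed layout of what pandoc --version prints (the exact wording differs between pandoc versions), and returning the version as an array of integers is a choice made for this sketch.

```ruby
# Abbreviated, illustrative sample of `pandoc --version` output (assumed layout).
SAMPLE_VERSION_OUTPUT = <<~TEXT
  pandoc 2.10.1
  Default user data directory: /home/huub/.pandoc
TEXT

# Hypothetical parser: pulls the version and data directory out of the text.
def parse_pandoc_info(text)
  version = text[/^pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)/, 1]
  data_dir = text[/^Default user data directory:\s*(.+)$/, 1]
  {
    version: version ? version.split('.').map(&:to_i) : [],
    data_dir: data_dir&.strip
  }
end

p parse_pandoc_info(SAMPLE_VERSION_OUTPUT)
```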

Instance Method Details

#configure(&block) ⇒ Pandoc

Configure this Pandoc converter with a block. In the block you can call all pandoc options as methods on this converter. In multi-word options, each dash (-) is replaced by an underscore (_).

Pandoc has a number of command-line options. Most are simple options, like flags, that can be set only once. Other options can occur more than once, such as the css option: to add more than one CSS file to a generated standalone HTML file, use the css option once for each stylesheet to include. Still other options follow the pattern key[:value] and can also occur multiple times, such as metadata.

All options are specified in the file pandoc_options.yaml. If an option can occur only once, its value in that YAML file is its default value. If the option can occur multiple times, its value is an array with one element, the default value.

Examples:

Configure converting HTML to LaTeX with a LaTeX engine

converter.configure do
    from 'html'
    to 'latex'
    latex_engine 'lualatex'
end

Parameters:

  • block (Proc)

    the options to pandoc

Returns:

  • (Pandoc)

    this Pandoc converter



# File 'lib/paru/pandoc.rb', line 138

def configure(&block)
  instance_eval(&block)
  self
end
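The instance_eval call is what lets option names be written bare inside the block: the block runs with self set to the converter. A minimal, self-contained sketch of that pattern follows. `MiniConverter` and its `KNOWN_OPTIONS` list are hypothetical stand-ins that only record options; they are not Paru's classes, and Paru's own option handling may differ.

```ruby
# Hypothetical stand-in for a converter; it only records options.
class MiniConverter
  KNOWN_OPTIONS = %i[from to toc output].freeze

  attr_reader :options

  def initialize(&block)
    @options = {}
    configure(&block) if block_given?
  end

  # Run the block with self set to this object, then return self
  # so that calls can be chained.
  def configure(&block)
    instance_eval(&block)
    self
  end

  # Known option names are stored; anything else raises as usual.
  def method_missing(name, value = true)
    return super unless KNOWN_OPTIONS.include?(name)

    @options[name] = value
    self
  end

  def respond_to_missing?(name, include_private = false)
    KNOWN_OPTIONS.include?(name) || super
  end
end

converter = MiniConverter.new do
  from 'html'
  to 'latex'
end
converter.configure { to 'docx' } # reconfigure: the later value replaces the earlier one
p converter.options
```

Because configure returns the object itself, a call such as `converter.configure { toc }.options` can be chained.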

#convert(input) ⇒ String Also known as: <<

Converts input string to output string using the pandoc invocation configured in this Pandoc instance.

Note: for some output formats, output to STDOUT is not supported (see pandoc’s manual) and the result string will be empty.

The following two examples are equivalent:

Examples:

Using convert

output = converter.convert 'this is a *strong* word'

Using <<

output = converter << 'this is a *strong* word'

Parameters:

  • input (String)

    the input string to convert

Returns:

  • (String)

    the converted output as a string; empty for output formats that pandoc cannot write to STDOUT.



# File 'lib/paru/pandoc.rb', line 158

def convert(input)
  run_converter to_command, input
end
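The general technique of handing a string to an external program on STDIN and collecting its STDOUT can be sketched with Ruby's standard Open3 module. This is not Paru's run_converter, whose body is not shown here; `pipe_through` is a hypothetical helper, and a Ruby one-liner stands in for pandoc so that the sketch runs without pandoc installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: feed `input` to a command's STDIN and return its STDOUT.
def pipe_through(command, input)
  output, status = Open3.capture2(*command, stdin_data: input)
  raise "command failed: #{command.join(' ')}" unless status.success?

  output
end

# Stand-in for pandoc: a Ruby one-liner that upcases whatever arrives on STDIN.
upcase = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']
puts pipe_through(upcase, 'this is a *strong* word')
```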

#convert_file(input_file) ⇒ String

Converts an input file to output string using the pandoc invocation configured in this Pandoc instance. The path to the input file is appended to that invocation.

Note: for some output formats, output to STDOUT is not supported (see pandoc’s manual) and the result string will be empty.

Examples:

Using convert_file

output = converter.convert_file 'files/document.md'

Parameters:

  • input_file (String)

    the path to the input file to convert

Returns:

  • (String)

    the converted output as a string; empty for output formats that pandoc cannot write to STDOUT.



# File 'lib/paru/pandoc.rb', line 174

def convert_file(input_file)
  run_converter "#{to_command} #{input_file}"
end
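In the source shown above, the path is interpolated into the command string as-is, so a path containing spaces or shell metacharacters may need escaping before it is passed in. A minimal sketch using Ruby's standard Shellwords module follows; the helper name `append_input_file` is hypothetical, and this is a precaution for callers rather than a description of what Paru does.

```ruby
require 'shellwords'

# Hypothetical helper: append a shell-escaped file path to a command string.
def append_input_file(command, input_file)
  "#{command} #{Shellwords.escape(input_file)}"
end

puts append_input_file('pandoc --from=markdown --to=html', 'files/my document.md')
```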

#to_command(option_sep = DEFAULT_OPTION_SEP) ⇒ String

Create a string representation of this converter’s pandoc command line invocation. This is useful for debugging purposes.

Parameters:

  • option_sep (String) (defaults to: DEFAULT_OPTION_SEP)

    the string to separate options with

Returns:

  • (String)

    This converter’s command line invocation string.



# File 'lib/paru/pandoc.rb', line 183

def to_command(option_sep = DEFAULT_OPTION_SEP)
  "#{escape(@@pandoc_exec)}\t#{to_option_string option_sep}"
end
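The effect of the option separator on the resulting string can be illustrated with a small stand-alone renderer. `render_command` is hypothetical and is not Paru's to_option_string; it assumes a simple options hash and the `--name=value` spelling, and only mirrors the shape of the string built above (executable, a tab, then the options joined by the separator).

```ruby
# Backslash-newline-tab, the same value DEFAULT_OPTION_SEP has on Unix-like systems.
READABLE_SEP = " \\\n\t"

# Hypothetical renderer: joins options with the given separator.
def render_command(executable, options, option_sep = READABLE_SEP)
  flags = options.map do |name, value|
    flag = "--#{name.to_s.tr('_', '-')}"
    value == true ? flag : "#{flag}=#{value}"
  end
  "#{executable}\t#{flags.join(option_sep)}"
end

# One option per line, each line ending in a shell continuation backslash.
puts render_command('pandoc', { from: 'markdown', to: 'html', toc: true })
# Everything on a single line, as on Windows.
puts render_command('pandoc', { from: 'markdown', to: 'html' }, ' ')
```

With the readable separator, the printed command can be pasted into a Unix shell as a multi-line invocation, which is what makes it convenient for debugging.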