Class: Paru::Filter

Inherits:
Object
Defined in:
lib/paru/filter.rb

Overview

Filter is used to write your own pandoc filter in Ruby. A Filter is almost always created and immediately executed via the run method. The simplest filter you can write in paru is the so-called “identity” filter:

#!/usr/bin/env ruby
# Identity filter
require 'paru/filter'

Paru::Filter.run do
  # nothing
end

It runs the filter, but it neither selects any nodes nor performs any action. Although that makes it useless in practice, it is a handy way to test the filter functionality, and it shows the general setup of a filter well.

Writing a simple filter: numbering figures

Inside a Filter.run block, you specify selectors with actions. For example, to number all figures in a document and prefix their captions with “Figure”, the following filter would work:

#!/usr/bin/env ruby
# Number all figures in a document and prefix the caption with "Figure".
require 'paru/filter'

figure_counter = 0

Paru::Filter.run do
  with 'Image' do |image|
    figure_counter += 1
    image.inner_markdown = "Figure #{figure_counter}. #{image.inner_markdown}"
  end
end

This filter selects all PandocFilter::Image nodes. For each PandocFilter::Image node it increments the figure counter figure_counter and then sets the figure’s caption to “Figure” followed by the figure count and the original caption. In other words, the following input document

![My first image](img/horse.png)

![My second image](img/rabbit.jpeg)

will be transformed into

![Figure 1. My first image](img/horse.png)

![Figure 2. My second image](img/rabbit.jpeg)

The method PandocFilter::InnerMarkdown#inner_markdown and its counterpart PandocFilter::Node#markdown are a great way to manipulate the contents of a selected PandocFilter::Node. There is no need to create and fill PandocFilter::Nodes by hand; you can just use pandoc’s own markdown format!

Writing more involved filters

Using the “follows” selector: Numbering figures and chapters

The previous example can be extended to also number chapters and to start numbering figures anew per chapter. As you would expect, we need two counters, one for the figures and one for the chapters:

#!/usr/bin/env ruby
# Number figures per chapter; number chapters as well
require 'paru/filter'

current_chapter = 0
current_figure = 0

Paru::Filter.run do
  with 'Header' do |header|
    if header.level == 1
      current_chapter += 1
      current_figure = 0

      header.inner_markdown = "Chapter #{current_chapter}. #{header.inner_markdown}"
    end
  end

  with 'Header + Image' do |image|
    current_figure += 1
    image.inner_markdown = "Figure #{current_chapter}.#{current_figure}. #{image.inner_markdown}"
  end
end

What is new in this filter, however, is the selector “Header + Image”, which selects all PandocFilter::Image nodes that follow a PandocFilter::Header node. Documents in pandoc have a flat structure where chapters do not exist as separate concepts. Instead, a chapter is implied by a header of a certain level and everything that follows until the next header of that level.
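To make the flat structure concrete, here is a minimal plain-Ruby sketch that models a document as a flat list of nodes and numbers chapters and figures the same way as the filter above. It does not use paru at all: FlatNode and number_figures are hypothetical names introduced only for this illustration, and the matching rule (an image counts once any header has been seen) is an assumption about what “follows” means.

```ruby
# Plain-Ruby sketch, no paru required. FlatNode and number_figures are
# hypothetical names used only for this illustration.
FlatNode = Struct.new(:type, :level, :text)

def number_figures(nodes)
  chapter = 0
  figure = 0
  seen_header = false

  nodes.map do |node|
    case node.type
    when :Header
      seen_header = true
      if node.level == 1
        chapter += 1
        figure = 0 # restart figure numbering in each chapter
        FlatNode.new(:Header, 1, "Chapter #{chapter}. #{node.text}")
      else
        node
      end
    when :Image
      if seen_header # "Header + Image": an image that comes after a header
        figure += 1
        FlatNode.new(:Image, nil, "Figure #{chapter}.#{figure}. #{node.text}")
      else
        node
      end
    else
      node
    end
  end
end

document = [
  FlatNode.new(:Header, 1, 'Animals'),
  FlatNode.new(:Image, nil, 'A horse'),
  FlatNode.new(:Image, nil, 'A rabbit'),
  FlatNode.new(:Header, 1, 'Plants'),
  FlatNode.new(:Image, nil, 'A tree')
]

puts number_figures(document).map(&:text)
```

Because there is no chapter node to attach the counter to, the only state needed is the pair of counters and the fact that a header has been seen.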

Using the “child of” selector: Annotate custom blocks

Hierarchical structures do exist in a pandoc document, however. For example, the contents of a paragraph (PandocFilter::Para), which itself is a PandocFilter::Block level node, are PandocFilter::Inline level nodes. Other examples are custom blocks, that is, PandocFilter::Div nodes. You select a child node by using the > selector as in the example below:

#!/usr/bin/env ruby
# Annotate custom blocks: example blocks and important blocks
require 'paru/filter'

example_count = 0

Paru::Filter.run do
  with 'Div.example > Header' do |header|
    if header.level == 3
      example_count += 1
      header.inner_markdown = "Example #{example_count}: #{header.inner_markdown}"
    end
  end

  with 'Div.important' do |d|
    d.inner_markdown = d.inner_markdown + "\n\n*(important)*"
  end
end

Here all PandocFilter::Header nodes that are children of a PandocFilter::Div node with class “example” are selected. Furthermore, if these headers are of level 3, they are prefixed by the string “Example” followed by a count.

In this example, “important” PandocFilter::Div nodes are also annotated, by appending the emphasized string “(important)” after the contents of the node.
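The idea behind the “child of” selector can be sketched in plain Ruby with a tiny tree that keeps parent links. This is not paru's node model or its Selector implementation: TreeNode, node, and child_of? are hypothetical names, and treating > as “direct parent” is an assumption made for the sketch.

```ruby
# Hypothetical tree model, not paru's node classes.
TreeNode = Struct.new(:type, :classes, :children, :parent) do
  # Attach a child and remember its parent.
  def add(child)
    child.parent = self
    children << child
    child
  end
end

def node(type, classes = [])
  TreeNode.new(type, classes, [], nil)
end

# True when +n+ is a +child_type+ whose parent is a +parent_type+ carrying
# +parent_class+, mirroring a selector such as "Div.example > Header".
def child_of?(n, child_type, parent_type, parent_class)
  n.type == child_type &&
    !n.parent.nil? &&
    n.parent.type == parent_type &&
    n.parent.classes.include?(parent_class)
end

doc = node(:Document)
example = doc.add(node(:Div, ['example']))
inside = example.add(node(:Header))
outside = doc.add(node(:Header))

p child_of?(inside, :Header, :Div, 'example')
p child_of?(outside, :Header, :Div, 'example')
```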

Using a distance in a selector: Capitalize the first N characters of a paragraph

Given the flat structure of a pandoc document, the “follows” selector has quite a reach. For example, “Header + Para” selects all paragraphs that follow a header. In most well-structured documents, this would select basically all paragraphs.

But what if you need to be more specific? For example, if you would like to capitalize the start of the first paragraph of each chapter, you need a way to specify a sequence number of sorts. To that end, paru filter selectors take an optional distance parameter. A filter for this example could look like:

#!/usr/bin/env ruby
# Capitalize the first N characters of a paragraph
require 'paru/filter'

END_CAPITAL = 10
Paru::Filter.run do
  with 'Header +1 Para' do |p|
    text = p.inner_markdown
    first_line = text.slice(0, END_CAPITAL).upcase
    # slice returns nil when the paragraph is shorter than END_CAPITAL
    rest = text.slice(END_CAPITAL, text.size) || ''
    p.inner_markdown = first_line + rest
  end
end

The distance is written as an integer directly after the selector's operator. In this case “Header +1 Para” selects all PandocFilter::Para nodes that directly follow a PandocFilter::Header node. You can use a distance with any selector.
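The effect of a distance can be sketched in plain Ruby on a flat list of node types. This is not how paru's Selector is implemented: follows? and select_following are hypothetical helpers, and the reading of a distance N as “the other node occurs among the N nodes immediately before” is an assumption that agrees with the documented behaviour for a distance of 1.

```ruby
# Hypothetical helper, not paru's Selector: does a node of type +before+
# occur within +distance+ positions before index +i+ in a flat list?
# Without a distance, any earlier position counts.
def follows?(types, i, before, distance = nil)
  start = distance.nil? ? 0 : [i - distance, 0].max
  types[start...i].include?(before)
end

# Indexes of all +target+ nodes that follow a +before+ node.
def select_following(types, before, target, distance = nil)
  types.each_index.select do |i|
    types[i] == target && follows?(types, i, before, distance)
  end
end

types = %i[Header Para Para Header Para Para]
p select_following(types, :Header, :Para)    # every paragraph after a header
p select_following(types, :Header, :Para, 1) # only paragraphs right after a header
```

With this list, the selector without a distance picks all four paragraphs, while the distance of 1 picks only the first paragraph after each header.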

Manipulating nodes: Removing horizontal lines

Although PandocFilter::InnerMarkdown#inner_markdown and PandocFilter::Node#markdown work in most situations, sometimes direct manipulation of the pandoc document AST is useful. These PandocFilter::ASTManipulation methods are mixed into PandocFilter::Node and can be used on any node in your filter. For example, to delete all PandocFilter::HorizontalRule nodes, you can use a filter like:

#!/usr/bin/env ruby
require 'paru/filter'

Paru::Filter.run do
  with 'HorizontalRule' do |rule|
    rule.parent.delete rule if rule.has_parent?
  end
end

Note that you could have arrived at the same effect by using:

rule.markdown = ""

Manipulating metadata:

One of the interesting features of the pandoc markdown format is the ability to add metadata to a document via a YAML block or command line options. For example, if you use a template that uses the metadata property $date$ to write a date on a title page, it is quite useful to automatically add the date of today to the metadata. You can do so with a filter like:

#!/usr/bin/env ruby
## Add today's date to the metadata
require 'paru/filter'
require 'date'

Paru::Filter.run do
  before do
    metadata['date'] = Date.today.to_s
  end
end

In a filter, the metadata property is a Ruby Hash of Strings, Numbers, Booleans, Arrays, and Hashes. You can manipulate it like any other Ruby Hash.
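Since the metadata behaves like a Ruby Hash, the usual Hash operations apply. The sketch below uses an ordinary Hash as a stand-in for the filter's metadata, so it runs without paru; stamp_metadata and the keys shown are made up for illustration.

```ruby
require 'date'

# stamp_metadata is a hypothetical helper; the Hash stands in for the
# filter's metadata and its keys are made up for illustration.
def stamp_metadata(metadata, today = Date.today)
  metadata['date'] = today.to_s        # add or overwrite a property
  metadata['keywords'] ||= []          # make sure a nested Array exists
  metadata['keywords'] << 'generated'  # extend the nested Array
  metadata.delete('draft')             # remove a property
  metadata
end

p stamp_metadata(
  { 'title' => 'My report', 'keywords' => ['pandoc'], 'draft' => true }
)
```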

Constructor Details

#initialize(input = $stdin, output = $stdout, treat_metadata_strings_as_plain_strings: false) ⇒ Filter

Create a new Filter instance. For convenience, run creates a new Paru::Filter and runs it immediately. Use this constructor if you want to run a filter on different input and output streams than STDIN and STDOUT respectively.

The treat_metadata_strings_as_plain_strings parameter is a feature toggle to treat metadata string values as plain strings instead of markdown strings if all AST leaf metadata string values have pandoc type “MetaString”. This option is only relevant when you only set metadata string values via the command-line option --metadata and not also via a YAML or title block. Using this option improves performance in this specific situation because metadata values do not have to be converted to strings by pandoc in a separate process but can be collected as is.

Parameters:

  • input (IO = $stdin) (defaults to: $stdin)

    the input stream to read, defaults to STDIN

  • output (IO = $stdout) (defaults to: $stdout)

    the output stream to write, defaults to STDOUT

  • treat_metadata_strings_as_plain_strings (Boolean = false) (defaults to: false)

    feature toggle to treat metadata string values as plain strings instead of markdown strings



# File 'lib/paru/filter.rb', line 242

def initialize(input = $stdin, output = $stdout, treat_metadata_strings_as_plain_strings: false)
  @input = input
  @output = output
  @treat_metadata_strings_as_plain_strings = treat_metadata_strings_as_plain_strings
end
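Injectable streams with STDIN and STDOUT defaults make a filter easy to exercise in memory. The sketch below shows the pattern with a hypothetical IdentityStreamFilter class that only copies the constructor's shape; it is not part of paru, it parses the JSON with Ruby's standard json library instead of building paru nodes, and the version numbers in the sample document are placeholders.

```ruby
require 'json'
require 'stringio'

# IdentityStreamFilter is a hypothetical stand-in, not part of paru. It copies
# the constructor's pattern of injectable streams with STDIN/STDOUT defaults.
class IdentityStreamFilter
  def initialize(input = $stdin, output = $stdout)
    @input = input
    @output = output
  end

  # Read a JSON document, hand it to the block, and write it back out.
  def filter
    document = JSON.parse(@input.read)
    yield document if block_given?
    @output.write(JSON.generate(document))
  end
end

# A minimal pandoc-style JSON document; the version numbers are placeholders.
SAMPLE_JSON = '{"pandoc-api-version":[1,23],"meta":{},"blocks":[]}'

# Run the stand-in filter on in-memory streams and return what it wrote.
def filter_in_memory(json)
  output = StringIO.new
  IdentityStreamFilter.new(StringIO.new(json), output).filter do |doc|
    doc['meta']['processed'] = { 't' => 'MetaBool', 'c' => true }
  end
  output.string
end

puts filter_in_memory(SAMPLE_JSON)
```

Passing StringIO objects instead of the process streams keeps such a check free of pipes and subprocesses.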

Instance Attribute Details

#current_node ⇒ Node

Returns the node in the AST of the document being filtered that is currently being inspected by the filter.

Returns:

  • (Node)

    The node in the AST of the document being filtered that is currently being inspected by the filter.



# File 'lib/paru/filter.rb', line 222

class Filter
  attr_reader :metadata, :document, :current_node

  # Create a new Filter instance. For convenience, {run} creates a new
  # {Filter} and runs it immediately. Use this constructor if you want
  # to run a filter on different input and output streams than STDIN and
  # STDOUT respectively.
  #
  # @param input [IO = $stdin] the input stream to read, defaults to
  #   STDIN
  # @param output [IO = $stdout] the output stream to write, defaults to
  #   STDOUT
  # @param treat_metadata_strings_as_plain_strings [Boolean = false] feature
  # toggle to treat metadata string values as plain strings instead of
  # markdown strings if all AST leaf metadata string values have pandoc type
  # "MetaString". This option is only relevant when you **only** set metadata
  # string values via command-line option `--metadata` and not also via a
  # YAML or title block. Using this option improves performance in this
  # specific situation because metadata values don't have to be converted to
  # string by pandoc in a separate process but can be collected as is.
  def initialize(input = $stdin, output = $stdout, treat_metadata_strings_as_plain_strings: false)
    @input = input
    @output = output
    @treat_metadata_strings_as_plain_strings = treat_metadata_strings_as_plain_strings
  end

  # Run the filter specified by block. This is a convenience method that
  # creates a new {Filter} using input stream STDIN and output stream
  # STDOUT and immediately runs {filter} with the block supplied.
  #
  # @param treat_metadata_strings_as_plain_strings [Boolean = false] feature
  # toggle to treat metadata string values as plain strings instead of
  # markdown strings if all AST leaf metadata string values have pandoc type
  # "MetaString". This option is only relevant when you **only** set metadata
  # string values via command-line option `--metadata` and not also via a
  # YAML or title block. Using this option improves performance in this
  # specific situation because metadata values don't have to be converted to
  # string by pandoc in a separate process but can be collected as is.
  # @param block [Proc] the filter specification
  #
  # @example Add 'Figure' to each image's caption
  #   Paru::Filter.run do
  #       with "Image" do |image|
  #           image.inner_markdown = "Figure. #{image.inner_markdown}"
  #       end
  #   end
  def self.run(treat_metadata_strings_as_plain_strings: false, &block)
    Filter.new(
      $stdin,
      $stdout,
      treat_metadata_strings_as_plain_strings: treat_metadata_strings_as_plain_strings
    ).filter(&block)
  end

  # Create a filter using +block+. In the block you specify
  # selectors and actions to be performed on selected nodes. In the
  # example below, the selector is "Image", which selects all image
  # nodes. The action is to prepend the contents of the image's caption
  # by the string "Figure. ".
  #
  # @param block [Proc] the filter specification
  #
  # @return [JSON] a JSON string with the filtered pandoc AST
  #
  # @example Add 'Figure' to each image's caption
  #   # The input must be pandoc's JSON AST, e.g. produced with `pandoc -t json`
  #   input = StringIO.new(File.read("my_report.json"))
  #   output = StringIO.new
  #
  #   Paru::Filter.new(input, output).filter do
  #       with "Image" do |image|
  #           image.inner_markdown = "Figure. #{image.inner_markdown}"
  #       end
  #   end
  #
  def filter(&block)
    @selectors = {}
    @filtered_nodes = []
    @document = read_document

    @metadata = PandocFilter::Metadata.new(
      @document.meta,
      treat_metadata_strings_as_plain_strings: @treat_metadata_strings_as_plain_strings
    )

    nodes_to_filter = Enumerator.new do |node_list|
      @document.each_depth_first do |node|
        node_list << node
      end
    end

    @current_node = @document

    @ran_before = false
    @ran_after = false
    instance_eval(&block) # run filter with before block
    @ran_before = true

    nodes_to_filter.each do |node|
      if @current_node.has_been_replaced?
        @current_node = @current_node.get_replacement
        @filtered_nodes.pop
      else
        @current_node = node
      end

      @filtered_nodes.push @current_node

      instance_eval(&block) # run the actual filter code
    end

    @ran_after = true
    instance_eval(&block) # run filter with after block

    write_document
  end

  # Specify what nodes to filter with a +selector+. If the +current_node+
  # matches that selector, it is passed to the block to this +with+ method.
  #
  # @param selector [String] a selector string
  # @yield [Node] the current node if it matches the selector
  def with(selector)
    return unless @ran_before && !@ran_after

    @selectors[selector] = Selector.new selector unless @selectors.key? selector
    yield @current_node if @selectors[selector].matches? @current_node, @filtered_nodes
  end

  # Before running the filter on all nodes, the +document+ is passed to
  # the block to this +before+ method. This method is run exactly once.
  #
  # @yield [Document] the document
  def before
    yield @document unless @ran_before
  end

  # After running the filter on all nodes, the +document+ is passed to
  # the block to this +after+ method. This method is run exactly once.
  #
  # @yield [Document] the document
  def after
    yield @document if @ran_after
  end

  # Stop processing the document any further and output it as it is now.
  # This is a great timesaver for filters that only act on a small
  # number of nodes in a large document, or when you only want to set
  # the metadata.
  #
  # Note that stop! will break off the filter immediately after outputting
  # the document in its current state.
  def stop!
    write_document
    exit
  end

  private

  # The Document node from JSON formatted pandoc document structure
  # on STDIN that is being filtered
  #
  # @return [Document] create a new Document node from a pandoc AST from
  #   JSON from STDIN
  def read_document
    PandocFilter::Document.from_JSON @input.read
  end

  # Write the document being filtered to STDOUT
  def write_document
    @document.meta = @metadata.to_meta
    @output.write @document.to_JSON
  end
end

#document ⇒ Document

Returns the document being filtered.

Returns:

  • (Document)

    The document being filtered




#metadata ⇒ Hash

Returns the metadata of the document being filtered as a Ruby Hash.

Returns:

  • (Hash)

    The metadata of the document being filtered as a Ruby Hash




Class Method Details

.run(treat_metadata_strings_as_plain_strings: false, &block) ⇒ Object

Run the filter specified by block. This is a convenience method that creates a new Paru::Filter using input stream STDIN and output stream STDOUT and immediately runs #filter with the block supplied.

The treat_metadata_strings_as_plain_strings parameter is a feature toggle to treat metadata string values as plain strings instead of markdown strings if all AST leaf metadata string values have pandoc type “MetaString”. This option is only relevant when you only set metadata string values via the command-line option --metadata and not also via a YAML or title block. Using this option improves performance in this specific situation because metadata values do not have to be converted to strings by pandoc in a separate process but can be collected as is.

Examples:

Add ‘Figure’ to each image’s caption

Paru::Filter.run do
    with "Image" do |image|
        image.inner_markdown = "Figure. #{image.inner_markdown}"
    end
end

Parameters:

  • treat_metadata_strings_as_plain_strings (Boolean = false) (defaults to: false)

    feature toggle to treat metadata string values as plain strings instead of markdown strings

  • block (Proc)

    the filter specification



# File 'lib/paru/filter.rb', line 268

def self.run(treat_metadata_strings_as_plain_strings: false, &block)
  Filter.new(
    $stdin,
    $stdout,
    treat_metadata_strings_as_plain_strings: treat_metadata_strings_as_plain_strings
  ).filter(&block)
end

Instance Method Details

#after {|Document| ... } ⇒ Object

After running the filter on all nodes, the document is passed to the block to this after method. This method is run exactly once.

Yields:

  • (Document)

    the document



# File 'lib/paru/filter.rb', line 362

def after
  yield @document if @ran_after
end

#before {|Document| ... } ⇒ Object

Before running the filter on all nodes, the document is passed to the block to this before method. This method is run exactly once.

Yields:

  • (Document)

    the document



# File 'lib/paru/filter.rb', line 354

def before
  yield @document unless @ran_before
end

#filter(&block) ⇒ JSON

Create a filter using block. In the block you specify selectors and actions to be performed on selected nodes. In the example below, the selector is “Image”, which selects all image nodes. The action is to prepend the contents of the image’s caption by the string “Figure. ”.

Examples:

Add ‘Figure’ to each image’s caption

# The input must be pandoc's JSON AST, e.g. produced with `pandoc -t json`
input = StringIO.new(File.read("my_report.json"))
output = StringIO.new

Paru::Filter.new(input, output).filter do
    with "Image" do |image|
        image.inner_markdown = "Figure. #{image.inner_markdown}"
    end
end

Parameters:

  • block (Proc)

    the filter specification

Returns:

  • (JSON)

    a JSON string with the filtered pandoc AST



# File 'lib/paru/filter.rb', line 296

def filter(&block)
  @selectors = {}
  @filtered_nodes = []
  @document = read_document

  @metadata = PandocFilter::Metadata.new(
    @document.meta,
    treat_metadata_strings_as_plain_strings: @treat_metadata_strings_as_plain_strings
  )

  nodes_to_filter = Enumerator.new do |node_list|
    @document.each_depth_first do |node|
      node_list << node
    end
  end

  @current_node = @document

  @ran_before = false
  @ran_after = false
  instance_eval(&block) # run filter with before block
  @ran_before = true

  nodes_to_filter.each do |node|
    if @current_node.has_been_replaced?
      @current_node = @current_node.get_replacement
      @filtered_nodes.pop
    else
      @current_node = node
    end

    @filtered_nodes.push @current_node

    instance_eval(&block) # run the actual filter code
  end

  @ran_after = true
  instance_eval(&block) # run filter with after block

  write_document
end
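The control flow of filter, which evaluates the same block once before the nodes, once per node, and once after them, can be reproduced in a few lines of plain Ruby. MiniFilter below is a hypothetical, stripped-down stand-in that works on an Array of strings instead of a pandoc AST and matches a selector by simple equality; it only mirrors the roles of the @ran_before and @ran_after flags.

```ruby
# MiniFilter is a hypothetical stand-in that only mimics the control flow of
# the filter method; it works on an Array instead of a pandoc AST.
class MiniFilter
  attr_reader :log

  def initialize(nodes)
    @nodes = nodes
    @log = []
  end

  def filter(&block)
    @ran_before = false
    @ran_after = false
    instance_eval(&block) # first pass: only `before` yields
    @ran_before = true

    @nodes.each do |node|
      @current_node = node
      instance_eval(&block) # one pass per node: only `with` yields
    end

    @ran_after = true
    instance_eval(&block) # final pass: only `after` yields
    @log
  end

  def with(selector)
    return unless @ran_before && !@ran_after

    yield @current_node if @current_node == selector
  end

  def before
    yield @nodes unless @ran_before
  end

  def after
    yield @nodes if @ran_after
  end
end

def run_mini_filter
  MiniFilter.new(%w[Header Para Image Para]).filter do
    before { |nodes| log << "before (#{nodes.size} nodes)" }
    with('Para') { |node| log << "with #{node}" }
    after { |_nodes| log << 'after' }
  end
end

puts run_mini_filter
```

One consequence of this design is that any statement placed directly in the block, outside before, with, or after, is evaluated on every pass, which is why counters in the examples are declared outside the block.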

#stop! ⇒ Object

Stop processing the document any further and output it as it is now. This is a great timesaver for filters that only act on a small number of nodes in a large document, or when you only want to set the metadata.

Note that stop! breaks off the filter immediately after outputting the document in its current state.



# File 'lib/paru/filter.rb', line 373

def stop!
  write_document
  exit
end

#with(selector) {|Node| ... } ⇒ Object

Specify what nodes to filter with a selector. If the current_node matches that selector, it is passed to the block to this with method.

Parameters:

  • selector (String)

    a selector string

Yields:

  • (Node)

    the current node if it matches the selector



# File 'lib/paru/filter.rb', line 343

def with(selector)
  return unless @ran_before && !@ran_after

  @selectors[selector] = Selector.new selector unless @selectors.key? selector
  yield @current_node if @selectors[selector].matches? @current_node, @filtered_nodes
end
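Because the filter block is evaluated again for every node, with is called with the same selector string many times, and the @selectors Hash ensures each string is turned into a Selector only once. The caching pattern on its own can be sketched as follows; ParsedSelector and SelectorCache are hypothetical stand-ins, not paru classes.

```ruby
# ParsedSelector and SelectorCache are hypothetical stand-ins, not paru classes.
class ParsedSelector
  @parse_count = 0

  class << self
    attr_accessor :parse_count
  end

  attr_reader :source

  def initialize(source)
    self.class.parse_count += 1 # count how often "parsing" happens
    @source = source
  end
end

class SelectorCache
  def initialize
    @selectors = {}
  end

  # Parse a selector string only the first time it is seen.
  def fetch(selector)
    @selectors[selector] = ParsedSelector.new(selector) unless @selectors.key?(selector)
    @selectors[selector]
  end
end

cache = SelectorCache.new
1_000.times { cache.fetch('Header + Image') }
puts ParsedSelector.parse_count
```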