Class: Paru::Filter
- Inherits: Object
- Defined in: lib/paru/filter.rb
Overview
Filter is used to write your own pandoc filter in Ruby. A Filter is almost always created and immediately executed via the run method. The simplest filter you can write in paru is the so-called “identity” filter:
#!/usr/bin/env ruby
# Identity filter
require "paru/filter"
Paru::Filter.run do
# nothing
end
It runs the filter, but it neither selects any nodes nor performs any action. That is pretty useless, of course, although it is a handy way to test the filter functionality. It does, however, show the general setup of a filter well.
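Behind the paru DSL, a pandoc filter is a program that reads the document's AST as JSON on standard input and writes a (possibly modified) AST back to standard output; pandoc runs it with `pandoc --filter ./identity.rb`. The sketch below does not use paru at all; it only illustrates that round trip with Ruby's standard json library, and the method name identity_filter is hypothetical.

```ruby
require "json"

# Hypothetical helper: what an identity filter amounts to at the JSON level.
# Parse pandoc's JSON AST and serialize it again without changing anything.
def identity_filter(json_text)
  ast = JSON.parse(json_text)
  JSON.generate(ast)
end

# In a real filter script this would be wired to the standard streams:
#   $stdout.write identity_filter($stdin.read)
```

paru's Filter.run hides exactly this reading and writing, so your block only has to deal with nodes.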
Writing a simple filter: numbering figures
Inside a Filter.run block, you specify selectors with actions. For example, to number all figures in a document and prefix their captions with “Figure”, the following filter would work:
#!/usr/bin/env ruby
# Number all figures in a document and prefix the caption with "Figure".
require "paru/filter"
figure_counter = 0
Paru::Filter.run do
with "Image" do |image|
figure_counter += 1
image.inner_markdown = "Figure #{figure_counter}. #{image.inner_markdown}"
end
end
This filter selects all PandocFilter::Image nodes. For each PandocFilter::Image node, it increments the figure counter figure_counter and then sets the figure's caption to “Figure” followed by the figure count and the original caption. In other words, the following input document
![My first image](img/horse.png)
![My second image](img/rabbit.jpeg)
will be transformed into
![Figure 1. My first image](img/horse.png)
![Figure 2. My second image](img/rabbit.jpeg)
The method PandocFilter::InnerMarkdown#inner_markdown and its counterpart PandocFilter::Node#markdown are a great way to manipulate the contents of a selected PandocFilter::Node. There is no need to mess about creating and filling PandocFilter::Nodes: you can just use pandoc's own markdown format!
Writing more involved filters
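To make the transformation in the example above concrete, here is a plain-Ruby approximation that numbers image captions directly in markdown text with a regular expression. This is not how paru works (paru operates on pandoc's AST, not on raw text), and it only handles the simple `![caption](path)` form; the names IMAGE and number_figures are hypothetical.

```ruby
# Matches the simple ![caption](path) image syntax used in the example.
IMAGE = /!\[(?<caption>[^\]]*)\]\((?<path>[^)]*)\)/

# Prefix each image caption with "Figure N. ", counting in document order.
def number_figures(markdown)
  counter = 0
  markdown.gsub(IMAGE) do
    m = Regexp.last_match
    counter += 1
    "![Figure #{counter}. #{m[:caption]}](#{m[:path]})"
  end
end
```

A regex breaks down quickly on real documents (images in code blocks, nested brackets, reference-style images), which is exactly why working on the AST with inner_markdown is the more robust choice.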
Using the “follows” selector: Numbering figures and chapters
The previous example can be extended to also number chapters and to start numbering figures anew per chapter. As you would expect, we need two counters, one for the figures and one for the chapters:
#!/usr/bin/env ruby
# Number figures per chapter; number chapters as well
require "paru/filter"
current_chapter = 0
current_figure = 0
Paru::Filter.run do
with "Header" do |header|
if header.level == 1
current_chapter += 1
current_figure = 0
header.inner_markdown = "Chapter #{current_chapter}. #{header.inner_markdown}"
end
end
with "Header + Image" do |image|
current_figure += 1
image.inner_markdown = "Figure #{current_chapter}.#{current_figure}. #{image.inner_markdown}"
end
end
What is new in this filter is the selector “Header + Image”, which selects all PandocFilter::Image nodes that follow a PandocFilter::Header node. Documents in pandoc have a flat structure in which chapters do not exist as separate concepts. Instead, a chapter is implied by a header of a certain level and everything that follows it until the next header of that level.
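The flat-structure reasoning can be modeled without paru. In the sketch below, a document is a hypothetical flat list of Hashes (not paru's node classes), and images are treated as top-level items for simplicity, although in real pandoc documents they sit inside paragraphs. An image is numbered only once some header has been seen, mirroring “Header + Image”; a level-1 header starts a new chapter and resets the figure count.

```ruby
# Hypothetical flat document model: each block is a Hash with :type,
# :text and, for headers, :level. Returns a new, annotated list.
def number_chapters_and_figures(blocks)
  chapter = 0
  figure = 0
  seen_header = false

  blocks.map do |block|
    case block[:type]
    when "Header"
      seen_header = true
      next block unless block[:level] == 1
      chapter += 1
      figure = 0
      block.merge(text: "Chapter #{chapter}. #{block[:text]}")
    when "Image"
      # "Header + Image": images before the first header do not match.
      next block unless seen_header
      figure += 1
      block.merge(text: "Figure #{chapter}.#{figure}. #{block[:text]}")
    else
      block
    end
  end
end
```

An image placed before the first header, such as a cover image, is left unnumbered, just as it would not be selected by “Header + Image”.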
Using the “child of” selector: Annotate custom blocks
Hierarchical structures do exist in a pandoc document, however. For example, the contents of a paragraph (PandocFilter::Para), which itself is a PandocFilter::Block level node, are PandocFilter::Inline level nodes. Another example is custom blocks, that is, PandocFilter::Div nodes. You select a child node by using the > selector, as in the example below:
#!/usr/bin/env ruby
# Annotate custom blocks: example blocks and important blocks
require "paru/filter"
example_count = 0
Paru::Filter.run do
with "Div.example > Header" do |header|
if header.level == 3
example_count += 1
header.inner_markdown = "Example #{example_count}: #{header.inner_markdown}"
end
end
with "Div.important" do |d|
d.inner_markdown = d.inner_markdown + "\n\n*(important)*"
end
end
Here all PandocFilter::Header nodes that are children of a PandocFilter::Div node with class example are selected. If these headers are of level 3, they are prefixed by the string “Example” followed by a count.
In this example, “important” PandocFilter::Div nodes are also annotated, by appending the italicized string “(important)” after the contents of the node.
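The “child of” semantics can be sketched on a small nested structure. The Hash-based model below is hypothetical, not paru's API: a Div carries :classes and :children, and a level-3 header is numbered only when its direct parent is a Div whose classes include "example". Nodes are visited depth-first, so the count follows document order.

```ruby
# Hypothetical nested model. parent_classes holds the classes of the
# enclosing Div (empty at the top level); counter is shared across calls.
def annotate_examples(nodes, parent_classes = [], counter = [0])
  nodes.map do |node|
    if node[:type] == "Div"
      node.merge(children: annotate_examples(node[:children], node[:classes], counter))
    elsif node[:type] == "Header" && node[:level] == 3 && parent_classes.include?("example")
      counter[0] += 1
      node.merge(text: "Example #{counter[0]}: #{node[:text]}")
    else
      node
    end
  end
end
```

Because only the direct parent's classes are passed down, a header nested one Div deeper inside an example block would not match, which reflects the “child” rather than “descendant” reading of >.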
Using a distance in a selector: Capitalize the first N characters of a paragraph
Given the flat structure of a pandoc document, the “follows” selector has quite a reach. For example, “Header + Para” selects all paragraphs that follow a header. In most well-structured documents, this would select basically all paragraphs.
But what if you need to be more specific? For example, if you would like to capitalize the first N characters of the first paragraph after each header, you need a way to specify a sequence number of sorts. To that end, paru filter selectors take an optional distance parameter. A filter for this example could look like:
#!/usr/bin/env ruby
# Capitalize the first N characters of a paragraph
require "paru/filter"
END_CAPITAL = 10
Paru::Filter.run do
with "Header +1 Para" do |p|
text = p.inner_markdown
first_line = text.slice(0, END_CAPITAL).upcase
rest = text.slice(END_CAPITAL, text.size) || "" # slice returns nil for paragraphs shorter than END_CAPITAL
p.inner_markdown = first_line + rest
end
end
The distance is denoted by an integer after a selector. In this case, “Header +1 Para” selects all PandocFilter::Para nodes that directly follow a PandocFilter::Header node. You can use a distance with any selector.
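The string handling in this filter is easy to get wrong for short paragraphs: in Ruby, slicing a String from a start index beyond its length returns nil, and `"..." + nil` raises a TypeError. The hypothetical helper below isolates the capitalization step so it can be checked on its own, converting that nil to an empty string.

```ruby
END_CAPITAL = 10

# Upper-case the first n characters of text and keep the rest unchanged.
# text[n..-1] is nil when n exceeds the length, hence the to_s.
def capitalize_prefix(text, n = END_CAPITAL)
  text[0, n].upcase + text[n..-1].to_s
end
```

Inside the filter you would then write `p.inner_markdown = capitalize_prefix(p.inner_markdown)`. Note that upcasing raw markdown also upcases any markup in those first characters, such as link targets.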
Manipulating nodes: Removing horizontal lines
Although PandocFilter::InnerMarkdown#inner_markdown and PandocFilter::Node#markdown work in most situations, sometimes direct manipulation of the pandoc document AST is useful. These PandocFilter::ASTManipulation methods are mixed into PandocFilter::Node and can be used on any node in your filter. For example, to delete all PandocFilter::HorizontalRule nodes, you can use a filter like:
#!/usr/bin/env ruby
require "paru/filter"
Paru::Filter.run do
with "HorizontalRule" do |rule|
if rule.has_parent? then
rule.parent.delete rule
end
end
end
Note that you could have achieved the same effect by using:
rule.markdown = ""
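For comparison, here is what removing horizontal rules looks like when working on pandoc's JSON directly, without paru's AST classes. In pandoc's JSON format a horizontal rule is the object `{"t": "HorizontalRule"}`, and it can appear at any depth, for example inside a block quote. The recursive walker below (strip_horizontal_rules is a hypothetical name) drops every such object from every array it finds.

```ruby
require "json"

# Return a copy of a parsed pandoc JSON AST without any HorizontalRule
# elements, searching arrays and objects at every nesting level.
def strip_horizontal_rules(node)
  case node
  when Array
    node.reject { |el| el.is_a?(Hash) && el["t"] == "HorizontalRule" }
        .map { |el| strip_horizontal_rules(el) }
  when Hash
    node.transform_values { |value| strip_horizontal_rules(value) }
  else
    node
  end
end
```

paru's depth-first traversal together with rule.parent.delete does this bookkeeping for you, including keeping track of parents.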
Manipulating metadata
One of the interesting features of the pandoc markdown format is the ability to add metadata to a document via a YAML block or command line options. For example, if you use a template that uses the metadata property $date$ to write a date on a title page, it is quite useful to automatically add today's date to the metadata. You can do so with a filter like:
#!/usr/bin/env ruby
## Add today's date to the metadata
require "paru/filter"
require "date"
Paru::Filter.run do
before do
['date'] = Date.today.to_s
end
end
In a filter, the metadata property is a Ruby Hash of Strings, Numbers, Booleans, Arrays, and Hashes. You can manipulate it like any other Ruby Hash.
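paru converts this Hash to and from pandoc's own metadata representation when it reads and writes the document. To see roughly what ends up in the JSON, the sketch below sets a date directly in the "meta" object of a pandoc JSON AST, without paru. It assumes the value is stored as a MetaString node, which is one valid pandoc metadata value type; paru itself may choose a different representation. The name add_date is hypothetical.

```ruby
require "json"
require "date"

# Set meta.date in a pandoc JSON document. A plain string value is
# represented here as {"t": "MetaString", "c": "<string>"}.
def add_date(json_text, date = Date.today)
  ast = JSON.parse(json_text)
  ast["meta"] ||= {}
  ast["meta"]["date"] = { "t" => "MetaString", "c" => date.to_s }
  JSON.generate(ast)
end
```

Taking the date as a parameter, with today as the default, keeps the function deterministic and easy to test.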
Instance Attribute Summary
- #current_node ⇒ Node
  The node in the AST of the document being filtered that is currently being inspected by the filter.
- #document ⇒ Document
  The document being filtered.
- #metadata ⇒ Hash
  The metadata of the document being filtered as a Ruby Hash.
Class Method Summary
- .run(&block) ⇒ Object
  Run the filter specified by block.
Instance Method Summary
- #after {|Document| ... } ⇒ Object
  After running the filter on all nodes, the document is passed to the block given to this after method.
- #before {|Document| ... } ⇒ Object
  Before running the filter on all nodes, the document is passed to the block given to this before method.
- #filter(&block) ⇒ JSON
  Create a filter using block.
- #initialize(input = $stdin, output = $stdout) ⇒ Filter (constructor)
  Create a new Filter instance.
- #stop! ⇒ Object
  Stop processing the document any further and output it as it is now.
- #with(selector) {|Node| ... } ⇒ Object
  Specify what nodes to filter with a selector.
Constructor Details
#initialize(input = $stdin, output = $stdout) ⇒ Filter
Create a new Filter instance. For convenience, run creates a new Paru::Filter and runs it immediately. Use this constructor if you want to run a filter on input and output streams other than STDIN and STDOUT, respectively.
# File 'lib/paru/filter.rb', line 234
def initialize(input = $stdin, output = $stdout)
  @input = input
  @output = output
end
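Injecting the streams is what makes a filter testable: you can feed it a StringIO instead of STDIN and inspect a StringIO instead of STDOUT. Note that the input stream must contain pandoc's JSON (for example the output of `pandoc -t json`), because read_document parses `@input.read` as JSON. Since paru may not be installed where you try this, the sketch uses a hypothetical TinyFilter class that only mirrors the constructor pattern; it is not part of paru.

```ruby
require "json"
require "stringio"

# Hypothetical toy class (not paru): streams are injected and default to
# STDIN and STDOUT, just like Paru::Filter#initialize.
class TinyFilter
  def initialize(input = $stdin, output = $stdout)
    @input = input
    @output = output
  end

  # Parse the JSON AST, let the block modify it, then write it back out.
  def filter
    ast = JSON.parse(@input.read)
    yield ast
    @output.write JSON.generate(ast)
  end
end
```

With paru itself, the analogous call is `Paru::Filter.new(input, output).filter do ... end`, where input and output are any IO-like objects such as StringIO instances.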
Instance Attribute Details
#current_node ⇒ Node
Returns the node in the AST of the document being filtered that is currently being inspected by the filter.
# File 'lib/paru/filter.rb', line 221
class Filter
  attr_reader :metadata, :document, :current_node

  # Create a new Filter instance. For convenience, {run} creates a new
  # {Filter} and runs it immediately. Use this constructor if you want
  # to run a filter on different input and output streams than STDIN and
  # STDOUT respectively.
  #
  # @param input [IO = $stdin] the input stream to read, defaults to
  #   STDIN
  # @param output [IO = $stdout] the output stream to write, defaults to
  #   STDOUT
  def initialize(input = $stdin, output = $stdout)
    @input = input
    @output = output
  end

  # Run the filter specified by block. This is a convenience method that
  # creates a new {Filter} using input stream STDIN and output stream
  # STDOUT and immediately runs {filter} with the block supplied.
  #
  # @param block [Proc] the filter specification
  #
  # @example Add 'Figure' to each image's caption
  #   Paru::Filter.run do
  #     with "Image" do |image|
  #       image.inner_markdown = "Figure. #{image.inner_markdown}"
  #     end
  #   end
  def self.run(&block)
    Filter.new($stdin, $stdout).filter(&block)
  end

  # Create a filter using +block+. In the block you specify
  # selectors and actions to be performed on selected nodes. In the
  # example below, the selector is "Image", which selects all image
  # nodes. The action is to prepend the contents of the image's caption
  # by the string "Figure. ".
  #
  # @param block [Proc] the filter specification
  #
  # @return [JSON] a JSON string with the filtered pandoc AST
  #
  # @example Add 'Figure' to each image's caption
  #   input = StringIO.new(File.read("my_report.md"))
  #   output = StringIO.new
  #
  #   Paru::Filter.new(input, output).filter do
  #     with "Image" do |image|
  #       image.inner_markdown = "Figure. #{image.inner_markdown}"
  #     end
  #   end
  #
  def filter(&block)
    @selectors = Hash.new
    @filtered_nodes = []
    @document = read_document
    @metadata = PandocFilter::Metadata.new @document.
    nodes_to_filter = Enumerator.new do |node_list|
      @document.each_depth_first do |node|
        node_list << node
      end
    end

    @current_node = @document

    @ran_before = false
    @ran_after = false
    instance_eval(&block) # run filter with before block
    @ran_before = true

    nodes_to_filter.each do |node|
      if @current_node.has_been_replaced?
        @current_node = @current_node.get_replacement
        @filtered_nodes.pop
      else
        @current_node = node
      end

      @filtered_nodes.push @current_node

      instance_eval(&block) # run the actual filter code
    end

    @ran_after = true
    instance_eval(&block) # run filter with after block

    write_document
  end

  # Specify what nodes to filter with a +selector+. If the +current_node+
  # matches that selector, it is passed to the block to this +with+ method.
  #
  # @param selector [String] a selector string
  # @yield [Node] the current node if it matches the selector
  def with(selector)
    if @ran_before and !@ran_after
      @selectors[selector] = Selector.new selector unless @selectors.has_key? selector
      yield @current_node if @selectors[selector].matches? @current_node, @filtered_nodes
    end
  end

  # Before running the filter on all nodes, the +document+ is passed to
  # the block to this +before+ method. This method is run exactly once.
  #
  # @yield [Document] the document
  def before()
    yield @document unless @ran_before
  end

  # After running the filter on all nodes, the +document+ is passed to
  # the block to this +after+ method. This method is run exactly once.
  #
  # @yield [Document] the document
  def after()
    yield @document if @ran_after
  end

  # Stop processing the document any further and output it as it is now.
  # This is a great timesaver for filters that only act on a small
  # number of nodes in a large document, or when you only want to set
  # the metadata.
  #
  # Note, stop will break off the filter immediately after outputting
  # the document in its current state.
  def stop!()
    write_document
    exit true
  end

  private

  # The Document node from JSON formatted pandoc document structure
  # on STDIN that is being filtered
  #
  # @return [Document] create a new Document node from a pandoc AST from
  #   JSON from STDIN
  def read_document()
    PandocFilter::Document.from_JSON @input.read
  end

  # Write the document being filtered to STDOUT
  def write_document()
    @document. = @metadata.
    @output.write @document.to_JSON
  end
end
#document ⇒ Document
Returns the document being filtered.
#metadata ⇒ Hash
Returns the metadata of the document being filtered as a Ruby Hash.
Class Method Details
.run(&block) ⇒ Object
Run the filter specified by block. This is a convenience method that creates a new Paru::Filter using input stream STDIN and output stream STDOUT and immediately runs #filter with the block supplied.
# File 'lib/paru/filter.rb', line 251
def self.run(&block)
  Filter.new($stdin, $stdout).filter(&block)
end
Instance Method Details
#after {|Document| ... } ⇒ Object
After running the filter on all nodes, the document is passed to the block given to this after method. The block runs exactly once.
# File 'lib/paru/filter.rb', line 338
def after()
  yield @document if @ran_after
end
#before {|Document| ... } ⇒ Object
Before running the filter on all nodes, the document is passed to the block given to this before method. The block runs exactly once.
# File 'lib/paru/filter.rb', line 330
def before()
  yield @document unless @ran_before
end
#filter(&block) ⇒ JSON
Create a filter using block. In the block, you specify selectors and actions to be performed on selected nodes. For example, the selector “Image” selects all image nodes, and an action could prepend the string “Figure. ” to each image's caption.
# File 'lib/paru/filter.rb', line 275
def filter(&block)
  @selectors = Hash.new
  @filtered_nodes = []
  @document = read_document
  @metadata = PandocFilter::Metadata.new @document.
  nodes_to_filter = Enumerator.new do |node_list|
    @document.each_depth_first do |node|
      node_list << node
    end
  end

  @current_node = @document

  @ran_before = false
  @ran_after = false
  instance_eval(&block) # run filter with before block
  @ran_before = true

  nodes_to_filter.each do |node|
    if @current_node.has_been_replaced?
      @current_node = @current_node.get_replacement
      @filtered_nodes.pop
    else
      @current_node = node
    end

    @filtered_nodes.push @current_node

    instance_eval(&block) # run the actual filter code
  end

  @ran_after = true
  instance_eval(&block) # run filter with after block

  write_document
end
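Note that filter evaluates the same block several times: once before the traversal, once per node, and once afterwards. The @ran_before and @ran_after flags are what make before, with, and after each fire only in their own phase. The hypothetical PhaseDemo class below is a stripped-down model of that mechanism (it is not paru, and its with simply compares node type names instead of parsing selectors).

```ruby
# Hypothetical toy model of Filter#filter's three-phase evaluation.
class PhaseDemo
  attr_reader :log

  def initialize(nodes)
    @nodes = nodes
    @log = []
  end

  def run(&block)
    @ran_before = false
    @ran_after = false
    instance_eval(&block)       # phase 1: only `before` yields
    @ran_before = true
    @nodes.each do |node|
      @current = node
      instance_eval(&block)     # phase 2: only `with` yields
    end
    @ran_after = true
    instance_eval(&block)       # phase 3: only `after` yields
    self
  end

  def before
    yield unless @ran_before
  end

  def with(type)
    yield @current if @ran_before && !@ran_after && @current == type
  end

  def after
    yield if @ran_after
  end
end
```

A consequence worth keeping in mind: any plain statement in a paru filter block that is not wrapped in before, with, or after runs once per node plus twice more.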
#stop! ⇒ Object
Stop processing the document any further and output it as it is now. This is a great timesaver for filters that only act on a small number of nodes in a large document, or when you only want to set the metadata.
Note that stop! writes the document in its current state and then immediately ends the Ruby process by calling exit true.
# File 'lib/paru/filter.rb', line 349
def stop!()
  write_document
  exit true
end
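Because stop! ends with exit true, calling it from a filter embedded in a larger program (or from a unit test) ends that whole program. In Ruby, Kernel#exit does this by raising SystemExit, so ensure clauses still run and a caller can rescue it. The sketch below demonstrates that behavior with hypothetical helper names; it does not call paru.

```ruby
# Mimics the tail of stop!: "write" the document, then exit successfully.
# The ensure clause still runs because exit raises SystemExit.
def run_until_stopped(log)
  log << "write document"
  exit true
ensure
  log << "ensure ran"
end

# Run a block and report whether it exited successfully (nil if it did
# not exit at all).
def stopped_successfully?
  yield
  nil
rescue SystemExit => e
  e.success?
end
```

When testing a filter that uses stop!, wrapping the call this way lets the test inspect the output stream afterwards instead of terminating.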
#with(selector) {|Node| ... } ⇒ Object
Specify what nodes to filter with a selector. If the current_node matches that selector, it is passed to the block given to this with method.
# File 'lib/paru/filter.rb', line 319
def with(selector)
  if @ran_before and !@ran_after
    @selectors[selector] = Selector.new selector unless @selectors.has_key? selector
    yield @current_node if @selectors[selector].matches? @current_node, @filtered_nodes
  end
end
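Since with is evaluated once for every node, it caches the parsed Selector in @selectors, keyed by the selector string, so each string is parsed only once per run. The sketch below isolates that memoization idea with a hypothetical SelectorCache class whose stand-in "parser" merely splits the string; real selector parsing in paru is more involved.

```ruby
# Hypothetical illustration of the caching in #with: parse each selector
# string once and reuse the parsed object on every later call.
class SelectorCache
  attr_reader :parse_count

  def initialize
    @selectors = {}
    @parse_count = 0
  end

  def fetch(selector)
    @selectors[selector] ||= begin
      @parse_count += 1
      selector.split # stand-in for Selector.new(selector)
    end
  end
end
```

For a document with thousands of nodes, this turns thousands of parses of the same string into one.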