Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently search for a single element in a large Pandoc

Tags:

haskell

pandoc

Unless I'm missing something it seems that there are only two ways to "traverse" a Pandoc data-structure:

  1. Manually pattern-matching on Block and Inline constructors
  2. Via the Walkable type-class and related utility function

Using the Walkable type-class, is there an efficient way to search for the first matching element (preferably in a breadth-first manner), and stop the traversal as soon as its found? It seems to me that all functions around the Walkable type-class are going to traverse the entire data structure no matter what.

If not, I guess the only way is to pattern-match the Block and Inline constructors and build this on my own.

like image 758
Saurabh Nanda Avatar asked Aug 26 '20 16:08

Saurabh Nanda


2 Answers

The other answer points out the useful query function. I'd add that there's a package of pandoc lenses. You asked about breadth-first traversal too, so here's both.

import Data.Semigroup (First (..))

dfsFirstLink :: Walkable Inline b => b -> Maybe Text
dfsFirstLink = fmap getFirst . query (preview $ _Link . _2 . _1 . to First) 

bfsFirstLink :: Walkable Inline b => b -> Maybe Text
bfsFirstLink = fmap getFirst . getConst . traverseOf (levels query . folded) (Const . preview (_Link . _2 . _1 . to First))

-- Construct a walkable value where dfs != bfs
p :: Pandoc
p = Pandoc mempty [Plain [Note [Plain [Link mempty [] ("a","b")]]],Plain [Link mempty [] ("c","d")]]

>> dfsFirstLink p
Just "a"
>> bfsFirstLink p
Just "c"

Though unfortunately some ad-hoc experiments suggest it may not be as lazy as one might hope.

like image 117
moonGoose Avatar answered Sep 21 '22 15:09

moonGoose


The Walkable typeclass contains a function called query with the following type signature:

query :: Monoid c => (a -> c) -> b -> c

In Data.Semigroup, there's a type called First, with a semigroup instance where the accumulating behavior is to return the "leftmost value".

This can be combined with the Monoid on Maybe, which turns any Semigroup into a Monoid with mempty of Nothing, to give the behavior you want.

For example, adapting a function from Inline -> Maybe String, to Pandoc -> Maybe String, can be done like so:

import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Semigroup

findUrl :: Inline -> Maybe String
findUrl (Link _ _ target) = Just $ fst target
findUrl _ = Nothing

findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = (fmap getFirst) . (query findUrl')
  where
    findUrl' :: Inline -> Maybe (First String)
    findUrl' = (fmap First) . findUrl

With regards to your concern that this will traverse the entire data structure: Haskell is lazy; it shouldn't traverse any further than it needs to.

As pointed out in the comments, it's also possible to write this by specializing query to the List Monoid:

import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Maybe (listToMaybe)

findUrl :: Inline -> [String]
findUrl (Link _ _ target) = [fst target]
findUrl _ = []

findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = listToMaybe . (query findUrl)
like image 39
Joe Avatar answered Sep 23 '22 15:09

Joe