In a text file I have data in the form:
1)
text
text
2)
more text
3)
even more text
more even text
even more text
...
I read it as a list of Strings using the following:
val input = io.Source.fromFile("filename.txt").getLines().toList
I want to break the list down into sub-lists starting with 1)
, 2)
etc.
I've come up with:
val subLists =
input.foldRight( List(List[String]()) ) {
(x, acc) =>
if (x.matches("""[0-9]+\)""")) List() :: (x :: acc.head) :: acc.tail
else (x :: acc.head) :: acc.tail
}.tail
Can this be achieved more simply? What would be really nice would be if there were a built-in method to split a collection on every element that satisfies a predicate (hint, hint, library designers :)).
foldRight
with a complicated argument is usually an indication that you might as well write this using recursion, and factor it out to its own method, while you are at it. Here's what I came up with. First, let's generalize
to a generic method, groupPrefix:
/** Returns shortest possible list of lists xss such that
* - xss.flatten == xs
* - No sublist in xss contains an element matching p in its tail
*/
def groupPrefix[T](xs: List[T])(p: T => Boolean): List[List[T]] = xs match {
case List() => List()
case x :: xs1 =>
val (ys, zs) = xs1 span (!p(_))
(x :: ys) :: groupPrefix(zs)(p)
}
Now you get the result simply by calling
groupPrefix(input)(_ matches """\d+\)""")
I have the honor, to add an answer next to the great @MartinOdersky!
From Scala 2.13 we can use the List.unfold
:
List.unfold(input) {
case Nil =>
None
case x :: as =>
as.span(!_.matches("""\d+\)""")) match {
case (prefix, Nil) =>
Some(x :: prefix, List.empty)
case (prefix, suffix) =>
Some(x :: prefix, suffix)
}
}
Code run at Scastie.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With