I'm learning F# and I've started to play around with both sequences and match
expressions.
I'm writing a web scraper that's looking through HTML similar to the following and taking the last URL in a parent <span>
with the paging
class.
<html>
<body>
<span class="paging">
<a href="http://google.com">Link to Google</a>
<a href="http://TheLinkIWant.com">The Link I want</a>
</span>
</body>
</html>
My attempt to get the last URL is as follows:
type AnHtmlPage = FSharp.Data.HtmlProvider<"http://somesite.com">
let findMaxPageNumber (page:AnHtmlPage)=
page.Html.Descendants()
|> Seq.filter(fun n -> n.HasClass("paging"))
|> Seq.collect(fun n -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
|> Seq.last
|> fun n -> n.AttributeValue("href")
However I'm running into issues when the class I'm searching for is absent from the page. In particular I get ArgumentExceptions with the message: Additional information: The input sequence was empty.
My first thought was to build another function that matched empty sequences and returned an empty string when the paging
class wasn't found on a page.
let findUrlOrReturnEmptyString (span:seq<HtmlNode>) =
match span with
| Seq.empty -> String.Empty // <----- This is invalid
| span -> span
|> Seq.collect(fun (n:HtmlNode) -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
|> Seq.last
|> fun n -> n.AttributeValue("href")
let findMaxPageNumber (page:AnHtmlPage)=
page.Html.Descendants()
|> Seq.filter(fun n -> n.HasClass("paging"))
|> findUrlOrReturnEmptyStrin
My issue is now that Seq.Empty
is not a literal and cannot be used in a pattern. Most examples with pattern matching specify empty lists []
in their patterns so I'm wondering: How can I use a similar approach and match empty sequences?
Use a guard clause
match myseq with
| s when Seq.isEmpty s -> "empty"
| _ -> "not empty"
You can use a when
guard to further qualify the case:
match span with
| sequence when Seq.isEmpty sequence -> String.Empty
| span -> span
|> Seq.collect (fun (n: HtmlNode) ->
n.Descendants()
|> Seq.filter (fun m -> m.HasName("a")))
|> Seq.last
|> fun n -> n.AttributeValue("href")
ildjarn is correct in that in this case, an if...then...else
may be the more readable alternative, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With