
Web Scraping With Haskell

What is the current state of libraries for scraping websites with Haskell?

I'm trying to make myself do more of my quick one-off tasks in Haskell, in order to help increase my comfort level with the language.

In Python, I tend to use the excellent PyQuery library for this. Is there something similarly simple and easy in Haskell? I've looked into TagSoup, and while the parser itself seems nice, actually traversing pages doesn't seem as nice as it is in other languages.

Is there a better option out there?

asked Jan 29 '11 by ricree


2 Answers

http://hackage.haskell.org/package/shpider

Shpider is a web automation library for Haskell. It allows you to quickly write crawlers, and for simple cases (like following links) even without reading the page source.

It has useful features such as turning relative links from a page into absolute links, options to authorize transactions only on a given domain, and the option to only download HTML documents.

It also provides a nice syntax for filling out forms.

An example:

    runShpider $ do
        download "http://apage.com"
        theForm : _ <- getFormsByAction "http://anotherpage.com"
        sendForm $ fillOutForm theForm $ pairs $ do
            "occupation" =: "unemployed Haskell programmer"
            "location" =: "mother's house"

(Edit in 2018 -- shpider is deprecated, these days https://hackage.haskell.org/package/scalpel might be a good replacement)
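To give a feel for scalpel's style, here is a minimal sketch (not from the original answer; the sample HTML and the expectation that `scrapeStringLike` and `attrs` behave as documented are mine). It runs a scraper over an in-memory string rather than a live page:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

main :: IO ()
main = do
  -- A made-up HTML fragment standing in for a downloaded page.
  let html = "<ul><li><a href=\"/a\">First</a></li><li><a href=\"/b\">Second</a></li></ul>" :: String
  -- attrs collects the href attribute of every <a> element matched by the selector.
  print (scrapeStringLike html (attrs "href" "a"))
```

For real pages you would use `scrapeURL` instead of `scrapeStringLike`, but the scraper itself stays the same, which makes the selectors easy to test offline.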

answered Oct 02 '22 by sclv


From my searching on the Haskell mailing lists, it appears that TagSoup is the dominant choice for parsing pages. For example: http://www.haskell.org/pipermail/haskell-cafe/2008-August/045721.html
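As a rough illustration of the TagSoup style (my own sketch, not from the mailing list post; the `linkHrefs` name and the sample fragment are invented), extracting link targets is a matter of pattern matching on the tag stream that `parseTags` produces:

```haskell
import Text.HTML.TagSoup

-- Pull every link target out of a fragment of HTML by matching
-- opening <a> tags and reading their href attributes.
linkHrefs :: String -> [String]
linkHrefs html =
  [ v | TagOpen "a" as <- parseTags html, ("href", v) <- as ]

main :: IO ()
main = print (linkHrefs "<p>Hello <a href=\"/x\">link</a> world</p>")
```

This flat list-of-tags model is what the question means by traversal being less pleasant than PyQuery: there is no tree to walk, only a stream to filter.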

As far as the other aspects of web scraping (such as crawling, spidering, and caching), I searched http://hackage.haskell.org/package/ for those keywords but didn't find anything promising. I even skimmed through packages mentioning "http", but nothing jumped out at me.

Note: I'm not a regular Haskeller, so I hope others can chime in if I missed something.

answered Oct 03 '22 by David J.