Web scraping in Objective C

Tags:

objective-c

Is there an Objective-C library for parsing HTML, like Python's BeautifulSoup? Thanks.

pistacchio asked May 03 '11 16:05


2 Answers

Apple provides NSXMLDocument and NSXMLParser, both of which accept tidied HTML input (see the Tree-Based XML Programming Guide).

On iOS (as of 4.3) NSXMLDocument is not available, so you'd have to use either NSXMLParser or libxml2.
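
A minimal sketch of the NSXMLParser route, assuming the markup has already been tidied into well-formed XHTML (NSXMLParser is a strict XML parser and stops at the first malformed tag). The `LinkCollector` class and the `ExtractLinks` function are hypothetical names used only for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that records the href of every <a> element it sees.
@interface LinkCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *links;
@end

@implementation LinkCollector

- (instancetype)init {
    if ((self = [super init])) {
        _links = [NSMutableArray array];
    }
    return self;
}

// Called by NSXMLParser for every opening tag.
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([[elementName lowercaseString] isEqualToString:@"a"]) {
        NSString *href = attributeDict[@"href"];
        if (href != nil) {
            [self.links addObject:href];
        }
    }
}

@end

// Hypothetical helper: returns the collected hrefs, or nil if the input
// is not well-formed XML (for example, untidied HTML).
static NSArray<NSString *> *ExtractLinks(NSString *xhtml) {
    NSData *data = [xhtml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    LinkCollector *collector = [[LinkCollector alloc] init];
    parser.delegate = collector; // the delegate is not retained by the parser
    return [parser parse] ? [collector.links copy] : nil;
}
```

Calling `ExtractLinks(@"<html><body><a href=\"/one\">1</a></body></html>")` should hand back an array containing `/one`. Real-world pages would usually need a pass through a tidying step first, which this sketch does not include.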

More information on potential problems with parsing malformed HTML:
What's the best approach for parsing XML/'screen scraping' in iOS? UIWebview or NSXMLParser?

The most reliable solution is to use an off-screen WebView, load the HTML source into it and then access its DOM tree.
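
A rough sketch of the off-screen web view idea. It swaps in WKWebView, the current WebKit view class, rather than the older WebView/UIWebView classes that were available when this answer was written, and it reads the DOM through JavaScript instead of a native DOM API. The `OffscreenLinkScraper` class and its method are hypothetical names; the code assumes it is called on the main thread and that the caller keeps the scraper object alive until the completion block runs:

```objc
#import <Foundation/Foundation.h>
#import <WebKit/WebKit.h>

typedef void (^LinkHandler)(NSArray<NSString *> *links, NSError *error);

// Hypothetical helper that loads HTML into a web view that is never added
// to a view hierarchy, then queries the resulting DOM.
@interface OffscreenLinkScraper : NSObject <WKNavigationDelegate>
- (void)scrapeLinksFromHTML:(NSString *)html completion:(LinkHandler)completion;
@end

@implementation OffscreenLinkScraper {
    WKWebView *_webView;
    LinkHandler _completion;
}

- (void)scrapeLinksFromHTML:(NSString *)html completion:(LinkHandler)completion {
    _completion = [completion copy];
    WKWebViewConfiguration *config = [[WKWebViewConfiguration alloc] init];
    _webView = [[WKWebView alloc] initWithFrame:CGRectZero configuration:config];
    _webView.navigationDelegate = self;
    [_webView loadHTMLString:html baseURL:nil];
}

// Once the page has loaded, the browser engine has already repaired any
// malformed markup, so the DOM can be queried directly.
- (void)webView:(WKWebView *)webView didFinishNavigation:(WKNavigation *)navigation {
    NSString *script =
        @"Array.from(document.querySelectorAll('a[href]'))"
        @".map(function (a) { return a.getAttribute('href'); })";
    [webView evaluateJavaScript:script completionHandler:^(id result, NSError *error) {
        [self finishWithLinks:([result isKindOfClass:[NSArray class]] ? result : nil)
                        error:error];
    }];
}

- (void)webView:(WKWebView *)webView
didFailNavigation:(WKNavigation *)navigation
        withError:(NSError *)error {
    [self finishWithLinks:nil error:error];
}

// Calls the completion block at most once.
- (void)finishWithLinks:(NSArray<NSString *> *)links error:(NSError *)error {
    LinkHandler handler = _completion;
    _completion = nil;
    if (handler) {
        handler(links, error);
    }
}

@end
```

The trade-off is weight: this spins up a full browser engine for each document, which is why it tolerates broken HTML better than an XML parser does.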

Regexident answered Nov 15 '22 16:11


The best way I have found is NSXMLParser + libtidy. However, many third-party libraries are now available that make parsing easier (the previous answer was written in 2011).

  • Google's Gumbo HTML5 parser is pretty good. It's written in pure C99 and you can use it with Objective-C (use a wrapper like this one).
  • If you want pure Objective-C libraries, then Ono or hpple are good. HTMLReader is also a good alternative.
  • If Swift is your thing, you could use NDHpple, which is a Swift wrapper based on hpple. Or you could use Swift-HTML-Parser. (Bonus: Alamofire is as good as Python Requests and is a joy to use.)

avi answered Nov 15 '22 15:11