Algorithms for recognizing physical addresses on a web page

What are the best algorithms for recognizing structured data, such as postal addresses, on an HTML page?

For example, Google recognizes a home or company address in an email and offers a map of that address.

asked Dec 8 '08 at 09:12 by gyurisc


1 Answer

A named-entity extraction framework such as GATE has at least tackled the information extraction problem for locations, assisted by a gazetteer (a list of known place names) that helps resolve common cases. Unless the pages were machine-generated from a common source, and therefore share a consistent layout, you will find regular expressions fairly weak for this job.
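To make the gazetteer-versus-regex contrast concrete, here is a minimal sketch in plain Python. It is not GATE's API. The tiny `GAZETTEER` dictionary, the `STREET_RE` pattern and the helper names are all hypothetical and chosen only for illustration; a real system would use a large place-name list and a proper NER pipeline. It assumes the page's visible text has already been extracted from the HTML.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: lowercase place name -> place type.
# A real gazetteer would contain many thousands of entries.
GAZETTEER = {
    "london": "city",
    "new york": "city",
    "san francisco": "city",
    "california": "region",
}

# Hypothetical, deliberately naive street pattern: a house number,
# one to four capitalised words, then a common street suffix.
STREET_RE = re.compile(
    r"\b\d{1,5}(?:\s+[A-Z][a-z]+){1,4}\s+"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Lane|Ln|Boulevard|Blvd)\b\.?"
)


@dataclass
class Match:
    start: int
    end: int
    text: str
    kind: str


def find_gazetteer_places(text, gazetteer=GAZETTEER):
    """Find known place names, preferring longer names on overlap."""
    lowered = text.lower()
    matches = []
    # Longest names first, so a multi-word name wins over a shorter one inside it.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, lowered):
            # Skip spans that overlap an already accepted (longer) match.
            if any(m.start() < x.end and x.start < m.end() for x in matches):
                continue
            # Slice the original text so the reported match keeps its casing.
            matches.append(Match(m.start(), m.end(), text[m.start():m.end()], gazetteer[name]))
    return sorted(matches, key=lambda x: x.start)


def find_street_lines(text):
    """Find street-like spans with the naive regular expression."""
    return [Match(m.start(), m.end(), m.group(0), "street") for m in STREET_RE.finditer(text)]


def find_address_candidates(text):
    """Combine gazetteer hits and regex hits, ordered by position."""
    return sorted(find_gazetteer_places(text) + find_street_lines(text), key=lambda x: x.start)


if __name__ == "__main__":
    sample = "Visit us at 221 Baker Street, London, or our office in New York."
    for c in find_address_candidates(sample):
        print(c.kind, repr(c.text))
```

On the sample sentence this reports `221 Baker Street` (street), `London` (city) and `New York` (city). The brittleness the answer warns about shows up quickly: `find_street_lines("221B Baker Street")` returns an empty list because the pattern expects a space right after the digits, and every new address format needs another hand-written rule. A gazetteer-backed extractor degrades more gracefully because it anchors on known place names rather than on exact layout.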

answered Nov 15 '22 at 11:11 by John with waffle