First, I checked this question, but its answer refers to an obsolete service.
So is there a web-based service (or software, I don't care which) that lets you search internet content with regular expressions?
Let me quote an answer from the superuser.com question here, since I completely agree with its author:
Quoting from Ask MetaFilter:
The only possible way to make keyword searching efficient over hundreds of terabytes (or whatever their index is up to these days) is to precompute an index of words.
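To make the contrast concrete, here is a toy sketch in Python. The documents and function names are hypothetical, purely for illustration. A keyword query is answered by intersecting precomputed posting sets from an inverted index, while a regex query has no index to consult and must read every document.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def keyword_search(index, *words):
    """AND query: intersect the posting sets; no document text is read."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()

def regex_search(docs, pattern):
    """Regex query: the index cannot help, so every document is scanned."""
    rx = re.compile(pattern)
    return {doc_id for doc_id, text in docs.items() if rx.search(text)}

# Hypothetical toy corpus.
docs = {
    1: "jQuery loaded from a CDN",
    2: "Disqus comment system embedded",
    3: "jQuery plus Disqus comments",
}
index = build_index(docs)
print(sorted(keyword_search(index, "jquery", "disqus")))
print(sorted(regex_search(docs, r"[Dd]isqus comments?")))
```

On three documents the difference is invisible, but the keyword path only touches the postings for the query words, while the regex path grows linearly with the size of the whole corpus.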
In fact, a full regex engine is Turing-complete, and you can write arbitrary regexps that will gobble up near-infinite amounts of CPU time and memory. For all these reasons, it would be technical insanity for them to offer regex searching to the general public.
Update: as was rightly pointed out, regular expressions are not Turing-complete. Stay tuned for a more detailed answer:
TBD...
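For context on that correction: a classic regular expression (without extensions such as backreferences) describes a regular language, which a finite automaton recognizes by reading each character once while holding a single state from a fixed, finite set. A minimal sketch, using a hand-built DFA for the arbitrary example pattern a(b|c)*d and cross-checking it against Python's re module:

```python
import re

# Hand-built DFA for the example pattern a(b|c)*d, matched against the whole string.
# States: 0 = start, 1 = read "a" followed by any b/c, 2 = read the final "d" (accepting).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
ACCEPTING = {2}

def dfa_match(text):
    """Return True if text fully matches a(b|c)*d, using constant memory."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

# Cross-check against Python's (backtracking) re engine on a few inputs.
for s in ["ad", "abcbd", "abc", "abdd", ""]:
    assert dfa_match(s) == (re.fullmatch(r"a(b|c)*d", s) is not None)
```

The memory used does not grow with the input, which is why plain regexes are not Turing-complete. The CPU concern is still real in practice, though: backtracking engines (including Python's re) can take exponential time on patterns such as (a+)+$, whereas automaton-based engines such as RE2 guarantee time linear in the input length.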
dayyan is correct: it's reverse indexes that make search engines fast. There's no way to accelerate regex search over a petabyte of content if you only have 100 terabytes of flash disk. Keyword searches with a reverse index: no problem.
blekko's web grep (https://blekko.com/ws/+/webgrep) supports regexes, but most of the searches we get for it are for constant strings, usually ones that appear in the HTML, because that's what's interesting: Who uses microformats? Who uses various JavaScript libraries? Who uses various comment systems? And so forth.
If you sent us a regex, we'd be happy to run it for you.
Running these searches consists of a MapReduce job run over all the HTML in our crawl. That's why it takes a while (a day or two) to get an answer.
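The MapReduce approach can be sketched in miniature. This is a single-process illustration, not blekko's implementation: the crawl dictionary, the pattern, and the function names are all hypothetical. The map step runs the regex over one page's HTML and emits (match, url) pairs; the reduce step counts how many distinct pages contain each matched string, which answers questions like "who uses which JavaScript library?"

```python
import re
from collections import Counter
from functools import reduce

def map_page(url, html, rx):
    """Map step: emit a (matched_string, url) pair for every regex hit on one page."""
    return [(m.group(0), url) for m in rx.finditer(html)]

def reduce_counts(acc, pairs):
    """Reduce step: count each matched string once per page that contains it."""
    for match, _url in set(pairs):
        acc[match] += 1
    return acc

def web_grep(crawl, pattern):
    """Run the map step over every page, then fold the results into one Counter."""
    rx = re.compile(pattern)
    mapped = (map_page(url, html, rx) for url, html in crawl.items())
    return reduce(reduce_counts, mapped, Counter())

# Hypothetical three-page "crawl".
crawl = {
    "http://example.com/a": '<script src="/js/jquery.min.js"></script>',
    "http://example.com/b": '<script src="/js/jquery.min.js"></script>'
                            '<script src="/js/jquery.min.js"></script>',
    "http://example.org/c": '<script src="/js/react.min.js"></script>',
}
print(web_grep(crawl, r"[\w.-]+\.min\.js"))
```

Here jquery.min.js is counted on two pages (once for page b, even though it loads the file twice) and react.min.js on one. In a real MapReduce job the map step would run in parallel across many machines and a shuffle would group pairs by key before reducing; the sequential fold above only stands in for that, and the full scan of every page is what makes such a job take so long.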