
Is there a search engine that supports regular expression search? [closed]

First, I checked this question but the answer refers to an obsolete service.

So is there a web-based service (or desktop software, I don't mind which) that can search internet content with regular expressions?

ilyes kooli asked Jun 20 '12 11:06




2 Answers

Let me repost an answer from the superuser.com question here, since I fully agree with its author:

Quote from Ask Metafilter:

The only possible way to make keyword searching efficient over hundreds of terabytes (or whatever their index is up to these days) is to precompute an index of words.

In fact a full regex engine is turing-complete, and you can write arbitrary regexps that will gobble up near infinite amounts of CPU time and memory. For all these reasons it would be technical insanity for them to offer regex searching to the general public.

Update: as was rightly pointed out, regular expressions are not Turing-complete. Stay tuned for a more detailed answer:

TBD...

Sergei Danielian answered Sep 22 '22 19:09


dayyan is correct: it's reverse indexes that make search engines fast. There's no way to accelerate regex search over a petabyte of content if you only have 100 terabytes of flash disk. Keyword searches backed by a reverse index are no problem.
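To make the contrast concrete, here is a minimal sketch of a reverse (inverted) index next to a brute-force regex scan. It is only an illustration, not how any real search engine is implemented; the function names (`build_index`, `keyword_search`, `regex_search`) and the three sample documents are made up for this example.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Precompute a map from each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents containing every query word, using index lookups only."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

def regex_search(docs, pattern):
    """A regex cannot use the word index, so every document has to be scanned."""
    compiled = re.compile(pattern)
    return {doc_id for doc_id, text in docs.items() if compiled.search(text)}

# Hypothetical sample corpus, standing in for crawled pages.
docs = {
    1: "Keyword searches use a reverse index",
    2: "Regex search scans every page",
    3: "A reverse index maps words to pages",
}
index = build_index(docs)

print(keyword_search(index, "reverse index"))
print(regex_search(docs, r"pa+ges?\b"))
```

The keyword query touches only the index entries for its words, while the regex query reads the full text of every document, which is the part that does not scale to a web-sized crawl.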

blekko's web grep (https://blekko.com/ws/+/webgrep) supports regexes, but most of the searches we get for it are for constant strings, usually ones found in the HTML, because that's what's interesting: who uses microformats? Who uses various JavaScript libraries? Who uses various comment systems? And so forth.

If you send us a regex, we'd be happy to run it for you.

Running these searches consists of a MapReduce job run over all the HTML in our crawl. That's why it takes a while (a day or two) to get an answer.
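As a rough idea of what such a job looks like, here is a single-process sketch of a map step and a reduce step. It is not blekko's actual code: the names `map_page`, `reduce_matches` and `web_grep`, the example URLs, and the sample HTML are all assumptions made for illustration, and a real MapReduce job would spread the map step across many machines.

```python
import re
from collections import defaultdict
from itertools import chain

def map_page(url, html, pattern):
    """Map step: emit (matched_text, url) for each regex match in one page."""
    for match in pattern.finditer(html):
        yield match.group(0), url

def reduce_matches(pairs):
    """Reduce step: group the URLs by the text that matched."""
    grouped = defaultdict(set)
    for matched_text, url in pairs:
        grouped[matched_text].add(url)
    return {text: sorted(urls) for text, urls in grouped.items()}

def web_grep(crawl, regex):
    """Run the map step over every page, then reduce the combined output."""
    pattern = re.compile(regex)
    mapped = chain.from_iterable(
        map_page(url, html, pattern) for url, html in crawl.items()
    )
    return reduce_matches(mapped)

# Hypothetical miniature "crawl": URL -> raw HTML.
crawl = {
    "http://a.example/": '<div class="vcard">...</div><script src="jquery.js"></script>',
    "http://b.example/": '<script src="jquery.js"></script>',
    "http://c.example/": "<p>plain page</p>",
}

hits = web_grep(crawl, r'class="vcard"|jquery\.js')
for text, urls in hits.items():
    print(text, urls)
```

Every page is read in full on every query, which is why the cost grows with the size of the crawl rather than with the number of results.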

Greg Lindahl answered Sep 18 '22 19:09