 

regular expression search engine [closed]

Is there a search engine, that would allow me to search by a regular expression?

asked Jan 01 '11 by Elwhis


3 Answers

Google Code Search allows you to search using a regular expression.

As far as I am aware no such search engine exists for general searches.

answered Oct 06 '22 by Mark Byers


There are a few problems with regular expressions that currently prohibit employing them in real-world scenarios. The most pressing is that the entire cached Internet would have to be matched against your regex, which would take significant computing resources; indexes are pretty much useless in a regex context, since a regex can match an unbounded set of strings (/fo*bar/).
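The unboundedness is easy to see with /fo*bar/ itself (a quick illustration, not part of any engine): no finite keyword list an inverted index could precompute would cover everything the pattern accepts.

```python
import re

pat = re.compile(r"fo*bar")

# /fo*bar/ matches infinitely many distinct strings ("fbar", "fobar",
# "foobar", ...), so there is no finite set of index keys equivalent to it.
for n in range(4):
    s = "f" + "o" * n + "bar"
    assert pat.fullmatch(s) is not None
```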

answered Oct 06 '22 by user502515


I don't have a specific engine to suggest.

However, if you could live with a subset of regex syntax, a search engine could store additional tokens to efficiently match rather complex expressions. Solr/Lucene allows for custom tokenization, where the same word can generate multiple tokens under various rule sets.

I'll use my name as an example: "Mark marks the spot."

Case insensitive with stemming: (mark, mark, spot)

Case sensitive with no stemming: (Mark, marks, spot)

Case sensitive with NLP thesaurus expansion: ( [Mark, Marc], [mark, indicate, to-point], [spot, position, location, beacon, coordinate] )

And now evolving towards your question, case insensitive, stemming, dedupe, autocomplete prefix matching: ( [m, ma, mar, mark], [s, sp, spo, spot] )

And if you wanted "substring" style matching it would be: ( [m, ma, mar, mark, a, ar, ark, r, rk, k], [s, sp, spo, spot, p, po, pot, o, ot, t] )
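The prefix and substring token sets above can be generated with a small sketch (plain Python; the function names are mine, not a Solr/Lucene API):

```python
def prefix_tokens(word):
    """Emit every prefix of a word, for autocomplete-style matching."""
    return [word[:i] for i in range(1, len(word) + 1)]

def substring_tokens(word):
    """Emit every contiguous substring, for "substring"-style matching.
    Token count grows roughly as O(n^2) per word, which is one reason
    the resulting index gets so large."""
    seen, out = set(), []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            sub = word[i:j]
            if sub not in seen:      # dedupe repeated substrings
                seen.add(sub)
                out.append(sub)
    return out

print(prefix_tokens("spot"))      # ['s', 'sp', 'spo', 'spot']
print(substring_tokens("mark"))   # ['m', 'ma', 'mar', 'mark', 'a', 'ar', 'ark', 'r', 'rk', 'k']
```

In a real engine these would run inside a custom analyzer chain at index time, one token stream per matching style.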

A single search index can contain all of these different forms of tokens, and the engine can choose which ones to use for each type of search.

Let's try the word "Mississippi" in a regex style with literal tokens: [ m, m?, m+, i, i?, i+, s, ss, s+, ss+ ... ] etc.

The actual rules would depend on the regex subset, but hopefully the pattern is becoming clearer. You would extend even further to match other regex fragments, and then use a form of phrase searching to locate matches.
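As a toy version of that idea (my own sketch, assuming a tiny regex subset of literal characters each optionally followed by '?', and a hypothetical inverted index of token → document ids): expand the pattern into the finite set of strings it can match, then look each one up.

```python
import re

def expand_optional(pattern):
    """Expand a tiny regex subset -- literal chars, each optionally
    followed by '?' -- into the finite set of strings it can match."""
    parts = re.findall(r"(.)(\??)", pattern)
    results = [""]
    for ch, opt in parts:
        if opt:  # 'x?' doubles the candidates: with and without 'x'
            results = results + [r + ch for r in results]
        else:
            results = [r + ch for r in results]
    return set(results)

# Hypothetical inverted index: literal token -> set of document ids.
index = {"color": {1}, "colour": {2}, "grey": {3}}

def search(pattern):
    hits = set()
    for literal in expand_optional(pattern):
        hits |= index.get(literal, set())
    return hits

print(expand_optional("colou?r"))  # {'color', 'colour'}
print(search("colou?r"))           # {1, 2}
```

A fuller subset (character classes, bounded repetition) would expand into phrase queries over the fragment tokens rather than whole-word lookups, but the shape of the approach is the same.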

Of course the index would be quite large, BUT it might be worth it, depending on the project's requirements. And you'd also need a query parser and application logic.

I realize if you're looking for a canned engine this doesn't do it, but in terms of theory this is how I'd approach it (assuming it's really a requirement!). If all somebody wanted was substring matching and flexible wildcard matching, you could get away with far fewer tokens in the index.

In terms of canned apps, you might check out OpenGrok, used for source code indexing, which is not full regex, but understands source code pretty well.

answered Oct 06 '22 by Mark Bennett