I could not find a complete solution for the following situation, either on Google or in ES; I hope someone can help here.

Suppose there are five documents with email addresses stored under the field "email":

<pre class="prettyprint"><code>1. {"email": "john.doe@gmail.com"}
2. {"email": "john.doe@gmail.com, john.doe@outlook.com"}
3. {"email": "hello-john.doe@outlook.com"}
4. {"email": "john.doe@outlook.com"}
5. {"email": "john@yahoo.com"}
</code></pre>

I want to support the following search scenarios [Search -> Receive]:

"john.doe@gmail.com" -> 1,2
"john.doe@outlook.com" -> 2,4
"john@yahoo.com" -> 5
"john.doe" -> 1,2,3,4
"john" -> 1,2,3,4,5
"gmail.com" -> 1,2
"outlook.com" -> 2,3,4

The first three matches are a MUST; for the rest, the more precise the better. I have already tried different combinations of index/search analyzers, tokenizers, and filters, and I also tried adjusting the conditions of match queries, but did not find an ideal solution. Any thoughts are welcome, with no restrictions on the mappings, analyzers, or kind of query to use. Thanks.

Mapping:

<pre class="prettyprint"><code>PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": 1,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "emails": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "email"
        }
      }
    }
  }
}
</code></pre>

Test data:

<pre class="prettyprint"><code>POST /test/emails/_bulk
{"index":{"_id":"1"}}
{"email": "john.doe@gmail.com"}
{"index":{"_id":"2"}}
{"email": "john.doe@gmail.com, john.doe@outlook.com"}
{"index":{"_id":"3"}}
{"email": "hello-john.doe@outlook.com"}
{"index":{"_id":"4"}}
{"email": "john.doe@outlook.com"}
{"index":{"_id":"5"}}
{"email": "john@yahoo.com"}
</code></pre>

Query to be used:

<pre class="prettyprint"><code>GET /test/emails/_search
{
  "query": {
    "term": {
      "email": "john.doe@gmail.com"
    }
  }
}
</code></pre>
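To get a feel for why a plain `term` query can satisfy the scenarios from the question, here is a rough Python sketch of the terms the `email` analyzer above might produce. It is an approximation, not Elasticsearch output. The `uax_url_email` tokenizer is replaced by a naive split on commas and whitespace, Java's `\p{L}` is approximated by `[^\W\d_]`, and the names `tokenize`, `email_terms`, `term_search` and `DOCS` are made up for illustration.

```python
import re

# Capture patterns from the "email" filter in the mapping.
# Assumption: [^\W\d_] stands in for Java's \p{L}, which Python's re lacks.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]

DOCS = {
    1: "john.doe@gmail.com",
    2: "john.doe@gmail.com, john.doe@outlook.com",
    3: "hello-john.doe@outlook.com",
    4: "john.doe@outlook.com",
    5: "john@yahoo.com",
}


def tokenize(text):
    # Assumption: a crude stand-in for the uax_url_email tokenizer,
    # which keeps each email address as a single token.
    return [tok for tok in re.split(r"[,\s]+", text) if tok]


def email_terms(text):
    """Approximate the set of terms indexed for one field value."""
    terms = set()
    for token in tokenize(text):
        terms.add(token.lower())  # preserve_original keeps the whole token
        for pattern in PATTERNS:
            # pattern_capture emits the capture group of every match
            for match in re.finditer(pattern, token):
                terms.add(match.group(1).lower())  # "lowercase" filter
    return terms  # a set, so duplicates are dropped like the "unique" filter


def term_search(term):
    """Emulate a term query: exact match against the indexed terms."""
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if term in email_terms(text))


if __name__ == "__main__":
    for query in ["john.doe@gmail.com", "john.doe", "john", "outlook.com"]:
        print(query, "->", term_search(query))
```

Since a `term` query is not analyzed, the search term has to be lowercase already to match the lowercased terms in the index. The real analyzer's output can be inspected with the `_analyze` API, which is the authoritative check.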

ElasticSearch Analyzer and Tokenizer for Emails

Tags: email, lucene, elasticsearch, tokenize, analyzer

asked May 08 '15 04:05

LYu

1 Answer


answered Sep 18 '22 13:09

Andrei Stefan
