How to extract the "domain" from an email address

Question

I have the following pattern in my column:

xyz@gmail.com
abc@hotmail.com

Now I want to extract the text after the @ and before the first ., i.e. gmail and hotmail. I am able to extract the text after the @ with the following code:

sub(".*@", "", email)

How can I modify the above to fit my use case?

hrbrmstr · Accepted Answer

You:

really need to read Section 3 of RFC 3696 (TL;DR: the @ can appear in multiple places)
seem not to have considered that an email address can look like "someone@department.example.com" or "someone.else@yet.another.department.example.com" (i.e. naively assuming the part after the @ is just a bare domain could come back to bite you at some point in this analysis)
should be aware that if you're really looking for the email "domain name", you also have to consider what really constitutes a domain name and a proper suffix.
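To make the first two points concrete, here is a minimal base R sketch. The `addresses` vector is made up for illustration, and the third entry uses a quoted local part of the kind RFC 3696 describes. It only shows why splitting at the first @ is fragile, it is not a full address parser.

```r
# Hypothetical sample addresses; the last one has an @ inside a quoted local part
addresses <- c("xyz@gmail.com",
               "someone@department.example.com",
               "\"Abc@def\"@example.com")

# Cutting at the FIRST @ leaves part of the local part behind
first_at <- sub("^[^@]*@", "", addresses)

# A greedy ".*@" consumes everything up to the LAST @, leaving only the host
last_at <- sub(".*@", "", addresses)

first_at[3]
# [1] "def\"@example.com"
last_at
# [1] "gmail.com"              "department.example.com" "example.com"
```

Note that even the "correct" result for the second address is a full host name with a subdomain, not a bare domain.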

So, unless you know for sure that you have and always will have simple email addresses, might I suggest:

library(stringi)
library(urltools)
library(dplyr)
library(purrr)

emails <- c("yz@gmail.com", "abc@hotmail.com",
            "someone@department.example.com",
            "someone.else@yet.another.department.com",
            "some.brit@froodyorg.co.uk")

stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_df(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract()
  })
##                         host    subdomain      domain suffix
## 1                  gmail.com         <NA>       gmail    com
## 2                hotmail.com         <NA>     hotmail    com
## 3     department.example.com   department     example    com
## 4 yet.another.department.com  yet.another  department    com
## 5            froodyorg.co.uk         <NA>   froodyorg  co.uk

Note the proper splitting of subdomain, domain & suffix, especially for the last one.
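For contrast, here is a rough base R sketch of what happens without a suffix list. `naive_domain` is a hypothetical helper written just for this illustration. It takes the second-to-last dot-separated label and assumes every host has at least two labels.

```r
hosts <- c("gmail.com", "department.example.com", "froodyorg.co.uk")

# Hypothetical helper: pick the label just before the last one.
# Assumes each host contains at least one dot.
naive_domain <- function(host) {
  labels <- strsplit(host, ".", fixed = TRUE)
  vapply(labels, function(p) p[length(p) - 1], character(1))
}

naive_domain(hosts)
# [1] "gmail"   "example" "co"
```

The last result is "co" rather than "froodyorg", because "co.uk" is a two-label suffix. That is the case a suffix-aware splitter handles and a positional split cannot.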

Knowing this, we can then change the code to:

stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_chr(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract() %>%
      mutate(full_domain=ifelse(is.na(subdomain), domain, sprintf("%s.%s", subdomain, domain))) %>%
      select(full_domain) %>%
      flatten_chr()
  })
## [1] "gmail"                   "hotmail"               
## [3] "department.example"      "yet.another.department"
## [5] "froodyorg"
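If it helps to see the joining step in isolation, the following base R sketch reproduces it without dplyr. The `parts` data frame is typed in by hand from the subdomain and domain columns of the first output above, so it is a stand-in for the real `suffix_extract()` result, not a call to it.

```r
# Hand-copied stand-in for the subdomain/domain columns shown earlier
parts <- data.frame(
  subdomain = c(NA, NA, "department", "yet.another", NA),
  domain    = c("gmail", "hotmail", "example", "department", "froodyorg"),
  stringsAsFactors = FALSE
)

# Keep the bare domain when there is no subdomain, otherwise join with a dot
full_domain <- ifelse(is.na(parts$subdomain),
                      parts$domain,
                      paste(parts$subdomain, parts$domain, sep = "."))

full_domain
# [1] "gmail"                  "hotmail"                "department.example"
# [4] "yet.another.department" "froodyorg"
```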

akrun · Answer

We can use gsub

gsub(".*@|\\..*", "", email)
#[1] "gmail"   "hotmail"
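As a quick check of how this pattern behaves beyond the two addresses in the question, here it is applied to the longer `emails` vector from the accepted answer. This is only a sketch of the behaviour: the pattern removes everything through the last @ and everything from the first remaining dot onward.

```r
emails <- c("yz@gmail.com", "abc@hotmail.com",
            "someone@department.example.com",
            "someone.else@yet.another.department.com",
            "some.brit@froodyorg.co.uk")

# Strip the local part plus @, then strip from the first dot of the host onward
first_label <- gsub(".*@|\\..*", "", emails)

first_label
# [1] "gmail"      "hotmail"    "department" "yet"        "froodyorg"
```

So for hosts with subdomains this returns the first label of the host ("department", "yet"), which is fine for the simple addresses in the question but differs from the suffix-aware result.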

Tags:

regex

r

Neil
