
Extracting values of a string with str_detect

Tags: string, r, detect

I currently have a data.frame (x) with the following structure:

Number Observation
1   34
2   Example
3   Example34% 
4   Example
5   34

My desired output is two data frames: one containing only the purely numeric observations (i.e., 34) and one containing everything else (characters, and characters mixed with numbers and %).

I have been able to obtain the number observations using:

y <- x[str_detect(x$Observation,("([0-9])")),]

But this also includes observations that mix characters and numbers. When I negate it with !str_detect(...), I only get the pure character observations, leaving out Example34%. Is there a way to str_detect only the purely numeric values and then negate that to obtain everything else?

k3r0 asked Mar 02 '23 09:03


2 Answers

Use anchors for the start (^) and end ($) of the regex, so that only strings made up entirely of digits match:

library(tidyverse)

data_example <- tibble::tribble(
  ~Number, ~Observation,
  1L, "34",
  2L, "Example",
  3L, "Example34%",
  4L, "Example",
  5L, "34"
)

tidy_solution <- data_example %>%
  mutate(
    just_numbers = Observation %>% str_extract("^[:digit:]+$"),
    just_not_numbers = if_else(just_numbers %>% is.na(), Observation, NA_character_),
    full_ans = coalesce(just_numbers, just_not_numbers)
  )

tidy_solution
#> # A tibble: 5 x 5
#>   Number Observation just_numbers just_not_numbers full_ans  
#>    <int> <chr>       <chr>        <chr>            <chr>     
#> 1      1 34          34           <NA>             34        
#> 2      2 Example     <NA>         Example          Example   
#> 3      3 Example34%  <NA>         Example34%       Example34%
#> 4      4 Example     <NA>         Example          Example   
#> 5      5 34          34           <NA>             34

a <- tidy_solution %>%
  select(Number, just_numbers) %>%
  na.omit()

a
#> # A tibble: 2 x 2
#>   Number just_numbers
#>    <int> <chr>       
#> 1      1 34          
#> 2      5 34


b <- tidy_solution %>%
  select(Number, just_not_numbers) %>%
  na.omit()

Created on 2020-06-10 by the reprex package (v0.3.0)
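
Since the question asks about str_detect specifically, the same anchored pattern can probably be used to subset the rows directly. The following is only a minimal sketch, assuming the data frame is named x as in the question's code; the names is_number, numbers_only and everything_else are illustrative and not part of the original answer.

```r
library(stringr)

# Sample data mirroring the question; the name `x` is assumed from the question's code.
x <- data.frame(
  Number = 1:5,
  Observation = c("34", "Example", "Example34%", "Example", "34"),
  stringsAsFactors = FALSE
)

# TRUE only when the whole string consists of digits, because of ^ and $.
is_number <- str_detect(x$Observation, "^[0-9]+$")

# Reuse the same logical vector for both halves of the split.
numbers_only    <- x[is_number, ]
everything_else <- x[!is_number, ]
```

Without the anchors, [0-9] matches any string that contains a digit anywhere, which is why Example34% was picked up together with the pure numbers in the question's attempt.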

Bruno answered Mar 05 '23 00:03


One way is to build one of the two outputs and then use anti_join to get the other.

library(dplyr)
library(stringr)

df1 <- df %>% filter(str_detect(Observation, '[A-Za-z]'))
df2 <- anti_join(df, df1)

df1
#  Number Observation
#1      2     Example
#2      3  Example34%
#3      4     Example

df2
#  Number Observation
#1      1          34
#2      5          34

df1 contains the rows that have at least one letter, and df2 is everything else.
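
As a hedged variation on the code above, the join key can be named explicitly, which avoids the message about which columns were joined on. This sketch assumes that Number uniquely identifies each row (true for the sample data, but an assumption in general) and rebuilds df so that it runs on its own.

```r
library(dplyr)
library(stringr)

# Same sample data as in the data section of this answer.
df <- data.frame(
  Number = 1:5,
  Observation = c("34", "Example", "Example34%", "Example", "34"),
  stringsAsFactors = FALSE
)

# Rows containing at least one ASCII letter.
df1 <- df %>% filter(str_detect(Observation, "[A-Za-z]"))

# Everything not in df1, matched on Number only (assumed to be a unique key).
df2 <- anti_join(df, df1, by = "Number")
```

Note that a letter-based pattern treats any value without letters as "numeric", so a hypothetical observation such as "34%" would end up in df2. If that matters, an anchored digit pattern like "^[0-9]+$" is the stricter test.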

data

df <- structure(list(Number = 1:5, Observation = c("34", "Example", 
"Example34%", "Example", "34")), class = "data.frame", row.names=c(NA, -5L))

Ronak Shah answered Mar 04 '23 23:03