
R gsub everything after blank

Tags:

regex

r

gsub

I am struggling to figure out how to use gsub to remove everything after the blank that follows the first hour value.

as.data.frame(valeur)

         valeur
1    8:01 8:15 
2  17:46 18:00 
3          <NA>
4          <NA>
5          <NA>
6          <NA>
7    8:01 8:15 
8  17:46 18:00 

What I need is

     valeur
1          8:01
2         17:46
3          <NA>
4          <NA>
5          <NA>
6          <NA>
7          8:01
8         17:46

Any clue?

I tried

 gsub("[:blank:].*$","",valeur)

Almost

valeur = c(" 8:01 8:15 ", " 17:46 18:00 ", NA, NA, NA, NA, " 8:01 8:15 ", 
" 17:46 18:00 ")
giac asked Aug 22 '15 13:08



1 Answer

I guess you have leading/trailing spaces in 'valeur', judging from the output. We can remove those with gsub: match one or more spaces at the beginning of the string (^\\s+) or (|) one or more spaces at the end of the string (\\s+$), and replace them with ''.

valeur1 <- gsub('^\\s+|\\s+$', '', valeur)
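As a cross-check on the trimming step, base R also ships trimws() (available from R 3.2.0, as far as I know), which should give the same result here. This is only a sketch using the 'valeur' vector from the question; 'valeur1_alt' is a name made up for the comparison.

```r
# Data as posted in the question
valeur <- c(" 8:01 8:15 ", " 17:46 18:00 ", NA, NA, NA, NA,
            " 8:01 8:15 ", " 17:46 18:00 ")

# Trim with the regex from the answer
valeur1 <- gsub('^\\s+|\\s+$', '', valeur)

# Trim with trimws(); 'valeur1_alt' is a hypothetical name for illustration
valeur1_alt <- trimws(valeur)

# Both approaches leave NA untouched and strip the outer spaces
identical(valeur1, valeur1_alt)
```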

To keep only the first run of non-space characters, we match one or more spaces (\\s+) followed by one or more non-space characters (\\S+) up to the end of the string ($) and replace the match with ''.

sub('\\s+\\S+$', '', valeur1)
#[1] "8:01"  "17:46" NA      NA      NA      NA      "8:01"  "17:46"

To get the last run of non-space characters instead, use sub to match one or more non-space characters (\\S+) from the beginning of the string (^) followed by one or more spaces (\\s+), and replace the match with ''.

sub('^\\S+\\s+', '', valeur1)
#[1] "8:15"  "18:00" NA      NA      NA      NA      "8:15"  "18:00"

The first value can also be extracted in a single step: match zero or more spaces at the beginning (^\\s*) or (|) one or more spaces (\\s+) followed by one or more non-space characters (\\S+) and zero or more spaces at the end (\\s*$), and replace the matches with ''.

 gsub("^\\s*|\\s+\\S+\\s*$","",valeur)
 #[1] "8:01"  "17:46" NA      NA      NA      NA      "8:01"  "17:46"
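On the attempt in the question: as far as I understand POSIX bracket expressions, "[:blank:]" on its own is read as a set of the literal characters ':', 'b', 'l', 'a', 'n', 'k', so the class name has to sit inside another pair of brackets, i.e. "[[:blank:]]". The sketch below shows two ways to write that version; 'first_a' and 'first_b' are names I made up, and it assumes only spaces or tabs separate the two times.

```r
# Data as posted in the question
valeur <- c(" 8:01 8:15 ", " 17:46 18:00 ", NA, NA, NA, NA,
            " 8:01 8:15 ", " 17:46 18:00 ")

# Option A: skip leading blanks, capture the first run of non-blank
# characters, and discard the rest of the string
first_a <- sub("^[[:blank:]]*([^[:blank:]]+).*$", "\\1", valeur)

# Option B: trim first, then drop everything from the first blank onwards
first_b <- sub("[[:blank:]].*$", "", trimws(valeur))

first_a
first_b
```

Both options should leave the NA entries as NA, since sub() does not touch missing values.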

Another option is stri_extract_first or stri_extract_last from library(stringi), which extract the first or the last run of one or more non-space characters (\\S+).

 library(stringi)
 stri_extract_first(valeur, regex='\\S+')
 #[1] "8:01"  "17:46" NA      NA      NA      NA      "8:01"  "17:46"

For the last non-space characters:

 stri_extract_last(valeur, regex='\\S+')
 #[1] "8:15"  "18:00" NA      NA      NA      NA      "8:15"  "18:00"
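If installing stringi is not an option, something similar can probably be done in base R by splitting on whitespace. This is a rough sketch rather than a drop-in replacement: the data frame 'df' and the column names 'first' and 'last' are made up for illustration, and it assumes each non-missing entry holds whitespace-separated times.

```r
# Data as posted in the question, wrapped in a data frame
valeur <- c(" 8:01 8:15 ", " 17:46 18:00 ", NA, NA, NA, NA,
            " 8:01 8:15 ", " 17:46 18:00 ")
df <- data.frame(valeur = valeur, stringsAsFactors = FALSE)

# Trim, then split each entry on runs of whitespace;
# an NA entry stays a length-one NA after splitting
parts <- strsplit(trimws(df$valeur), "\\s+")

# Pick the first and the last piece of each entry
df$first <- vapply(parts, function(p) p[1], character(1))
df$last  <- vapply(parts, function(p) p[length(p)], character(1))

df[, c("first", "last")]
```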
akrun answered Oct 05 '22 11:10
