 

Regular expressions in R to erase all characters after the first space?

Tags:

regex

r

I have data in R that can look like this:

USDZAR Curncy
R157 Govt
SPX Index

In other words, each entry is one word (here, a Bloomberg security identifier) followed by a space and then a second word (the security class). I want to strip out the space and the class to get:

USDZAR
R157
SPX

What's the most efficient way of doing this in R? Is it regular expressions, or must I do something like I would in MS Excel with the MID and FIND functions? E.g. in Excel I would write:

=MID(@REF, 1, FIND(" ", @REF, 1)-1)

which returns the substring that starts at character 1 and whose length is the position of the first space minus 1 (so the space itself is dropped).

Do I need to do something similar in R (in which case, what is the equivalent), or can regular expressions help here? Thanks.

Thomas Browne asked Jun 04 '11 23:06


2 Answers

1) Try this. The regular expression matches a space followed by any sequence of characters, and sub replaces the first such match (everything from the first space onward) with the empty string:

x <- c("USDZAR Curncy", "R157 Govt", "SPX Index")
sub(" .*", "", x)
## [1] "USDZAR" "R157"   "SPX"  
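For comparison with the Excel formula in the question, here is a rough base-R translation of MID/FIND next to sub(). regexpr() with fixed = TRUE plays the role of FIND (literal search, -1 when not found) and substr() plays the role of MID. The helper name first_word_mid and the extra test strings are made up for illustration. Unlike the Excel formula, which errors when there is no space, this sketch returns such strings unchanged, which is also what sub() does.

```r
# Hypothetical helper: a literal translation of =MID(ref, 1, FIND(" ", ref, 1) - 1)
first_word_mid <- function(x) {
  # Like FIND: position of the first literal space, or -1 if there is none.
  # as.integer() drops the match.length attributes that regexpr() attaches.
  pos <- as.integer(regexpr(" ", x, fixed = TRUE))
  # Like MID(x, 1, pos - 1); strings with no space are returned unchanged
  ifelse(pos > 0, substr(x, 1, pos - 1), x)
}

x <- c("USDZAR Curncy", "R157 Govt", "SPX Index")
first_word_mid(x)
## [1] "USDZAR" "R157"   "SPX"

# Illustrative edge cases: more than one space, and no space at all
y <- c("ES1 Comdty extra", "VOD")
first_word_mid(y)
## [1] "ES1" "VOD"
sub(" .*", "", y)
## [1] "ES1" "VOD"
```

Both give the same result here, but the one-line sub() call is the more idiomatic choice in R.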

2) Alternatively, if you want the two words in separate columns of a data frame, you can use read.table. Here as.is = TRUE makes the columns character rather than factor (since R 4.0.0 that is the default anyway).

read.table(text = x, as.is = TRUE)
##       V1     V2
## 1 USDZAR Curncy
## 2   R157   Govt
## 3    SPX  Index
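If only the identifiers are needed from that data frame, they can be pulled out by column name. The sketch below uses read.table's col.names argument to give the columns readable names (ticker and class are my own choice, not from the question) and colClasses = "character" so that an identifier that happens to look like a number would stay a string instead of being converted. It assumes every entry has exactly two whitespace-separated words.

```r
x <- c("USDZAR Curncy", "R157 Govt", "SPX Index")
# Read the two words into named character columns
df <- read.table(text = x, col.names = c("ticker", "class"),
                 colClasses = "character")
df$ticker
## [1] "USDZAR" "R157"   "SPX"
```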
G. Grothendieck answered Nov 16 '22 03:11


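It's pretty easy with stringr: str_split_fixed() splits each string into at most n = 2 pieces at the first space and returns a character matrix, so [, 1] keeps the identifiers: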

x <- c("USDZAR Curncy", "R157 Govt", "SPX Index")

library(stringr)
str_split_fixed(x, " ", n = 2)[, 1]
## [1] "USDZAR" "R157"   "SPX"
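If you'd rather avoid the package dependency, a base-R sketch of the same split-and-keep-the-first-piece idea uses strsplit() with a literal space and then takes element 1 of each result. vapply() is used instead of sapply() so the result is guaranteed to be a plain character vector.

```r
x <- c("USDZAR Curncy", "R157 Govt", "SPX Index")
# strsplit() returns a list holding one character vector of pieces per input string
pieces <- strsplit(x, " ", fixed = TRUE)
# Take the first piece from each, requiring exactly one string per element
vapply(pieces, `[`, character(1), 1)
## [1] "USDZAR" "R157"   "SPX"
```

Unlike str_split_fixed(n = 2), strsplit() splits at every space, but because only the first piece is kept the result is the same.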
hadley answered Nov 16 '22 02:11