I read up on regular expressions and Hadley Wickham's `stringr` and `dplyr` packages but can't figure out how to get this to work.
I have library circulation data in a data frame, with the call number stored as a character variable. I'd like to extract the initial capital letters into one new variable and the digits between those letters and the period into a second new variable.
```
Call_Num
HV5822.H4 C47 Circulating Collection, 3rd Floor
QE511.4 .G53 1982 Circulating Collection, 3rd Floor
TL515 .M63 Circulating Collection, 3rd Floor
D753 .F4 Circulating Collection, 3rd Floor
DB89.F7 D4 Circulating Collection, 3rd Floor
```
To create new variables from existing ones in R, use `mutate()` from the dplyr package. `case_when()`, also from dplyr, is useful inside `mutate()` when the new value depends on conditions.
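A minimal sketch of that approach applied to the call numbers above, assuming the dplyr and stringr packages are installed. The data frame name `circ` and the new column names `class_letters`, `class_number`, and `subject` are hypothetical. The `case_when()` step only illustrates conditional recoding: it labels the Library of Congress main classes Q (Science) and T (Technology) and lumps everything else into "Other".

```r
library(dplyr)
library(stringr)

# The sample call numbers from the question, in a Call_Num column
circ <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
               "TL515 .M63 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor",
               "DB89.F7 D4 Circulating Collection, 3rd Floor")
)

circ <- circ %>%
  mutate(
    # Group 1: leading run of capital letters, e.g. "HV"
    class_letters = str_match(Call_Num, "^([A-Z]+)(\\d+)")[, 2],
    # Group 2: the digits right after those letters, e.g. "5822"
    class_number  = as.integer(str_match(Call_Num, "^([A-Z]+)(\\d+)")[, 3]),
    # Conditional recoding on the first letter (illustrative only)
    subject = case_when(
      substr(class_letters, 1, 1) == "Q" ~ "Science",
      substr(class_letters, 1, 1) == "T" ~ "Technology",
      TRUE ~ "Other"
    )
  )

circ[, c("class_letters", "class_number", "subject")]
```

Columns created earlier in the same `mutate()` call (here `class_letters`) can be used by later ones, which keeps the recoding step short.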
In SAS, to create a new variable, choose a name for it in a DATA step and define it from already existing variables using the equals sign (=), then end the step with `run;`. The data set "w" then has three variables: height, weight, and bmi.
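For comparison, a minimal base R sketch of the same idea: a new `bmi` column is defined from the existing `height` and `weight` columns by assignment. The values are hypothetical and assume height in metres and weight in kilograms.

```r
# Hypothetical data set "w": height in metres, weight in kilograms
w <- data.frame(height = c(1.60, 1.75, 1.82),
                weight = c(55, 70, 90))

# New variable computed from existing ones (BMI = weight / height^2)
w$bmi <- w$weight / w$height^2

w
```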
In SPSS, to compute a new variable, click Transform > Compute Variable. In the Compute Variable window that opens, type a name for the new variable in the Target Variable field; this is the variable that will be created during the computation. Then enter the expression for its value in the Numeric Expression field. To compute it only for some cases, click the If button, specify the condition (such as GENDER=1, RACE=1 AND REGION=3, or ((var1 = 1) & (var2 = 1))), and then click Continue.
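A rough R analog of Compute Variable with an If condition, as a sketch: cases that fail the condition are left missing (NA), similar to how SPSS leaves them system-missing for a new variable. The data and the new column name `flag` are hypothetical; only the variable names RACE and REGION come from the example condition.

```r
# Hypothetical data using the variable names from the example condition
d <- data.frame(GENDER = c(1, 2, 1, 2),
                RACE   = c(1, 1, 2, 1),
                REGION = c(3, 3, 3, 1))

# Set flag to 1 where RACE = 1 AND REGION = 3; leave other cases missing
d$flag <- ifelse(d$RACE == 1 & d$REGION == 3, 1, NA)

d
```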
Using the `stringi` package, this would be one option. Since your targets sit at the beginning of the strings, `stri_extract_first()` works well. The pattern `[:alpha:]{1,}` matches a run of one or more letters, so `stri_extract_first()` returns the first such run. Likewise, you can find the first run of digits with `stri_extract_first(x, regex = "\\d{1,}")`.
```r
library(stringi)

x <- c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
       "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
       "TL515 .M63 Circulating Collection, 3rd Floor",
       "D753 .F4 Circulating Collection, 3rd Floor",
       "DB89.F7 D4 Circulating Collection, 3rd Floor")

# First run of letters and first run of digits in each call number
data.frame(alpha  = stri_extract_first(x, regex = "[:alpha:]{1,}"),
           number = stri_extract_first(x, regex = "\\d{1,}"))
#   alpha number
# 1    HV   5822
# 2    QE    511
# 3    TL    515
# 4     D    753
# 5    DB     89
```
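If you want something stricter than "the first run of letters or digits anywhere in the string", here is a base R sketch (no extra packages) that anchors the match at the start of the string and captures the capitals and the digits immediately after them in one pattern. Unlike `[:alpha:]`, `[[:upper:]]` does not match lowercase letters. A string that doesn't start with capitals followed by digits comes back as NA instead of picking up letters or digits from later in the text. The name `result` is just for illustration.

```r
x <- c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
       "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
       "TL515 .M63 Circulating Collection, 3rd Floor",
       "D753 .F4 Circulating Collection, 3rd Floor",
       "DB89.F7 D4 Circulating Collection, 3rd Floor")

# Each element is c(full match, capitals, digits), or character(0) if no match
parts <- regmatches(x, regexec("^([[:upper:]]+)([[:digit:]]+)", x))

result <- data.frame(
  # p[2] / p[3] on character(0) give NA, so non-matching strings become NA
  alpha  = vapply(parts, function(p) p[2], character(1)),
  number = as.integer(vapply(parts, function(p) p[3], character(1)))
)

result
```

Converting `number` to integer makes it sortable numerically (so 89 sorts before 511), which is usually what you want for call-number analysis.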