
Create new variables based upon specific values

I read up on regular expressions and Hadley Wickham's stringr and dplyr packages but can't figure out how to get this to work.

I have library circulation data in a data frame, with the call number stored as a character variable. I'd like to extract the initial capital letters into one new variable, and the digits between those letters and the period into a second new variable.

Call_Num
HV5822.H4 C47 Circulating Collection, 3rd Floor
QE511.4 .G53 1982 Circulating Collection, 3rd Floor
TL515 .M63 Circulating Collection, 3rd Floor
D753 .F4 Circulating Collection, 3rd Floor
DB89.F7 D4 Circulating Collection, 3rd Floor 
Concept Delta asked Jul 07 '15 04:07


1 Answer

Using the stringi package, this would be one option. Since your targets sit at the beginning of the strings, stri_extract_first() works well. [:alpha:]{1,} matches a run of one or more letters, so stri_extract_first() returns the first such run in each string. Likewise, stri_extract_first(x, regex = "\\d{1,}") returns the first run of one or more digits, which here is the number immediately after the leading letters.

x <- c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
       "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
       "TL515 .M63 Circulating Collection, 3rd Floor",
       "D753 .F4 Circulating Collection, 3rd Floor",
       "DB89.F7 D4 Circulating Collection, 3rd Floor")

library(stringi)

data.frame(alpha = stri_extract_first(x, regex = "[:alpha:]{1,}"), 
           number = stri_extract_first(x, regex = "\\d{1,}"))

#  alpha number
#1    HV   5822
#2    QE    511
#3    TL    515
#4     D    753
#5    DB     89
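
If you would rather add the two pieces as columns of your existing data frame, here is a minimal base R sketch of the same idea. It is not part of the stringi approach above: it swaps in regexec() and regmatches() with a pattern anchored at the start of the string, so the letters must be capitals and the digits must follow them directly. The data frame name circ and the column names class_letters and class_number are made up for illustration; only Call_Num comes from the question.

```r
# Hypothetical data frame standing in for the circulation data;
# only the Call_Num column name is taken from the question.
circ <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
               "TL515 .M63 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor",
               "DB89.F7 D4 Circulating Collection, 3rd Floor"),
  stringsAsFactors = FALSE
)

# ^([A-Z]+) captures the leading capital letters;
# ([0-9]+) captures the digits that immediately follow them.
parts <- regmatches(circ$Call_Num,
                    regexec("^([A-Z]+)([0-9]+)", circ$Call_Num))

# Each list element is c(full match, group 1, group 2),
# or character(0) when the call number does not fit the pattern.
circ$class_letters <- vapply(
  parts,
  function(p) if (length(p) == 3) p[2] else NA_character_,
  character(1)
)
circ$class_number <- vapply(
  parts,
  function(p) if (length(p) == 3) as.integer(p[3]) else NA_integer_,
  integer(1)
)
```

Rows whose call number does not start with capitals followed by digits get NA in both new columns, which makes odd entries easy to spot. The number column is converted to integer here; drop as.integer() if you prefer to keep it as character.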
jazzurro answered Nov 15 '22 05:11