I have a data set:
crimes<-data.frame(x=c("Smith", "Jones"), charges=c("murder, first degree-G, manslaughter-NG", "assault-NG, larceny, second degree-G"))
I'm using tidyr::separate to split the charges column on a match with "G,":
crimes<-separate(crimes, charges, into=c("v1","v2"), sep="G,")
This splits my columns, but removes the separator "G,". I want to retain the "G," in the resulting column split.
My desired output is:
x v1 v2
Smith murder, first degree-G manslaughter-NG
Jones assault-NG larceny, second degree-G
Any suggestions welcome.
separate() turns a single character column into multiple columns by splitting the values of the column wherever a separator character appears.
To split a string in R, use the strsplit() function. strsplit() is a built-in R function that splits each element of a character vector into substrings. It returns a list in which each element holds the pieces of the corresponding input string.
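A small sketch of that list return value, using made-up charge strings (the names charge_strings, parts and verdicts are just for illustration):

```r
# Two hypothetical input strings, one list element comes back for each.
charge_strings <- c("murder-G", "assault-NG")

# fixed = TRUE treats "-" as a literal string rather than a regular expression.
parts <- strsplit(charge_strings, "-", fixed = TRUE)

# Pull the second piece (the verdict) out of every list element.
verdicts <- vapply(parts, function(p) p[2], character(1))
print(verdicts)
```

Because the result is a list, you usually need something like vapply() or sapply() to get back to a plain vector.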
Use the split() function in R to divide a vector or data frame into groups. Use the unsplit() function to reassemble the original vector or data frame from those groups.
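For example, a minimal round trip with invented numbers and a hypothetical grouping factor might look like this:

```r
# Hypothetical values and the group each one belongs to.
scores <- c(10, 20, 30, 40, 50, 60)
group <- factor(c("a", "b", "a", "b", "a", "b"))

# split() returns a named list with one element per group level.
pieces <- split(scores, group)

# unsplit() puts the values back in their original order.
restored <- unsplit(pieces, group)
print(pieces)
```

Note that split() groups existing elements; it does not break strings apart the way strsplit() or separate() do.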
To split a column into multiple columns in R, use the separate() function of the tidyr package. The separate() function separates a character column into multiple columns using a regular expression or numeric positions.
The data frame contains just a single column of file names. To split that single column into multiple columns, use the separate() function from tidyr on the "file_name" column, specifying the new column names in a vector.
By default, separate() uses a regular expression that matches any sequence of non-alphanumeric characters as the delimiter. In this example, tidyr automatically treats the underscore and the dot as delimiters and separates the single column into four columns with the names specified. Often you want only part of the text in a column.
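Here is a rough sketch of that default behaviour. The file names and the new column names (prefix, group, year, ext) are invented for illustration; only the file_name column name comes from the text above.

```r
library(tidyr)

# Hypothetical file names with "_" and "." as delimiters.
files <- data.frame(
  file_name = c("sample_A_2020.csv", "sample_B_2021.txt"),
  stringsAsFactors = FALSE
)

# No sep argument: the default pattern "[^[:alnum:]]+" splits on any run of
# non-alphanumeric characters, so both "_" and "." act as separators.
parts <- separate(files, file_name, into = c("prefix", "group", "year", "ext"))
print(parts)
```

The new columns stay character unless you also pass convert = TRUE.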
Use the strsplit() function with a delimiter in R. A delimiter in programming is a symbol or special character that separates the words or text in the data. Let's use the & character as a delimiter and split the string at that character.
rs <- "This&is&First&R&String&Example"
strsplit(rs, split = "&")
How to separate a string and a numeric value in R? To separate a string from a numeric value, we can use the strsplit function with a pattern that matches the characters or the digits to split on.
Replace <yourRegexPattern> with your regex.
If you want the 'sep' in the left column (look behind)
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?<=<yourRegexPattern>)")
If you want the 'sep' in the right column (look ahead)
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=<yourRegexPattern>)")
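To make the difference concrete, here is a small sketch on toy data. The column name pair and the values are made up, and ":" stands in for <yourRegexPattern>.

```r
library(tidyr)

# Hypothetical toy data: each value is split at the colon.
df <- data.frame(pair = c("key:value", "name:Smith"), stringsAsFactors = FALSE)

# Lookbehind: the split point is just after ":", so ":" ends the left column.
left <- separate(df, pair, into = c("lhs", "rhs"), sep = "(?<=:)")

# Lookahead: the split point is just before ":", so ":" starts the right column.
right <- separate(df, pair, into = c("lhs", "rhs"), sep = "(?=:)")

print(left)
print(right)
```

In both cases the pattern matches a position rather than any characters, which is why nothing is removed from the data.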
Also note that when you are trying to separate a word from a group of digits (e.g. August1990 into August and 1990) you will need to make sure the whole group of digits is kept together, by adding extra="merge".
Example:
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=[[:digit:]])", extra="merge")
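A quick sketch of the word-and-digits case, with invented values in a hypothetical column called stamp. The lookahead matches in front of every digit, so extra = "merge" is what limits the split to the first match and leaves the remaining digits in one piece.

```r
library(tidyr)

# Hypothetical values: a month name immediately followed by a year.
dates <- data.frame(stamp = c("August1990", "March2004"), stringsAsFactors = FALSE)

# Split at the first position that is followed by a digit;
# extra = "merge" keeps everything after that point in the last column.
merged <- separate(dates, stamp, into = c("month", "year"),
                   sep = "(?=[[:digit:]])", extra = "merge")
print(merged)
```

The year column comes back as character; add convert = TRUE if you want it converted to a number.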
UPDATE
This is what you asked for. Keep in mind that your data is not tidy (both V1 and V2 hold more than one variable in each column).
A<-separate(crimes,charges,into=c("V1","V2"),sep = "(?<=G,)")
A
x V1 V2
1 Smith murder, first degree-G, manslaughter-NG
2 Jones assault-NG, larceny, second degree-G
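Note that the pattern above leaves the comma at the end of V1 and a leading space in V2. If you want exactly the output shown in the question, one possible variation (my own tweak, not part of the original answer) is to look behind for just the "G" and let the ", " be consumed as the separator:

```r
library(tidyr)

# The data from the question, with strings kept as character.
crimes <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder, first degree-G, manslaughter-NG",
              "assault-NG, larceny, second degree-G"),
  stringsAsFactors = FALSE
)

# The lookbehind keeps the "G" in v1; the ", " that follows it is matched
# by the pattern and therefore dropped. Commas not preceded by "G" are ignored.
B <- separate(crimes, charges, into = c("v1", "v2"), sep = "(?<=G), ")
print(B)
```

This relies on every verdict ending in "G", which holds for both "G" and "NG" in the sample data.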
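An easier way to keep the "G" or "NG" is to use sep=", " as suggested by alistaire. This only works cleanly when no charge contains a comma of its own; the output shown here is for simplified charges such as "murder-G, manslaughter-NG".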
A<-separate(crimes, charges, into=c("v1","v2"), sep = ', ')
This gives
x v1 v2
1 Smith murder-G manslaughter-NG
2 Jones assault-NG larceny-G
If you wanted to keep separating your data.frame (using the -)
separate(A, v1, into = c("v3","v4"), sep = "-")
that gives
x v3 v4 v2
1 Smith murder G manslaughter-NG
2 Jones assault NG larceny-G
You'll need to do that again for the v2 column. I don't know whether you want to keep separating; please post your expected output so I can make my answer more specific.
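For completeness, a sketch of both passes together. It assumes the simplified charges that match the output above (no commas inside a charge), and the names v5 and v6 are just placeholders I picked for the second split.

```r
library(tidyr)

# Assumed simplified data: one "-G" or "-NG" verdict per charge, no inner commas.
simple <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder-G, manslaughter-NG", "assault-NG, larceny-G"),
  stringsAsFactors = FALSE
)

# First split the two charges apart, then split each charge from its verdict.
A <- separate(simple, charges, into = c("v1", "v2"), sep = ", ")
step1 <- separate(A, v1, into = c("v3", "v4"), sep = "-")
step2 <- separate(step1, v2, into = c("v5", "v6"), sep = "-")  # v5, v6 are hypothetical names
print(step2)
```

With more than two charges per person, reshaping to one charge per row would probably be tidier than adding more columns.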