Splitting a dataframe string column into multiple different columns [duplicate]

Tags: split, dataframe, r, stringr

What I am trying to accomplish is splitting a column into multiple columns. I would like the first column to contain "F", the second "US", the third "CA6" or "DL", and the fourth "Z13" or "U13", etc. My entire df follows the same pattern of X.XX.XXXX.XXX, X.XX.XXX.XXX or X.XX.XX.XXX, and I know the third column is where my problem lies because its length varies. I have only used substr in the past, and I could use that here with some if statements, but I would like to learn how to use the stringr package and POSIX regular expressions to do this (unless there is a better option). Thank you in advance.

Here is my df:

text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
          "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
          "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13")
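Since the question asks specifically about regular expressions, here is a minimal base-R sketch (an addition, not taken from either answer below) that uses capture groups. The character class `[^.]+` means "one or more characters that are not a dot", so the third field matches whether it is two, three or four characters long. The column names `prefix`, `country`, `code` and `contract` are hypothetical, chosen only for illustration. stringr's `str_match()` accepts the same pattern and likewise returns a matrix whose first column is the full match.

```r
# Example data from the question (abbreviated)
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.U13", "F.US.DL.Z13")

# Four capture groups separated by literal (escaped) dots;
# [^.]+ matches any run of non-dot characters, so CLE, CA6 and DL all fit
pattern <- "^([^.]+)\\.([^.]+)\\.([^.]+)\\.([^.]+)$"

# regexec() finds the match positions and regmatches() extracts the text;
# each element is c(full_match, group1, group2, group3, group4)
matches <- regmatches(text, regexec(pattern, text))

# Drop the full match and stack the four groups into a character matrix
parts <- do.call(rbind, lapply(matches, `[`, -1))

# Hypothetical column names, for illustration only
colnames(parts) <- c("prefix", "country", "code", "contract")
df <- as.data.frame(parts, stringsAsFactors = FALSE)
df
```

Note that a string which does not match the pattern yields `character(0)` and is silently dropped by `rbind()`, so it is worth comparing `nrow(df)` with `length(text)` on real data.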


asked Sep 05 '13 16:09

Tim

2 Answers

A very direct way is to just use read.table on your character vector:

> read.table(text = text, sep = ".", colClasses = "character")
   V1 V2  V3  V4
1   F US CLE V13
2   F US CA6 U13
3   F US CA6 U13
4   F US CA6 U13
5   F US CA6 U13
6   F US CA6 U13
7   F US CA6 U13
8   F US CA6 U13
9   F US  DL U13
10  F US  DL U13
11  F US  DL U13
12  F US  DL Z13
13  F US  DL Z13

colClasses needs to be specified; otherwise "F" gets converted to FALSE (which is something I need to fix in "splitstackshape"; otherwise I would have recommended that :) )
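To see why `colClasses` matters, the sketch below (an illustration, not part of the original answer) reads the same kind of vector with and without it. By default `read.table()` runs `type.convert()` on every column, and a column containing only "F" (or "T") values is treated as logical.

```r
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.Z13")

# Default: columns are type-converted, so the all-"F" column becomes logical FALSE
converted <- read.table(text = text, sep = ".")
class(converted$V1)   # "logical"

# Forcing every column to character keeps "F" as the string "F"
kept <- read.table(text = text, sep = ".", colClasses = "character")
class(kept$V1)        # "character"
```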

Update (> a year later)...

Alternatively, you can use my cSplit function from "splitstackshape", like this:

library(data.table)
library(splitstackshape)

cSplit(as.data.table(text), "text", ".")
#     text_1 text_2 text_3 text_4
#  1:      F     US    CLE    V13
#  2:      F     US    CA6    U13
#  3:      F     US    CA6    U13
#  4:      F     US    CA6    U13
#  5:      F     US    CA6    U13
#  6:      F     US    CA6    U13
#  7:      F     US    CA6    U13
#  8:      F     US    CA6    U13
#  9:      F     US     DL    U13
# 10:      F     US     DL    U13
# 11:      F     US     DL    U13
# 12:      F     US     DL    Z13
# 13:      F     US     DL    Z13

Or, use separate from "tidyr", like this:

library(dplyr)
library(tidyr)

as.data.frame(text) %>%
  separate(text, into = paste("V", 1:4, sep = "_"))
#    V_1 V_2 V_3 V_4
# 1    F  US CLE V13
# 2    F  US CA6 U13
# 3    F  US CA6 U13
# 4    F  US CA6 U13
# 5    F  US CA6 U13
# 6    F  US CA6 U13
# 7    F  US CA6 U13
# 8    F  US CA6 U13
# 9    F  US  DL U13
# 10   F  US  DL U13
# 11   F  US  DL U13
# 12   F  US  DL Z13
# 13   F  US  DL Z13


answered Sep 25 '22 09:09

A5C1D2H2I1M1N2O1R2T1

Is this what you are trying to do?

# Our data
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
          "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
          "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13" )

#  Split into individual elements by the '.' character
#  Remember to escape it, because '.' by itself matches any single character
elems <- unlist( strsplit( text , "\\." ) )

#  We know the dataframe should have 4 columns, so make a matrix
m <- matrix( elems , ncol = 4 , byrow = TRUE )

#  Coerce to data.frame - head() is just to illustrate the top portion
head( as.data.frame( m ) )
#  V1 V2  V3  V4
#1  F US CLE V13
#2  F US CA6 U13
#3  F US CA6 U13
#4  F US CA6 U13
#5  F US CA6 U13
#6  F US CA6 U13
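One caveat with the matrix approach: `unlist()` discards the boundaries between the original strings, so if any string splits into more or fewer than four pieces, the later rows can end up misaligned, sometimes without any warning. A slightly more defensive variation (a sketch, not from the answer above) keeps the list structure, checks the piece counts with `lengths()`, and splits on a fixed string so no escaping is needed:

```r
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.U13", "F.US.DL.Z13")

# fixed = TRUE treats "." as a literal character, so no regex escaping is needed
parts <- strsplit(text, ".", fixed = TRUE)

# Stop early if any string does not split into exactly four pieces
stopifnot(all(lengths(parts) == 4))

# rbind() turns each piece vector into one row, so row i always comes from text[i]
m <- do.call(rbind, parts)
df <- as.data.frame(m, stringsAsFactors = FALSE)
df
```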

answered Sep 26 '22 09:09

Simon O'Hanlon
