Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a dataframe string column into multiple different columns [duplicate]

What I am trying to accomplish is splitting a column into multiple columns. I would prefer the first column to contain "F", second column "US", third "CA6" or "DL", and the fourth to be "Z13" or "U13" etc etc. My entire df follows the same pattern of X.XX.XXXX.XXX or X.XX.XXX.XXX or X.XX.XX.XXX and I know the third column is where my problem lies because of the different lengths. I have only used substr in the past and I could use that here with some if statements but would like to learn how to use stringr package and POSIX to do this (unless there is a better option). Thank you in advance.

Here is my df:

c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",  "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",  "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13" ) 
like image 663
Tim Avatar asked Sep 05 '13 16:09

Tim


People also ask

How do I split a column into a Dataframe in multiple columns?

Split column by delimiter into multiple columnsApply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.

How do you split items into multiple columns in a data frame?

split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series. str. split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.

How do you split data in one column into multiple columns in Python?

We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.

How do I convert one column to multiple columns in pandas?

We can use str. split() to split one column to multiple columns by specifying expand=True option.


2 Answers

A very direct way is to just use read.table on your character vector:

> read.table(text = text, sep = ".", colClasses = "character")    V1 V2  V3  V4 1   F US CLE V13 2   F US CA6 U13 3   F US CA6 U13 4   F US CA6 U13 5   F US CA6 U13 6   F US CA6 U13 7   F US CA6 U13 8   F US CA6 U13 9   F US  DL U13 10  F US  DL U13 11  F US  DL U13 12  F US  DL Z13 13  F US  DL Z13 

colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )


Update (> a year later)...

Alternatively, you can use my cSplit function, like this:

cSplit(as.data.table(text), "text", ".") #     text_1 text_2 text_3 text_4 #  1:      F     US    CLE    V13 #  2:      F     US    CA6    U13 #  3:      F     US    CA6    U13 #  4:      F     US    CA6    U13 #  5:      F     US    CA6    U13 #  6:      F     US    CA6    U13 #  7:      F     US    CA6    U13 #  8:      F     US    CA6    U13 #  9:      F     US     DL    U13 # 10:      F     US     DL    U13 # 11:      F     US     DL    U13 # 12:      F     US     DL    Z13 # 13:      F     US     DL    Z13 

Or, separate from "tidyr", like this:

library(dplyr) library(tidyr)  as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_")) #    V_1 V_2 V_3 V_4 # 1    F  US CLE V13 # 2    F  US CA6 U13 # 3    F  US CA6 U13 # 4    F  US CA6 U13 # 5    F  US CA6 U13 # 6    F  US CA6 U13 # 7    F  US CA6 U13 # 8    F  US CA6 U13 # 9    F  US  DL U13 # 10   F  US  DL U13 # 11   F  US  DL U13 # 12   F  US  DL Z13 # 13   F  US  DL Z13 
like image 141
A5C1D2H2I1M1N2O1R2T1 Avatar answered Sep 25 '22 09:09

A5C1D2H2I1M1N2O1R2T1


Is this what you are trying to do?

# Our data text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",  "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",  "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13" )  #  Split into individual elements by the '.' character #  Remember to escape it, because '.' by itself matches any single character elems <- unlist( strsplit( text , "\\." ) )  #  We know the dataframe should have 4 columns, so make a matrix m <- matrix( elems , ncol = 4 , byrow = TRUE )  #  Coerce to data.frame - head() is just to illustrate the top portion head( as.data.frame( m ) ) #  V1 V2  V3  V4 #1  F US CLE V13 #2  F US CA6 U13 #3  F US CA6 U13 #4  F US CA6 U13 #5  F US CA6 U13 #6  F US CA6 U13 
like image 33
Simon O'Hanlon Avatar answered Sep 26 '22 09:09

Simon O'Hanlon