What I am trying to accomplish is splitting a column into multiple columns. I would prefer the first column to contain "F", second column "US", third "CA6" or "DL", and the fourth to be "Z13" or "U13" etc etc. My entire df follows the same pattern of X.XX.XXXX.XXX or X.XX.XXX.XXX or X.XX.XX.XXX and I know the third column is where my problem lies because of the different lengths. I have only used substr in the past and I could use that here with some if statements but would like to learn how to use stringr package and POSIX to do this (unless there is a better option). Thank you in advance.
Here is my df:
c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13" )
Split column by delimiter into multiple columnsApply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.
split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series. str. split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.
We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.
We can use str. split() to split one column to multiple columns by specifying expand=True option.
A very direct way is to just use read.table
on your character vector:
> read.table(text = text, sep = ".", colClasses = "character") V1 V2 V3 V4 1 F US CLE V13 2 F US CA6 U13 3 F US CA6 U13 4 F US CA6 U13 5 F US CA6 U13 6 F US CA6 U13 7 F US CA6 U13 8 F US CA6 U13 9 F US DL U13 10 F US DL U13 11 F US DL U13 12 F US DL Z13 13 F US DL Z13
colClasses
needs to be specified, otherwise F
gets converted to FALSE
(which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )
Alternatively, you can use my cSplit
function, like this:
cSplit(as.data.table(text), "text", ".") # text_1 text_2 text_3 text_4 # 1: F US CLE V13 # 2: F US CA6 U13 # 3: F US CA6 U13 # 4: F US CA6 U13 # 5: F US CA6 U13 # 6: F US CA6 U13 # 7: F US CA6 U13 # 8: F US CA6 U13 # 9: F US DL U13 # 10: F US DL U13 # 11: F US DL U13 # 12: F US DL Z13 # 13: F US DL Z13
Or, separate
from "tidyr", like this:
library(dplyr) library(tidyr) as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_")) # V_1 V_2 V_3 V_4 # 1 F US CLE V13 # 2 F US CA6 U13 # 3 F US CA6 U13 # 4 F US CA6 U13 # 5 F US CA6 U13 # 6 F US CA6 U13 # 7 F US CA6 U13 # 8 F US CA6 U13 # 9 F US DL U13 # 10 F US DL U13 # 11 F US DL U13 # 12 F US DL Z13 # 13 F US DL Z13
Is this what you are trying to do?
# Our data text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13" ) # Split into individual elements by the '.' character # Remember to escape it, because '.' by itself matches any single character elems <- unlist( strsplit( text , "\\." ) ) # We know the dataframe should have 4 columns, so make a matrix m <- matrix( elems , ncol = 4 , byrow = TRUE ) # Coerce to data.frame - head() is just to illustrate the top portion head( as.data.frame( m ) ) # V1 V2 V3 V4 #1 F US CLE V13 #2 F US CA6 U13 #3 F US CA6 U13 #4 F US CA6 U13 #5 F US CA6 U13 #6 F US CA6 U13
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With