I am trying to split a column into multiple columns based on comma/space separation.
My dataframe currently looks like
KEYS 1 0 FIT-4270 4000.0439 1 FIT-4269 4000.0420, 4000.0471 2 FIT-4268 4000.0419 3 FIT-4266 4000.0499 4 FIT-4265 4000.0490, 4000.0499, 4000.0500, 4000.0504,
I would like
KEYS 1 2 3 4 0 FIT-4270 4000.0439 1 FIT-4269 4000.0420 4000.0471 2 FIT-4268 4000.0419 3 FIT-4266 4000.0499 4 FIT-4265 4000.0490 4000.0499 4000.0500 4000.0504
My code currently removes The KEYS column and I'm not sure why. Could anyone improve or help fix the issue?
v = dfcleancsv[1] #splits the columns by spaces into new columns but removes KEYS? dfcleancsv = dfcleancsv[1].str.split(' ').apply(Series, 1)
We can use str. split() to split one column to multiple columns by specifying expand=True option. We can use str. extract() to exract multiple columns using regex expression in which multiple capturing groups are defined.
We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.
To split text in a column into multiple rows with Python Pandas, we can use the str. split method. to create the df data frame. Then we call str.
Since you have a list of comma separated strings, split the string on comma to get a list of elements, then call explode on that column.
In case someone else wants to split a single column (deliminated by a value) into multiple columns - try this:
series.str.split(',', expand=True)
This answered the question I came here looking for.
Credit to EdChum's code that includes adding the split columns back to the dataframe.
pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)
Note: The first argument df[[0]]
is DataFrame
.
The second argument df[1].str.split
is the series that you want to split.
split Documentation
concat Documentation
Using Edchums answer of
pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)
I was able to solve it by substituting my variables.
dfcleancsv = pd.concat([dfcleancsv['KEYS'], dfcleancsv[1].str.split(', ', expand=True)], axis=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With