I have a pandas DataFrame in which one column contains paragraphs of text. I want to split that column into separate columns at each newline. A paragraph may contain several newlines.
Example dataframe (current output):
   A
0  foo bar
1  foo bar\nfoo bar
2  foo bar
3  foo bar
Desired output:
   A        B
0  foo bar
1  foo bar  foo bar
2  foo bar
3  foo bar
I have tried using this:
df.A.str.split(expand=True)
But it splits at every whitespace character, not only at "\n" as I expected.
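A minimal reproduction of the behaviour described above, using the same sample strings as the question. It only illustrates why the attempt fails: with no pat, the newline is treated like any other whitespace, so row 1 becomes four one-word pieces.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'A': ['foo bar', 'foo bar\nfoo bar', 'foo bar', 'foo bar']})

# With no pat, str.split splits on any run of whitespace, newlines included
words = df.A.str.split(expand=True)
print(words)
```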
As stated in the docs, you can specify the delimiter to split on through the optional pat parameter of the split method; otherwise it splits on whitespace:
"String or regular expression to split on. If not specified, split on whitespace."
Therefore you can do the following to split on newlines:
df.A.str.split(pat="\n", expand=True)
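Since the docs say pat may also be a regular expression, one possible way to cope with text that mixes Unix ("\n") and Windows ("\r\n") line endings is a pattern with an optional carriage return. This is a sketch under assumptions: the sample data is hypothetical, and the regex keyword requires pandas 1.4 or later. The plain "\n" split is included for comparison, showing the stray "\r" it leaves behind.

```python
import pandas as pd

# Hypothetical data mixing Unix (\n) and Windows (\r\n) line endings
df = pd.DataFrame({'A': ['foo bar', 'foo bar\r\nfoo bar', 'foo bar\nfoo bar']})

# Splitting on '\n' alone leaves a trailing '\r' on Windows-style text
naive = df.A.str.split(pat='\n', expand=True)

# \r? makes the carriage return optional; regex=True needs pandas >= 1.4
parts = df.A.str.split(pat=r'\r?\n', expand=True, regex=True)
print(parts)
```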
You have to pass the pattern on which to split the string as an argument to Series.str.split(). Here is a complete reproducible example that works on Windows systems:
import pandas as pd
df = pd.DataFrame({'A': ['foo bar',
'foo bar\nfoo bar',
'foo bar',
'foo bar']})
df.A.str.split(pat='\n', expand=True)
         0        1
0  foo bar     None
1  foo bar  foo bar
2  foo bar     None
3  foo bar     None
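The result above uses integer column labels and None for missing pieces, while the question asks for columns A and B with blanks. A small follow-up sketch that relabels the columns and fills the gaps; the letter-based naming is an assumption on my part and only works for up to 26 pieces per row.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo bar', 'foo bar\nfoo bar', 'foo bar', 'foo bar']})

out = df.A.str.split(pat='\n', expand=True)
# Label the pieces A, B, C, ... (assumes no row splits into more than 26 pieces)
out.columns = [chr(ord('A') + i) for i in range(out.shape[1])]
# Replace the missing cells (None) with empty strings
out = out.fillna('')
print(out)
```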
For a platform-independent solution, I would do something similar to @ThePyGuy's answer, but with str.splitlines(), because this method recognizes the line boundaries used by different systems ("\n", "\r\n" and "\r").
df.A.apply(str.splitlines).apply(pd.Series).fillna('')
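To check the one-liner above against mixed line endings, here is a small sketch on hypothetical data containing "\n", "\r\n" and "\r". It also shows an alternative I am adding myself (not part of the answer): building the frame from the list of lists with pd.DataFrame, which avoids creating one Series per row.

```python
import pandas as pd

# Hypothetical data with Unix (\n), Windows (\r\n) and old Mac (\r) line endings
df = pd.DataFrame({'A': ['foo bar',
                         'foo bar\r\nfoo bar',
                         'foo bar\rfoo bar\nfoo bar']})

# str.splitlines treats \n, \r\n and \r (among others) as line boundaries
lines = df.A.apply(str.splitlines)

# Variant from the answer: expand each list into a row via pd.Series
wide = lines.apply(pd.Series).fillna('')

# Alternative: build the frame from the list of lists in one call
wide2 = pd.DataFrame(lines.tolist(), index=df.index).fillna('')
print(wide)
```

Two caveats: str.splitlines also splits on less common boundaries such as "\v", "\f", "\x85", "\u2028" and "\u2029", and it drops a trailing empty piece ("a\n".splitlines() gives ["a"], whereas "a\n".split("\n") gives ["a", ""]). Because str.splitlines is applied directly, a non-string cell such as NaN raises a TypeError, unlike the .str accessor.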