How to split a python dataframe based on new line characters?

Question

I have a pandas DataFrame in which one column contains paragraphs of text. I want to split that column into separate columns, one per line of text, by splitting each paragraph on newline characters. A paragraph may contain multiple newlines.

Example dataframe:

Current output:
A
foo bar
foo bar
foo bar
foo bar
foo bar

Desired output:

   A         B
0  foo bar
1  foo bar   foo bar
2  foo bar
3  foo bar

I have tried using this:

df.A.str.split(expand=True)

But it splits on every whitespace character, not only on "\n" as I expected.

Drumstick · Accepted Answer

As stated in the docs, you can specify the delimiter to split on through the optional pat parameter of the split method; if you leave it out, the string is split on whitespace:

"String or regular expression to split on. If not specified, split on whitespace."

Therefore, you can do the following to split on newlines only:

df.A.str.split(pat="\n", expand=True)

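To make the difference concrete, here is a minimal sketch comparing the default behaviour with an explicit pat. The sample data and the variable names by_whitespace and by_newline are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: the second row holds two lines of text.
df = pd.DataFrame({"A": ["foo bar", "foo bar\nfoo bar", "foo bar"]})

# No pattern given: every run of whitespace (spaces and newlines) is a delimiter.
by_whitespace = df["A"].str.split(expand=True)

# Explicit pattern: only the newline character is a delimiter.
by_newline = df["A"].str.split(pat="\n", expand=True)

print(by_whitespace.shape)  # one column per word
print(by_newline.shape)     # one column per line
```

Rows with fewer lines than the longest row are padded with missing values, which you can replace with .fillna('') if you prefer empty strings.
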
Arne · Answer

You have to pass the pattern on which to split the string as an argument to series.str.split(). Here is a complete reproducible example that works on Windows systems:

import pandas as pd

df = pd.DataFrame({'A': ['foo bar',
                         'foo bar\nfoo bar',
                         'foo bar',
                         'foo bar']})

df.A.str.split(pat='\n', expand=True)

    0           1
0   foo bar     None
1   foo bar     foo bar
2   foo bar     None
3   foo bar     None

For a platform-independent solution, I would do something similar to @ThePyGuy's answer, but with str.splitlines(), because this method will recognize line boundaries from various systems.

df.A.apply(str.splitlines).apply(pd.Series).fillna('')
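
As a rough illustration of why splitlines() helps with mixed line endings, here is a small sketch. The sample data and the names naive, lines and wide are hypothetical. It also builds the result with pd.DataFrame(lines.tolist()) rather than .apply(pd.Series); that is a substitution on my part (it avoids creating one Series per row), not part of the one-liner above.

```python
import pandas as pd

# Hypothetical data mixing Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
df = pd.DataFrame({"A": ["foo bar",
                         "foo bar\r\nfoo bar",
                         "foo bar\nfoo bar\rbaz"]})

# Splitting on "\n" alone leaves a stray "\r" at the end of Windows-style lines.
naive = df["A"].str.split(pat="\n", expand=True)

# str.splitlines recognises all of these line boundaries.
lines = df["A"].apply(str.splitlines)

# Build the wide frame from the ragged lists; short rows are padded, then filled with ''.
wide = pd.DataFrame(lines.tolist(), index=df.index).fillna("")

print(repr(naive.iloc[1, 0]))
print(wide)
```

Note that str.splitlines() also treats some less common characters (for example form feed and vertical tab) as line boundaries, so check that this matches what your text contains.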

Tags:

python

pandas

dataframe

Adam Choy

2 Answers

Drumstick

Arne
