I have a Pandas DataFrame with one column:
import pandas as pd
df = pd.DataFrame({"teams": [["SF", "NYG"] for _ in range(7)]})
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]
How can split this column of lists into two columns?
Desired result:
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
                To split a pandas column of lists into multiple columns, create a new dataframe by applying the tolist() function to the column. The following is the syntax. You can also pass the names of new columns resulting from the split as a list.
Use the str. split() Function to Split Strings Into Two List/Columns in Python Pandas. The string can be saved as a series list or constructed from a single, separated string, multiple column dataframes. Functions used are similar to Python's default split() method, but they can only be applied to a single string.
If you wanted to split a column of delimited strings rather than lists, you could similarly do: df["teams"]. str. split('<delim>', expand=True) already returns a DataFrame, so it would probably be simpler to just rename the columns.
We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.
You can use the DataFrame constructor with lists created by to_list:
import pandas as pd
d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]
df2[['team1','team2']] = pd.DataFrame(df2.teams.tolist(), index= df2.index)
print (df2)
       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG
And for a new DataFrame:
df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
print (df3)
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
A solution with apply(pd.Series) is very slow:
#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)
In [121]: %timeit df2['teams'].apply(pd.Series)
1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
                        Much simpler solution:
pd.DataFrame(df2["teams"].to_list(), columns=['team1', 'team2'])
Yields,
  team1 team2
-------------
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
7    SF   NYG
If you wanted to split a column of delimited strings rather than lists, you could similarly do:
pd.DataFrame(df["teams"].str.split('<delim>', expand=True).values,
             columns=['team1', 'team2'])
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With