I have the following DataFrame, which is created dynamically but stores all the relevant values in a single row:
df
+----------------+----------------+----------------+----------------+
| 1 | 2 | 3 | 4 |
+----------------+----------------+----------------+----------------+
| a1, b1, c1, d1 | a2, b2, c2, d2 | a3, b3, c3, d3 | a4, b4, c4, d4 |
+----------------+----------------+----------------+----------------+
I need all the a values in one row, all the b values in the next, and so on (the columns are fixed and constant):
+----+----+----+----+
| 1 | 2 | 3 | 4 |
+----+----+----+----+
| a1 | a2 | a3 | a4 |
| b1 | b2 | b3 | b4 |
| c1 | c2 | c3 | c4 |
| d1 | d2 | d3 | d4 |
+----+----+----+----+
Because the number of different letters in df changes from case to case, I need a dynamic solution that converts df into the form above.
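For reference, here is a minimal plain-pandas sketch of the desired reshaping. It assumes every cell is a string packed with ", " as the separator and that all cells hold the same number of values. The helper name `rows_from_packed_cells` is hypothetical, and the input frame is rebuilt by hand from the table above.

```python
import pandas as pd

def rows_from_packed_cells(df: pd.DataFrame, sep: str = ", ") -> pd.DataFrame:
    """Hypothetical helper: split each cell of the single row on `sep`
    and lay the pieces out as rows, one column per original column.

    Works for any number of packed values, as long as every cell holds
    the same count (otherwise pandas raises a ValueError about lengths).
    """
    row = df.iloc[0]
    return pd.DataFrame({col: str(row[col]).split(sep) for col in df.columns})

# Rebuild the example frame: one row, one packed string per column.
df = pd.DataFrame({i: [", ".join(f"{letter}{i}" for letter in "abcd")]
                   for i in range(1, 5)})
result = rows_from_packed_cells(df)
print(result)
```

Because the split length is taken from the data, nothing in the helper depends on there being exactly four letters.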
To split a cell into multiple rows in a pandas DataFrame, you can use the apply method: call apply with a lambda function that uses str.split to split each string value into a list, then call explode to spread the split values over new rows.
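A minimal sketch of the apply + str.split + explode approach, assuming string cells separated by ", ". The input here is hypothetical and packs three values per cell to show that the count is not hard-coded. The reset_index(drop=True) call is an addition so every column ends up with the same 0..n-1 labels instead of a repeated 0.

```python
import pandas as pd

# Hypothetical input: one row, three packed values per cell.
df = pd.DataFrame({i: [", ".join(f"{letter}{i}" for letter in "abc")]
                   for i in range(1, 5)})

# For each column: split the packed string into a list, explode the list
# into rows, then renumber the rows 0..n-1 so the columns align by position.
result = df.apply(lambda col: col.str.split(", ").explode().reset_index(drop=True))
print(result)
```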
Both Series and DataFrame define an .explode() method that explodes list-like values into separate rows. See the docs section on Exploding a list-like column. Since your cells hold comma-separated strings, first split each string on the comma to get a list of elements, then call explode on the columns:
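df = df.apply(lambda col: col.str.split(', '))  # "a1, b1, c1, d1" -> ['a1', 'b1', 'c1', 'd1']
df.explode(df.columns.tolist())  # exploding several columns at once needs pandas >= 1.3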
Output:
1 2 3 4
0 a1 a2 a3 a4
0 b1 b2 b3 b4
0 c1 c2 c3 c4
0 d1 d2 d3 d4
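A self-contained sketch of the split-then-explode answer, assuming pandas >= 1.3 (needed for exploding several columns at once) and a hypothetical input with three values per cell. It also passes ignore_index=True, an existing DataFrame.explode parameter, so the rows are numbered 0..n-1 instead of all being labelled 0.

```python
import pandas as pd

# Hypothetical input: every cell packs three values instead of four.
df = pd.DataFrame({i: [", ".join(f"{letter}{i}" for letter in "abc")]
                   for i in range(1, 5)})

# Turn each packed string into a list, then explode all columns together;
# every row must hold lists of the same length across the exploded columns.
lists = df.apply(lambda col: col.str.split(", "))
result = lists.explode(df.columns.tolist(), ignore_index=True)
print(result)
```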
Given a df with this structure:
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [np.array('a1 b1 c1 d1'.split(' '))],
                   2: [np.array('a2 b2 c2 d2'.split(' '))],
                   3: [np.array('a3 b3 c3 d3'.split(' '))],
                   4: [np.array('a4 b4 c4 d4'.split(' '))]})
Input dataframe:
1 2 3 4
0 [a1, b1, c1, d1] [a2, b2, c2, d2] [a3, b3, c3, d3] [a4, b4, c4, d4]
You can use pd.Series.explode:
df.apply(pd.Series.explode)
Output:
1 2 3 4
0 a1 a2 a3 a4
0 b1 b2 b3 b4
0 c1 c2 c3 c4
0 d1 d2 d3 d4
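If a clean 0..n-1 index is preferred over the repeated 0 shown above, one variant (a sketch, not part of the original answer) explodes each column with Series.explode(ignore_index=True), available since pandas 1.1. The input is rebuilt to match the array-valued frame above.

```python
import numpy as np
import pandas as pd

# Same shape as the frame above: one row, one NumPy array per cell.
df = pd.DataFrame({i: [np.array([f"{letter}{i}" for letter in "abcd"])]
                   for i in range(1, 5)})

# ignore_index=True gives each exploded column a fresh 0..n-1 index,
# so the columns line up by position rather than sharing the label 0.
result = df.apply(lambda col: col.explode(ignore_index=True))
print(result)
```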
Somewhat similar to Scott Boston's answer, but it avoids apply, which is notoriously slow. Build one row per original column, then transpose so that the a values end up in the first row:
pd.DataFrame(df.values[0].tolist(), index=df.columns).T
#     1   2   3   4
#0   a1  a2  a3  a4
#1   b1  b2  b3  b4
#2   c1  c2  c3  c4
#3   d1  d2  d3  d4
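Another apply-free sketch, assuming the array-valued frame from the previous answers: np.column_stack places the per-column arrays side by side, so no transpose is needed. This is an alternative technique, not the answerer's code, and it raises an error if the arrays differ in length.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({i: [np.array([f"{letter}{i}" for letter in "abcd"])]
                   for i in range(1, 5)})

# Array j becomes column j of a 2-D array, so row k holds the
# k-th packed value from every original column.
stacked = np.column_stack(df.iloc[0].tolist())
result = pd.DataFrame(stacked, columns=df.columns)
print(result)
```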