Split dataframe with all values in one row

Tags: python, pandas, dataframe

I have the following dataframe, which is created dynamically but stores all the relevant values in a single row:

+----------------+----------------+----------------+----------------+
|       1        |       2        |       3        |       4        |
+----------------+----------------+----------------+----------------+
| a1, b1, c1, d1 | a2, b2, c2, d2 | a3, b3, c3, d3 | a4, b4, c4, d4 |
+----------------+----------------+----------------+----------------+

I need all the a_i values in one row, all the b_i values in the next, and so on (the columns are defined and constant):

+----+----+----+----+
| 1  | 2  | 3  | 4  |
+----+----+----+----+
| a1 | a2 | a3 | a4 |
| b1 | b2 | b3 | b4 |
| c1 | c2 | c3 | c4 |
| d1 | d2 | d3 | d4 |
+----+----+----+----+

Because the number of different letters in df changes from case to case, I need a dynamic solution that converts df into the form above.

asked Oct 17 '20 21:10

TimSqua

2 Answers

Update: since pandas 1.3.0, DataFrame.explode accepts a list of column labels, so all columns can be exploded in one call:

df.explode(df.columns.tolist())

Output:

    1   2   3   4
0  a1  a2  a3  a4
0  b1  b2  b3  b4
0  c1  c2  c3  c4
0  d1  d2  d3  d4
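
As a self-contained sketch of the call above (the sample data and the variable name result are assumptions chosen to mirror the question), passing ignore_index=True renumbers the rows instead of repeating the original index label 0. This assumes pandas 1.3.0 or later and that every cell in the row holds the same number of elements:

```python
import numpy as np
import pandas as pd

# Sample single-row frame: every cell holds an array of equal length.
df = pd.DataFrame({1: [np.array('a1 b1 c1 d1'.split(' '))],
                   2: [np.array('a2 b2 c2 d2'.split(' '))],
                   3: [np.array('a3 b3 c3 d3'.split(' '))],
                   4: [np.array('a4 b4 c4 d4'.split(' '))]})

# Explode all columns together; ignore_index=True yields a fresh 0..n-1 index.
result = df.explode(df.columns.tolist(), ignore_index=True)
print(result)
```

If the cells in a row have different lengths, explode raises a ValueError rather than padding the shorter ones.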

Given a df with this structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({1:[np.array('a1 b1 c1 d1'.split(' '))],
                  2:[np.array('a2 b2 c2 d2'.split(' '))],
                  3:[np.array('a3 b3 c3 d3'.split(' '))],
                  4:[np.array('a4 b4 c4 d4'.split(' '))]})

Input dataframe:

                  1                 2                 3                 4
0  [a1, b1, c1, d1]  [a2, b2, c2, d2]  [a3, b3, c3, d3]  [a4, b4, c4, d4]

You can use pd.Series.explode:

df.apply(pd.Series.explode)

Output:

    1   2   3   4
0  a1  a2  a3  a4
0  b1  b2  b3  b4
0  c1  c2  c3  c4
0  d1  d2  d3  d4
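
The table in the question can also be read as holding one comma-separated string per cell rather than an array. Under that assumption (the sample frame and the names split_df and result below are illustrative only), a rough sketch is to split each cell first and then explode column by column:

```python
import pandas as pd

# Assumption: each cell is a single string such as "a1, b1, c1, d1".
df = pd.DataFrame({1: ['a1, b1, c1, d1'],
                   2: ['a2, b2, c2, d2'],
                   3: ['a3, b3, c3, d3'],
                   4: ['a4, b4, c4, d4']})

# Turn every string into a list of its comma-separated parts.
split_df = df.apply(lambda col: col.str.split(', '))

# Explode each column, then replace the repeated index label 0 with 0..n-1.
result = split_df.apply(pd.Series.explode).reset_index(drop=True)
print(result)
```

The number of letters per cell is not hard-coded, so the same lines should handle rows with more or fewer entries as long as all cells have the same count.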

answered Nov 02 '22 22:11

Scott Boston

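Somewhat similar to Scott Boston's answer, but much faster (apply is notoriously slow). Each array in the row becomes one row of the new frame, so label those rows with the original column names and transpose to get the requested layout:

pd.DataFrame(df.values[0].tolist(), index=df.columns).T
#    1   2   3   4
#0  a1  a2  a3  a4
#1  b1  b2  b3  b4
#2  c1  c2  c3  c4
#3  d1  d2  d3  d4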

answered Nov 02 '22 23:11

DYZ
