Python reshaping based on column names keeping ID & other rows

Tags:

pandas

What is the best way to reshape a df so it stacks columns based on a similarity in column name while keeping the unique ID portion of the column name in a new column?

I have a df similar to the following (my actual data also includes NaN values that need to remain):

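import numpy as np
import pandas

# scipy.randn is no longer available in current SciPy; numpy.random.randn generates the same kind of data
df = pandas.DataFrame({"RX_9mm": np.random.randn(5), "RY_9mm": np.random.randn(5), "TX_9mm": np.random.randn(5), "TY_9mm": np.random.randn(5), "RX_10mm": np.random.randn(5), "RY_10mm": np.random.randn(5), "TX_10mm": np.random.randn(5), "TY_10mm": np.random.randn(5), "time": range(5)})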

    RX_9mm  RY_9mm  TX_9mm  TY_9mm  RX_10mm  RY_10mm  TX_10mm  TY_10mm  time
0 -0.1444  2.1319  1.9665  0.1773   0.5156  -1.8461   0.9122   1.1285     0
1  1.4831 -0.8773 -1.0112 -0.0010   1.4532  -1.3721   0.6894  -0.1781     1
2  0.3685  0.2148 -1.2216  0.0098  -1.1427  -0.1851   0.3890   0.9552     2
3  0.6843 -2.0279 -1.1342 -0.8869   0.2718  -2.4857  -1.0496  -0.4286     3
4 -1.5625 -0.2733 -0.1243 -1.2248  -0.7403  -0.5840   0.1797  -0.7014     4

However I need it to look like this:

       RX      RY      TX      TY time   ID
0 -0.1444  2.1319  1.9665  0.1773    0  9mm
1  1.4831 -0.8773 -1.0112 -0.0010    1  9mm
2  0.3685  0.2148 -1.2216  0.0098    2  9mm
3  0.6843 -2.0279 -1.1342 -0.8869    3  9mm
4 -1.5625 -0.2733 -0.1243 -1.2248    4  9mm
5  0.5156 -1.8461  0.9122  1.1285    0 10mm
6  1.4532 -1.3721  0.6894 -0.1781    1 10mm
7 -1.1427 -0.1851  0.3890  0.9552    2 10mm
8  0.2718 -2.4857 -1.0496 -0.4286    3 10mm
9 -0.7403 -0.5840  0.1797 -0.7014    4 10mm

I tried adapting an example from 'Reshaping dataframes in pandas based on column labels' by Chang She. However, when I run the following code:

# .ix was removed in pandas 1.0; .loc selects the same column by label
id = df.loc[:, ['time']]
df.columns = pandas.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
pandas.merge(df.stack(0).reset_index(1), id, left_index=True, right_index=True)

I get:

       RX      RY      TX      TY      RX      RY      TX      TY time
      9mm     9mm     9mm     9mm    10mm    10mm    10mm    10mm  NaN
0 -0.1444  2.1319  1.9665  0.1773  0.5156 -1.8461  0.9122  1.1285    0
1  1.4831 -0.8773 -1.0112 -0.0010  1.4532 -1.3721  0.6894 -0.1781    1
2  0.3685  0.2148 -1.2216  0.0098 -1.1427 -0.1851  0.3890  0.9552    2
3  0.6843 -2.0279 -1.1342 -0.8869  0.2718 -2.4857 -1.0496 -0.4286    3
4 -1.5625 -0.2733 -0.1243 -1.2248 -0.7403 -0.5840  0.1797 -0.7014    4

I understand that the new columns are multi level with the measurement (RX, RY etc.) and the ID (9mm, 10mm) levels, but I don't understand how to stack the measurements into the same columns while keeping the ID as a new column.

If anyone could explain what I am doing wrong to get this output and not the stacked columns, I would really appreciate it.

Thank you


asked Sep 26 '18 10:09

VacciniumC

1 Answer

You can simplify your solution. The final merge is not necessary, because the time column is moved into the index by set_index in the first step:

df = df.set_index('time')
# expand=True turns the split column names into a MultiIndex
df.columns = df.columns.str.split('_', expand=True)
# dropna=False keeps rows whose values are all NaN
# rename_axis names the index levels, which become column names after reset_index
df = df.stack(dropna=False).rename_axis(['time','ID']).reset_index()
print(df)
   time    ID        RX        RY        TX        TY
0     0  10mm -0.549487 -0.349412 -0.620500  0.992223
1     0   9mm  0.831292 -2.465550 -0.863001 -1.335898
2     1  10mm  0.214057 -0.136649  1.831669 -0.672306
3     1   9mm -0.372416 -1.633798 -0.414518 -1.426492
4     2  10mm  0.480018  1.575599 -1.330841 -0.780036
5     2   9mm  0.352044 -1.008269  1.339841  0.423539
6     3  10mm -0.822354  0.002455 -1.099829  0.060929
7     3   9mm  1.336161 -0.066224 -1.111453  1.651180
8     4  10mm  0.627119 -0.419848  1.052179 -0.426928
9     4   9mm  0.701500 -0.833526  2.563398  0.749432
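As an alternative to splitting the column names by hand, pandas also provides wide_to_long for columns that share a stub name plus a suffix. The following is only a sketch under assumptions: the sample data, the seeded generator, and the injected NaN are made up for illustration, and the row order of the result is not relied upon.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the question's frame (values are made up).
rng = np.random.default_rng(0)
names = [f"{m}_{i}" for i in ("9mm", "10mm") for m in ("RX", "RY", "TX", "TY")]
df = pd.DataFrame(rng.standard_normal((5, 8)), columns=names)
df.loc[2, "RX_9mm"] = np.nan  # a missing value that must survive the reshape
df["time"] = range(5)

# stubnames are the measurement prefixes; the part after "_" becomes the ID column.
# suffix=r"\w+" is needed because the default suffix pattern only matches digits.
long_df = pd.wide_to_long(
    df,
    stubnames=["RX", "RY", "TX", "TY"],
    i="time",
    j="ID",
    sep="_",
    suffix=r"\w+",
).reset_index()

print(long_df)
```

Here the NaN stays in the RX column of the row for time 2 and ID 9mm, so missing values are kept without any extra argument.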

If you want to change the order of the columns, you can use numpy.r_:

import numpy as np

# move the first two columns (time, ID) to the end
df = df[np.r_[df.columns[2:], df.columns[:2]]]
print(df)
         RX        RY        TX        TY  time    ID
0 -0.549487 -0.349412 -0.620500  0.992223     0  10mm
1  0.831292 -2.465550 -0.863001 -1.335898     0   9mm
2  0.214057 -0.136649  1.831669 -0.672306     1  10mm
3 -0.372416 -1.633798 -0.414518 -1.426492     1   9mm
4  0.480018  1.575599 -1.330841 -0.780036     2  10mm
5  0.352044 -1.008269  1.339841  0.423539     2   9mm
6 -0.822354  0.002455 -1.099829  0.060929     3  10mm
7  1.336161 -0.066224 -1.111453  1.651180     3   9mm
8  0.627119 -0.419848  1.052179 -0.426928     4  10mm
9  0.701500 -0.833526  2.563398  0.749432     4   9mm
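The output above is ordered by time, with the two IDs interleaved, while the layout in the question lists all 9mm rows first and then all 10mm rows. One possible way to get that grouping, sketched here with made-up sample data, is to select the block of columns for each ID and concatenate the blocks. The level names "measure" and "ID" and the variable names are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the question's frame (values are made up).
rng = np.random.default_rng(0)
names = [f"{m}_{i}" for i in ("9mm", "10mm") for m in ("RX", "RY", "TX", "TY")]
df = pd.DataFrame(rng.standard_normal((5, 8)), columns=names)
df.loc[2, "RX_9mm"] = np.nan  # a missing value that must be kept
df["time"] = range(5)

wide = df.set_index("time")
wide.columns = wide.columns.str.split("_", expand=True)
wide.columns.names = ["measure", "ID"]

# IDs in order of first appearance in the columns: ['9mm', '10mm']
ids = list(dict.fromkeys(wide.columns.get_level_values("ID")))

# One block per ID: pick its columns, drop the ID level, tag the rows with the ID.
pieces = [wide.xs(i, axis=1, level="ID").assign(ID=i) for i in ids]

result = pd.concat(pieces).reset_index()
result.columns.name = None  # drop the leftover "measure" label
result = result[["RX", "RY", "TX", "TY", "time", "ID"]]
print(result)
```

Because the blocks are only selected and concatenated, NaN values are carried over as they are, and the rows come out grouped by ID in the order the IDs appear in the original columns.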


answered Oct 04 '22 17:10

jezrael
