Get column name where value is something in pandas dataframe

Tags: python, pandas, dataframe

I'm trying to find, at each timestamp, the name of the dataframe column whose value matches the value in a time series at the same timestamp.

Here is my dataframe:

>>> df
                            col5        col4        col3        col2        col1
1979-01-01 00:00:00  1181.220328  912.154923  648.848635  390.986156  138.185861
1979-01-01 06:00:00  1190.724461  920.767974  657.099560  399.395338  147.761352
1979-01-01 12:00:00  1193.414510  918.121482  648.558837  384.632475  126.254342
1979-01-01 18:00:00  1171.670276  897.585930  629.201469  366.652033  109.545607
1979-01-02 00:00:00  1168.892579  900.375126  638.377583  382.584568  132.998706

>>> df.to_dict()
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271,
          <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756,
          <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656,
          <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729},
 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831,
          <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587,
          <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691,
          <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442},
 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561,
          <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591,
          <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231,
          <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874},
 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466,
          <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293,
          <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449,
          <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225},
 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286,
          <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068,
          <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859,
          <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}}

And the time series with values I want to match at each timestamp:

>>> ts
1979-01-01 00:00:00    1181.220328
1979-01-01 06:00:00     657.099560
1979-01-01 12:00:00     126.254342
1979-01-01 18:00:00     109.545607
Freq: 6H

>>> ts.to_dict()
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466,
 <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587,
 <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859,
 <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}

Then the result would be:

>>> df_result
                           value Column
1979-01-01 00:00:00  1181.220328   col5
1979-01-01 06:00:00   657.099560   col3
1979-01-01 12:00:00   126.254342   col1
1979-01-01 18:00:00   109.545607   col1

I hope my question is clear enough. Does anyone have an idea how to get df_result?

Thanks

Greg

372

asked Feb 06 '13 17:02

leroygr

1 Answer

Just wanted to add that if several columns may hold the value and you want all of the matching column names, you can do the following (e.g. get every column name whose value equals 'x'):

df.apply(lambda row: row[row == 'x'].index, axis=1)

The idea is that with axis=1, apply passes each row to the function as a series whose index is the column names. You then filter that series with a condition (e.g. row == 'x') and take the index of what remains, which gives the matching column names. The result for each row is a pandas Index; call .tolist() on it if you need a plain list.
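To tie this back to the question, here is a minimal sketch of the same row-wise filter where the target is looked up per timestamp in ts instead of being a fixed 'x'. The helper name matching_columns is made up for illustration, the data is the rounded printout from the question, and exact float comparison only works here because both objects are built from identical literals; with real measurements a tolerance-based comparison is probably safer.

```python
import pandas as pd

# Timestamps and (rounded) values copied from the question's printout.
index = pd.to_datetime([
    "1979-01-01 00:00", "1979-01-01 06:00",
    "1979-01-01 12:00", "1979-01-01 18:00",
])
df = pd.DataFrame(
    {
        "col5": [1181.220328, 1190.724461, 1193.414510, 1171.670276],
        "col4": [912.154923, 920.767974, 918.121482, 897.585930],
        "col3": [648.848635, 657.099560, 648.558837, 629.201469],
        "col2": [390.986156, 399.395338, 384.632475, 366.652033],
        "col1": [138.185861, 147.761352, 126.254342, 109.545607],
    },
    index=index,
)
ts = pd.Series([1181.220328, 657.099560, 126.254342, 109.545607], index=index)


def matching_columns(row, target):
    """Hypothetical helper: names of the columns in `row` equal to `target`."""
    # Same filter as in the answer: boolean-mask the row, keep the index labels.
    return row[row == target].index.tolist()


# row.name is the row's index label (the timestamp), so it can be used
# to look up the value to match in ts.
matches = df.apply(lambda row: matching_columns(row, ts.loc[row.name]), axis=1)

# Assumption: at most one column is wanted per timestamp, so keep the first
# match (or None when nothing matched).
df_result = pd.DataFrame({
    "value": ts,
    "Column": matches.map(lambda cols: cols[0] if cols else None),
})
print(df_result)
```

Keeping matches as a series of lists preserves the case where more than one column holds the value; df_result collapses that to a single name only because the question's expected output has one column per timestamp.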

103

answered Oct 11 '22 09:10

Nic Scozzaro
