
Find closest row of DataFrame to given time in Pandas

Tags:

python

datetime

pandas

time-series

I have a Pandas dataframe which is indexed by a DatetimeIndex:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss)       53732  non-null values
Julian_Day                          53732  non-null values
AOT_870                             53732  non-null values
440-870Angstrom                     53732  non-null values
440-675Angstrom                     53732  non-null values
500-870Angstrom                     53732  non-null values
Last_Processing_Date(dd/mm/yyyy)    53732  non-null values
Solar_Zenith_Angle                  53732  non-null values
time                                53732  non-null values
dtypes: datetime64[ns](2), float64(6), object(1)

I want to find the row that is closest to a certain time:

image_time = dateutil.parser.parse('2009-07-28 13:39:02')

and find how close it is. So far, I have tried various things based upon the idea of subtracting the time I want from all of the times and finding the smallest absolute value, but none quite seem to work.

For example:

aeronet.index - image_time

This gives an error, which I think is because +/- on a DatetimeIndex shifts the index, so I tried copying the index into another column and working on that:

aeronet['time'] = aeronet.index
aeronet.time - image_time

This seems to work, but to do what I want, I need to get the ABSOLUTE time difference, not the relative difference. However, just running abs or np.abs on it gives an error:

abs(aeronet.time - image_time)

C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
   1061         Yields Bytestring in Py2, Unicode String in py3.
   1062         """
-> 1063         return str(self)
   1064 
   1065     def _tidy_repr(self, max_vals=20):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
   1021         if py3compat.PY3:
   1022             return self.__unicode__()
-> 1023         return self.__bytes__()
   1024 
   1025     def __bytes__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
   1031         """
   1032         encoding = com.get_option("display.encoding")
-> 1033         return self.__unicode__().encode(encoding, 'replace')
   1034 
   1035     def __unicode__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
   1044                     else get_option("display.max_rows"))
   1045         if len(self.index) > (max_rows or 1000):
-> 1046             result = self._tidy_repr(min(30, max_rows - 4))
   1047         elif len(self.index) > 0:
   1048             result = self._get_repr(print_header=True,

C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
   1069         """
   1070         num = max_vals // 2
-> 1071         head = self[:num]._get_repr(print_header=True, length=False,
   1072                                     name=False)
   1073         tail = self[-(max_vals - num):]._get_repr(print_header=False,

AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'

Am I approaching this the right way? If so, how should I get abs to work, so that I can then select the minimum absolute time difference, and thus get the closest time? If not, what is the best way to do this with a Pandas time-series?


asked Feb 27 '13 15:02

robintw

1 Answer

This simple method returns the integer position of the DatetimeIndex entry closest to a given datetime object. There's no need to copy the index to a regular column; use the index's .to_pydatetime() method instead.

import numpy as np

i = np.argmin(np.abs(df.index.to_pydatetime() - image_time))

Then you simply use the DataFrame's .iloc indexer:

df.iloc[i]
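The question also asks how close the match is. Here is a minimal, self-contained sketch of the steps above that also keeps the time difference. The small DataFrame is made-up stand-in data (not the real aeronet frame), and the names deltas, closest_row, closest_time and gap are only for illustration:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical stand-in for the aeronet data: four readings on one day.
df = pd.DataFrame(
    {"AOT_870": [0.10, 0.12, 0.15, 0.11]},
    index=pd.to_datetime(
        [
            "2009-07-28 09:00:00",
            "2009-07-28 13:30:00",
            "2009-07-28 13:45:00",
            "2009-07-28 18:00:00",
        ]
    ),
)
image_time = dt.datetime(2009, 7, 28, 13, 39, 2)

# to_pydatetime() gives an object array of datetime.datetime values.
# Subtracting a datetime yields timedeltas, which support abs() and
# comparison, so argmin can pick the smallest absolute difference.
deltas = np.abs(df.index.to_pydatetime() - image_time)
i = int(np.argmin(deltas))

closest_row = df.iloc[i]      # the row nearest to image_time
closest_time = df.index[i]    # its timestamp
gap = deltas[i]               # datetime.timedelta: how close it is
```

Because this compares every index entry, it should not need the index to be sorted.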

Here's a function to do this:

def fcl(df, dtObj):
    # Return the row whose index entry is closest to the datetime dtObj.
    return df.iloc[np.argmin(np.abs(df.index.to_pydatetime() - dtObj))]

You can then further filter seamlessly, e.g.

fcl(df, dtObj)['column']
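As a possible alternative that stays inside pandas, the index's get_indexer method accepts method="nearest". This is a sketch, not part of the answer above: it assumes a reasonably recent pandas and a DatetimeIndex that is sorted and has no duplicate entries. The helper name nearest_row and the sample data are hypothetical:

```python
import pandas as pd


def nearest_row(df, when):
    """Return (row, gap) for the index entry closest to `when`.

    Hypothetical helper; assumes df.index is a sorted DatetimeIndex
    without duplicate entries.
    """
    ts = pd.Timestamp(when)
    # get_indexer returns integer positions; method="nearest" selects
    # the index label closest to each target value.
    pos = df.index.get_indexer([ts], method="nearest")[0]
    gap = abs(df.index[pos] - ts)  # pandas Timedelta
    return df.iloc[pos], gap


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {"AOT_870": [0.12, 0.15]},
    index=pd.to_datetime(["2009-07-28 13:30:00", "2009-07-28 13:45:00"]),
)
row, gap = nearest_row(sample, "2009-07-28 13:39:02")
```

If the index is not sorted or contains duplicates, the np.argmin approach from the answer is likely the safer choice.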

answered Oct 02 '22 15:10

cmeeren
