
Pandas: Slice Dataframe by Datetime (that may not exist) and Return View

Tags: python, pandas

I have a large DataFrame that I would like to slice so that calculations performed on the slice update the values in the original. In addition, the start and end times I slice by may not exist in the index. Below is a simplified example; in practice I want to update several columns based on different calculations.

In [1]: df
Out[1]:

                         A        B         C
TIME
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    NaN
2014-01-02 14:15:00 -0.475917 -0.280055    NaN
2014-01-02 14:20:00  0.163376  1.124602    NaN
2014-01-02 14:25:00 -2.477812  0.656750    NaN

I have tried all of the statements below to create sdf as a view of my time range:

from datetime import datetime

start = datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')

sdf = df[start:end]
sdf = df[start < df.index < end]
sdf = df.ix[start:end]
sdf = df.loc[start:end]
sdf = df.truncate(before=start, after=end, copy=False)

sdf['C'] = 100

Most return a copy and I get a SettingWithCopyWarning. The loc function says the index is incompatible with datetime. Is this something I should be able to do? The result I would like after updating the slice is:

In [1]: df
Out[1]:

                         A        B         C
TIME
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    100
2014-01-02 14:15:00 -0.475917 -0.280055    100
2014-01-02 14:20:00  0.163376  1.124602    100
2014-01-02 14:25:00 -2.477812  0.656750    NaN

Can anyone please suggest a way to do this? Am I approaching this the wrong way?

Thanks

PH82 asked Feb 10 '15 17:02


1 Answer

One way is to use loc with a boolean mask: wrap each condition in parentheses and combine them with the bitwise operator &. The bitwise operator is required because you are comparing an array of values rather than a single value, and the parentheses are required because & binds more tightly than the comparison operators. We can then pass the mask to loc and set the 'C' column like so:

In [15]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
df.loc[(df.index > start) & (df.index < end), 'C'] = 100
df
Out[15]:
                            A         B    C
TIME                                        
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
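
For anyone who wants to run this without the original data, here is a minimal self-contained sketch. The frame is rebuilt from the values printed in the question, and the use of pd.date_range and pd.Timestamp (instead of strptime) is my assumption about a convenient way to reproduce it, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; A and B are copied from the question's output.
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)

# Neither bound exists in the index, which is fine for a boolean mask.
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Element-wise comparisons combined with &; each side needs parentheses.
mask = (df.index > start) & (df.index < end)

# Assigning through loc on the original frame modifies df itself,
# so no intermediate slice (and no SettingWithCopyWarning) is involved.
df.loc[mask, "C"] = 100
```

Because the assignment goes through df.loc directly, there is no separate sliced object that could turn out to be a copy.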

Here is why each of the methods you tried didn't work:

sdf = df[start:end] # can raise KeyError when start and end are not present in the index, for example if the index is not sorted; a sorted DatetimeIndex accepts missing bounds
sdf = df[start < df.index < end] # raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). A chained comparison is evaluated with `and`, which needs a single boolean, but you are comparing arrays of values
sdf = df.ix[start:end] # same behaviour as the first example (note that .ix was removed in pandas 1.0)
sdf = df.loc[start:end] # same behaviour as the first example
sdf = df.truncate(before=start, after=end, copy=False) # generates the correct result, but operations on it will raise SettingWithCopyWarning as you've found
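
To illustrate the second case, here is a small sketch of the chained comparison failing and the element-wise form succeeding. The index is rebuilt with pd.date_range to match the question's timestamps, and the variable error_message is only there for illustration.

```python
import pandas as pd

# Same timestamps as the question's index (rebuilt here for illustration).
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

error_message = None
try:
    # Python expands this to (start < index) and (index < end); `and` asks
    # the first boolean array for a single truth value, which is ambiguous.
    start < index < end
except ValueError as exc:
    error_message = str(exc)

# The element-wise equivalent: two boolean arrays combined with &.
mask = (index > start) & (index < end)
```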

EDIT

You can set sdf to the mask and use this with loc to set your 'C' column:

In [7]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
sdf = (df.index > start) & (df.index < end)
df.loc[sdf,'C'] = 100
df
Out[7]:
                            A         B    C
TIME                                        
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
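
Since the question mentions updating several columns with different calculations, the same mask can be reused for each assignment. The sketch below is only an illustration: the two calculations (summing A and B into C, and taking the absolute value of B) are hypothetical stand-ins for whatever you actually need to compute.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; A and B are copied from the question's output.
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)

start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Build the mask once and reuse it for every update.
mask = (df.index > start) & (df.index < end)

# Hypothetical calculation 1: C becomes A + B, only inside the time range.
df.loc[mask, "C"] = df.loc[mask, "A"] + df.loc[mask, "B"]

# Hypothetical calculation 2: replace B with its absolute value in the range.
df.loc[mask, "B"] = df.loc[mask, "B"].abs()
```

Rows outside the mask are left untouched, so C stays NaN there and B keeps its original values.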

EdChum answered Oct 02 '22 08:10