
Pandas: Slice Dataframe by Datetime (that may not exist) and Return View




I have a large DataFrame that I would like to slice so that calculations performed on the slice update the values in the original. In addition, I am slicing by a start and end time that may not exist in the index. Below is a simplified example; in practice I want to update a number of columns based on different calculations.

In [1]: df

                         A        B         C
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    NaN
2014-01-02 14:15:00 -0.475917 -0.280055    NaN
2014-01-02 14:20:00  0.163376  1.124602    NaN
2014-01-02 14:25:00 -2.477812  0.656750    NaN

I have tried all of the statements below to create sdf as a view of my time range:

start = datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')

sdf = df[start:end]
sdf = df[start < df.index < end]
sdf = df.ix[start:end]
sdf = df.loc[start:end]
sdf = df.truncate(before=start, after=end, copy=False)

sdf['C'] = 100

Most of these return a copy, and I get a SettingWithCopyWarning. The loc call reports that the index is incompatible with datetime. Is this something I should be able to do? The result I would like after updating the slice is:

In [1]: df

                         A        B         C
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    100
2014-01-02 14:15:00 -0.475917 -0.280055    100
2014-01-02 14:20:00  0.163376  1.124602    100
2014-01-02 14:25:00 -2.477812  0.656750    NaN

Can anyone please suggest a way to do this? Am I approaching this the wrong way?


PH82 asked Feb 10 '15 17:02


1 Answer

One way is to use loc with a boolean mask: wrap each condition in parentheses and combine them with the bitwise operator &. The bitwise operator is required because you are comparing an array of values, not a single value, and the parentheses are required because of operator precedence. We can then use the mask with loc to select the rows by label and set the 'C' column like so:

In [15]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
df.loc[(df.index > start) & (df.index < end), 'C'] = 100
df

                            A         B    C
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
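
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the index is a DatetimeIndex at 5-minute intervals, as the printed frame suggests; the values in 'A' and 'B' are random stand-ins rather than the numbers from the question, and pd.Timestamp is used in place of strptime purely for brevity.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
# Assumption: the index is a DatetimeIndex; A and B are random stand-in values.
idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"A": rng.standard_normal(6), "B": rng.standard_normal(6), "C": np.nan},
    index=idx,
)

# Neither endpoint exists in the index.
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Element-wise comparisons combined with &, each wrapped in parentheses.
mask = (df.index > start) & (df.index < end)

# Assigning through df.loc writes into the original frame, not into a copy.
df.loc[mask, "C"] = 100
print(df)
```

Because the assignment goes through df.loc on the original frame, there is no intermediate slice object for pandas to warn about.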

Here is each method you tried and why it didn't work:

sdf = df[start:end]  # can raise KeyError when start and end are not present in the index (for example, if the index is not a sorted DatetimeIndex)
sdf = df[start < df.index < end]  # raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The chained comparison tries to reduce an array of booleans to a single True/False
sdf = df.ix[start:end]  # same KeyError as the first example
sdf = df.loc[start:end]  # same KeyError as the first example
sdf = df.truncate(before=start, after=end, copy=False)  # selects the right rows, but assigning to the result triggers SettingWithCopyWarning, as you found
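
The chained-comparison failure is easy to reproduce in isolation. The sketch below uses a small stand-in frame (an assumption, not the asker's data): Python evaluates start < df.index < end as (start < df.index) and (df.index < end), and the "and" needs a single truth value from a boolean array, which is what raises the error.

```python
import numpy as np
import pandas as pd

# Stand-in frame; only the index matters for this demonstration.
idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min")
df = pd.DataFrame({"C": np.nan}, index=idx)

start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

error = None
try:
    # Equivalent to: (start < df.index) and (df.index < end)
    df[start < df.index < end]
except ValueError as exc:
    error = exc

print(type(error).__name__)

# The element-wise form avoids the implicit bool() call on an array.
mask = (start < df.index) & (df.index < end)
print(mask.sum())
```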


Alternatively, you can assign the boolean mask to sdf and pass it to loc to set your 'C' column:

In [7]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
sdf = (df.index > start) & (df.index < end)
df.loc[sdf, 'C'] = 100
df

                            A         B    C
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
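
Since the question mentions updating several columns with different calculations, the same mask can be reused for each assignment. This is only a sketch: the column 'D' and both formulas are hypothetical placeholders, and the frame uses simple made-up values so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Stand-in frame with easy-to-read values (not the data from the question).
idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min")
df = pd.DataFrame(
    {"A": np.arange(6.0), "B": np.arange(6.0) * 10, "C": np.nan, "D": np.nan},
    index=idx,
)

start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Build the mask once and reuse it for every update.
mask = (df.index > start) & (df.index < end)

# Hypothetical calculations; replace with the real ones.
df.loc[mask, "C"] = df.loc[mask, "A"] + df.loc[mask, "B"]
df.loc[mask, "D"] = df.loc[mask, "A"] * 2
print(df)
```

Each right-hand side is computed only from the masked rows, and pandas aligns it on the index when writing back, so rows outside the time range keep their NaN values.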
EdChum answered Oct 02 '22 08:10
