This is a two-part question, with an immediate question and a more general one.
I have a pandas TimeSeries, ts. To get the first value after a certain time, I can do this:
ts.ix[ts[datetime(2012,1,1,15,0,0):].first_valid_index()]
a) Is there a better, less clunky way to do it?
b) Coming from C, I have a certain phobia when dealing with these somewhat opaque, possibly mutable but generally not, possibly lazy but not always types. So to be clear, when I do
ts[datetime(2012,1,1,15,0,0):].first_valid_index()
ts[datetime(2012,1,1,15,0,0):] is a pandas.TimeSeries object, right? And I could possibly mutate it.
Does it mean that whenever I take a slice, there's a copy of ts being allocated in memory? Does it mean that this innocuous line of code could actually trigger the copy of a gigabyte of TimeSeries just to get an index value?
Or perhaps they magically share memory, and a lazy copy is made if one of the objects is mutated, for instance? But then, how do you know which specific operations trigger a copy? Maybe not slicing, but how about renaming columns? The documentation doesn't seem to say. Does that bother you? Should it bother me, or should I just learn not to worry and catch problems with a profiler?
Some setup:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: from datetime import datetime
In [4]: dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
In [5]: ts = pd.Series(np.random.randn(6), index=dates)
In [6]: ts
Out[6]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 -0.337281
2011-01-10 0.522765
2011-01-12 1.559876
Okay, now to answer your first question (a): yes, there are less clunky ways, depending on your intention. This is pretty simple:
In [9]: ts[datetime(2011, 1, 8):]
Out[9]:
2011-01-08 -0.337281
2011-01-10 0.522765
2011-01-12 1.559876
This is a slice containing all the values on or after your chosen date. You can select just the first one, as you wanted, with:
In [10]: ts[datetime(2011, 1, 8):].iloc[0]
Out[10]: -0.33728079849770815
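If the index is sorted, another option is to look up the position directly rather than slicing first. The following is only a minimal sketch and not part of the book example: the helper name first_at_or_after is made up for illustration, the data are placeholder values, and it assumes a sorted DatetimeIndex.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def first_at_or_after(series, when):
    """Return the first value whose timestamp is >= `when`.

    Hypothetical helper for illustration; assumes `series.index` is sorted.
    """
    # searchsorted does a binary search and returns an integer position.
    pos = series.index.searchsorted(pd.Timestamp(when), side="left")
    if pos >= len(series):
        raise KeyError(f"no entries at or after {when}")
    return series.iloc[pos]


dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Placeholder values 0.0 .. 5.0 so each position is easy to recognise.
ts = pd.Series(np.arange(6, dtype=float), index=dates)

on_date = first_at_or_after(ts, datetime(2011, 1, 8))  # exact match on 2011-01-08
between = first_at_or_after(ts, datetime(2011, 1, 9))  # falls through to 2011-01-10
print(on_date, between)
```

This never builds an intermediate Series, so the question of whether the slice is a view or a copy does not come up for a simple lookup.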
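To your second question (b): this type of indexing returns a view on the original, just as slicing a NumPy array does. It is NOT a copy of the original, at least in the pandas version used here; releases with Copy-on-Write enabled behave differently once the slice is modified. See this question, or many similar: Bug or feature: cloning a numpy array w/ slicing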
To demonstrate, let's modify the slice:
In [21]: ts2 = ts[datetime(2011, 1, 8):]
In [23]: ts2.iloc[0] = 99
This changes the original time series object ts, since ts2 is a view on ts and not a copy.
In [24]: ts
Out[24]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
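One way to check whether a slice shares memory with the original, without mutating anything, is np.shares_memory. This is a hedged sketch rather than a guarantee about pandas internals: the data are placeholder values, and the result can depend on the pandas version and the dtype. Note also that sharing memory only tells you no data was duplicated at slice time; with Copy-on-Write enabled, a later write to the slice triggers a copy instead of changing ts.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Placeholder float data; the real series could be arbitrarily large.
ts = pd.Series(np.arange(6, dtype=float), index=dates)

sliced = ts.loc[datetime(2011, 1, 8):]         # label-based slice
copied = ts.loc[datetime(2011, 1, 8):].copy()  # explicit deep copy

# np.shares_memory reports whether two arrays overlap in memory.
slice_shares = np.shares_memory(ts.to_numpy(), sliced.to_numpy())
copy_shares = np.shares_memory(ts.to_numpy(), copied.to_numpy())

print("slice shares memory with ts:", slice_shares)
print("copy shares memory with ts: ", copy_shares)
```

If the first line reports that memory is shared, taking the slice did not duplicate the underlying values, which is the gigabyte-copy worry from the question.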
If you DO want a copy, you can (in general) use the copy method or (in this case) use truncate:
In [25]: ts3 = ts.truncate(before='2011-01-08')
In [26]: ts3
Out[26]:
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
Changing this copy will not change the original.
In [27]: ts3.iloc[1] = 99
In [28]: ts3
Out[28]:
2011-01-08 99.000000
2011-01-10 99.000000
2011-01-12 1.559876
In [29]: ts  # The January 10th value is unchanged.
Out[29]:
2011-01-02 -0.412335
2011-01-05 -0.809092
2011-01-07 -0.442320
2011-01-08 99.000000
2011-01-10 0.522765
2011-01-12 1.559876
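The copy method mentioned earlier works the same way on any slice. As a small sketch with placeholder data (the variable names are illustrative only), an explicit copy can be modified without touching ts:

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6, dtype=float), index=dates)
snapshot = ts.copy()  # kept only to compare against afterwards

# Slice, then copy explicitly so the result owns its own data.
tail = ts.loc[datetime(2011, 1, 8):].copy()
tail.iloc[1] = 99.0   # modifies the copy only

unchanged = ts.equals(snapshot)
print("ts unchanged:", unchanged)
print(tail)
```

Calling copy explicitly makes the intent visible in the code, so the behaviour does not hinge on whether a given indexing operation happens to return a view.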
This example is straight out of "Python for Data Analysis" by Wes McKinney. Check it out. It's great.