I have a time series in a dataframe with DatetimeIndex like that:
import pandas as pd
dates= ["2015-10-01 00:00:00",
"2015-10-01 01:00:00",
"2015-10-01 02:00:00",
"2015-10-01 03:00:00",
"2015-10-01 04:00:00"]
df = pd.DataFrame(index=pd.DatetimeIndex(dates))
df["values"] = range(0,5)
Out[]:
values
2015-10-01 00:00:00 0
2015-10-01 01:00:00 1
2015-10-01 02:00:00 2
2015-10-01 03:00:00 3
2015-10-01 04:00:00 4
I would like to as simple clean as possible select a row looking like that, based on the date being the key, e.g. "2015-10-01 02:00:00":
Out[]:
values
2015-10-01 02:00:00 2
Simply using indexing results in a key error:
df["2015-10-01 02:00:00"]
Out[]:
KeyError: '2015-10-01 02:00:00'
Similarly this:
df.loc[["2015-10-01 02:00:00"]]
Out[]:
KeyError: "None of [['2015-10-01 02:00:00']] are in the [index]"
These surprisingly (?) result in the same series as follows:
df.loc["2015-10-01 02:00:00"]
Out[]:
values 2
Name: 2015-10-01 02:00:00, dtype: int32
df.loc["2015-10-01 02:00:00",:]
Out[]:
values 2
Name: 2015-10-01 02:00:00, dtype: int32
print(type(df.loc["2015-10-01 02:00:00"]))
print(type(df.loc["2015-10-01 02:00:00",:]))
print(df.loc["2015-10-01 02:00:00"].shape)
print(df.loc["2015-10-01 02:00:00",:].shape)
Out[]:
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
(1,)
(1,)
I could wrap any of those in DataFrame like that:
slize = pd.DataFrame(df.loc["2015-10-01 02:00:00",:])
Out[]:
2015-10-01 02:00:00
values 2
Of course I could do this to reach my result:
slize.T
Out[]:
values
2015-10-01 02:00:00 2
But as at this point, I could also expect a column as a series it is kinda hard to test if it is a row or columns series to add the T automatically. Did I miss a way of selecting what I want?
I recommend to generate your index using pd.date_range for convenience, and then to use .loc with a Timestamp or datetime object.
from datetime import datetime
import pandas as pd
start = datetime(2015, 10, 1, 0, 0, 0)
end = datetime(2015, 10, 1, 4, 0, 0)
dates = pd.date_range(start, end, freq='H')
df = pd.DataFrame(index=pd.DatetimeIndex(dates))
df["values"] = range(0,5)
Then you can use .loc with a Timestamp or datetime object.
In [2]: df.loc[[start]]
Out[2]:
values
2015-10-01 0
Simply using indexing results in a key error:
df["2015-10-01 02:00:00"]
Out[]:
KeyError: '2015-10-01 02:00:00'
KeyError occurs because you try to return a view of the DataFrame by looking for a column named "2015-10-01 02:00:00"
Similarly this:
df.loc[["2015-10-01 02:00:00"]]
Out[]:
KeyError: "None of [['2015-10-01 02:00:00']] are in the [index]"
Your second option cannot work with str indexing, you should use exact indexing as mentioned instead.
These surprisingly (?) result in the same series as follows:
df.loc["2015-10-01 02:00:00"]
Out[]:
values 2
Name: 2015-10-01 02:00:00, dtype: int32
If you use .loc on a single row you will have a coercion to Series type as you noticed. Hence you shall cast to DataFrame and then transpose the result.
You can convert string to datetime - using exact indexing:
print (df.loc[[pd.to_datetime("2015-10-01 02:00:00")]])
values
2015-10-01 02:00:00 2
Or convert Series to DataFrame and transpose:
print (df.loc["2015-10-01 02:00:00"].to_frame().T)
values
2015-10-01 02:00:00 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With