I get a TypeError:
TypeError: '<' not supported between instances of 'datetime.date' and 'str'
While running the following piece of code:
import requests
import re
import json
import pandas as pd
def retrieve_quotes_historical(stock_code):
    quotes = []
    url = 'https://finance.yahoo.com/quote/%s/history?p=%s' % (stock_code, stock_code)
    r = requests.get(url)
    m = re.findall('"HistoricalPriceStore":{"prices":(.*?), "isPending"', r.text)
    if m:
        quotes = json.loads(m[0])
        quotes = quotes[::-1]
    return [item for item in quotes if 'type' not in item]
quotes = retrieve_quotes_historical('INTC')
df = pd.DataFrame(quotes)
s = pd.Series(pd.to_datetime(df.date, unit='s'))
df.date = s.dt.date
df = df.set_index('date')
This part runs smoothly, but when I try to run this line:
df['2017-07-07':'2017-07-10']
I get the TypeError.
How can I fix this?
The problem is that you are slicing with strings such as '2017-07-07', while your index holds values of type datetime.date. Your slice bounds should be of that type too.
You can do this by defining your startdate and enddate as follows:
import pandas as pd
startdate = pd.to_datetime("2017-7-7").date()
enddate = pd.to_datetime("2017-7-10").date()
df.loc[startdate:enddate]
startdate and enddate are now of type datetime.date, and your slice will work:
             adjclose      close       high        low       open    volume
date
2017-07-07  33.205006  33.880001  34.119999  33.700001  33.700001  18304500
2017-07-10  32.979588  33.650002  33.740002  33.230000  33.250000  29918400
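To try this without hitting Yahoo Finance, here is a minimal sketch on synthetic data. The prices are made up and only the index type mirrors the question; the point is that date-typed bounds work against a date-typed index.

```python
import datetime
import pandas as pd

# Synthetic stand-in for the downloaded quotes (values are invented).
df = pd.DataFrame({
    "date": [datetime.date(2017, 7, d) for d in (6, 7, 10, 11)],
    "close": [1.0, 2.0, 3.0, 4.0],
}).set_index("date")

# Bounds converted to datetime.date, matching the type stored in the index.
startdate = pd.to_datetime("2017-07-07").date()
enddate = pd.to_datetime("2017-07-10").date()

subset = df.loc[startdate:enddate]  # label slice, both ends inclusive
print(subset)
```

Both end points are included because .loc slices by label rather than by position.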
It is also possible to create datetime.date objects without pandas:
import datetime
startdate = datetime.datetime.strptime('2017-07-07', "%Y-%m-%d").date()
enddate = datetime.datetime.strptime('2017-07-10', "%Y-%m-%d").date()
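As a side note, assuming Python 3.7 or later and ISO-formatted strings (YYYY-MM-DD), the standard library also offers a shorter route. This is just an alternative sketch, not something the original answer used:

```python
import datetime

# date.fromisoformat parses 'YYYY-MM-DD' directly into a datetime.date.
startdate = datetime.date.fromisoformat("2017-07-07")
enddate = datetime.date.fromisoformat("2017-07-10")

print(type(startdate).__name__, startdate, enddate)
```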
In addition to Paul's answer, a few things to note:
pd.to_datetime(df['date'], unit='s') already returns a Series, so you do not need to wrap it.
Besides, when parsing is successful, the Series returned by pd.to_datetime has dtype datetime64[ns] (timezone-naive) or datetime64[ns, tz] (timezone-aware). If parsing fails, it may still return a Series without error, of dtype O for "object" (at least in pandas 1.2.4), denoting a fallback to Python's stdlib datetime.datetime.
Filtering using strings, as in df['2017-07-07':'2017-07-10'], only works when the dtype of the index is datetime64[...], not when it is O (object).
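The difference between the two index dtypes can be checked on a small synthetic frame. In this sketch the rows are made up (only the epoch timestamps are taken from the output further down), and pd.api.types.is_datetime64_any_dtype is used as a looser check than comparing the dtype string:

```python
import pandas as pd

# Made-up sample rows; "date" holds seconds since the epoch.
df = pd.DataFrame({
    "date": [1596461400, 1596547800, 1597066200],
    "open": [48.27, 48.60, 48.20],
})

s = pd.to_datetime(df["date"], unit="s")  # already a Series, no wrapping needed

by_timestamp = df.copy()
by_timestamp.index = s           # datetime64 index

by_date = df.copy()
by_date.index = s.dt.date        # plain datetime.date objects, object dtype

is_dt_index = pd.api.types.is_datetime64_any_dtype(by_timestamp.index)
is_obj_index = by_date.index.dtype == object
print(is_dt_index, is_obj_index)

# String bounds are understood only by the datetime64 index.
window = by_timestamp.loc["2020-08-03":"2020-08-04"]
print(window)
```

The string bounds are treated as whole days, so the 13:30 row on the end date is still inside the window.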
So with all of this, your example can be made to work by only changing the last lines:
df = pd.DataFrame(quotes)
s = pd.to_datetime(df['date'],unit='s') # no need to wrap in Series
assert str(s.dtype) == 'datetime64[ns]' # VERY IMPORTANT!!!!
df.index = s
print(df['2020-08-01':'2020-08-10']) # it now works!
It yields:
                           date       open  ...    volume   adjclose
date                                        ...
2020-08-03 13:30:00  1596461400  48.270000  ...  31767100  47.050617
2020-08-04 13:30:00  1596547800  48.599998  ...  29045800  47.859154
2020-08-05 13:30:00  1596634200  49.720001  ...  29438600  47.654583
2020-08-06 13:30:00  1596720600  48.790001  ...  23795500  47.634968
2020-08-07 13:30:00  1596807000  48.529999  ...  36765200  47.105358
2020-08-10 13:30:00  1597066200  48.200001  ...  37442600  48.272457
Finally, note that if your datetime strings contain a time offset, there seems to be a mandatory utc=True argument to add to pd.to_datetime (in pandas 1.2.4); otherwise the returned dtype will be 'O' even if parsing is successful. I hope that this will improve in the future, as it is not intuitive at all.
See the to_datetime documentation for details.
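A small sketch of the utc=True case, using invented strings with two different offsets (my assumption about the kind of input that triggers the problem). It only shows the utc=True path, since what happens without it seems to vary between pandas versions:

```python
import pandas as pd

# Hypothetical timestamps with different UTC offsets (e.g. summer vs. winter time).
raw = pd.Series(["2020-08-03 09:30:00-04:00", "2020-11-03 09:30:00-05:00"])

# utc=True converts every value to UTC and yields a timezone-aware datetime64 dtype.
s = pd.to_datetime(raw, utc=True)
print(s.dtype)
print(s)
```

Once the Series is timezone-aware, it can be assigned to df.index and sliced with strings as shown above.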