I need to serialize a Pandas DataFrame to JSON using the to_json
method. Here is an example of how I am doing that:
import pandas
import numpy as np
dr = pandas.date_range('2016-01-01T12:30:00Z', '2016-02-01T12:30:00Z')
data = np.random.rand(len(dr), 2)
df = pandas.DataFrame(data, index=dr, columns=['a', 'b'])
# NOTE: The index for df has the following properties in pandas 0.19.2
# dtype='datetime64[ns, UTC]', freq='D'
# Save to JSON
df.to_json('/tmp/test_data_01.json', date_unit='s', date_format='iso')
Using the code above, I see that my DataFrame has been saved to disk and that the index values in /tmp/test_data_01.json look like [2016-01-01T12:30:00Z, 2016-01-02T12:30:00Z, ...].
The problem is that when I do the following:
df2 = pandas.read_json('/tmp/test_data_01.json')
the index for df2 has no timezone.
df2.index.tz
# Returns None
Is there any way to keep the timezone property of a DataFrame that is serialized to JSON and then deserialized back?
Pandas will convert everything to UTC when using to_json.
See this example where I change it to Europe/Paris, which is UTC+1:
In [1]:
import pandas as pd
import numpy as np
dr = pd.date_range('2016-01-01T12:30:00Z', '2016-02-01T12:30:00Z')
dr = dr.tz_convert('Europe/Paris')
data = np.random.rand(len(dr), 2)
df = pd.DataFrame(data, index=dr, columns=['a', 'b'])
In [2]: df.index[0]
Out[2]: Timestamp('2016-01-01 13:30:00+0100', tz='Europe/Paris', freq='D')
In [3]: df.to_json('test_data_01.json', date_unit='s', date_format='iso')
If I open test_data_01.json, the first timestamp is "2016-01-01T12:30:00Z".
So when you load the JSON, localize the index to UTC. There's no way to know what timezone was used beforehand, though:
In [4]:
df2 = pd.read_json('test_data_01.json')
df2.index = df2.index.tz_localize('UTC')
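If you do know the original timezone from some other source, you can convert the index back after localizing it. A minimal follow-up sketch, assuming (as in the example above) that the original zone was Europe/Paris:
# Continuing from the lines above: convert the UTC-localized index back.
df2.index = df2.index.tz_convert('Europe/Paris')
df2.index[0]  # Timestamp('2016-01-01 13:30:00+0100', tz='Europe/Paris')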
As of PR #35973 (version 1.2.0, I think), timezones are now supported when using the orient='table' argument.
import pandas as pd
import numpy as np
dr = pd.date_range("2020-08-30", freq="d", periods=4, tz="Europe/London")
data = np.random.rand(len(dr), 2)
df = pd.DataFrame(data, index=dr, columns=['a', 'b'])
print(df)
print(pd.read_json(df.to_json(orient='table'), orient='table')) # same output!
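To double-check that the timezone itself survives the orient='table' round trip, here is a small sketch building on the snippet above:
# The re-read frame keeps its timezone-aware index.
df2 = pd.read_json(df.to_json(orient='table'), orient='table')
print(df2.index.tz)   # Europe/London
print(df2.index[0])   # 2020-08-30 00:00:00+01:00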
I don't agree with the solution of @julien-marrec, because it forces the timezone to be UTC, and when calling read_json the original timezone could have been anything else. I implemented the following workaround, which parses the dates while taking the timezone into account.
import pandas as pd
import pandas._libs.json as json  # internal ujson-based parser

loads = json.loads
result = loads('{"2019-01-01T13:00:00.000Z":15,"2019-01-01T11:00:00.000Z":88.352985054,"2019-01-01T12:00:00.000Z":90.091719896}',
               dtype=None, numpy=True, labelled=True)
pd.Series(result[0], pd.DatetimeIndex(result[1])).index
I also filed a bug about this: https://github.com/pandas-dev/pandas/issues/25546
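For comparison, a possible alternative sketch that avoids the internal pandas._libs.json parser (this is not part of the original workaround): load the raw JSON with the standard-library json module and let pd.to_datetime honour the offset embedded in each ISO-8601 key.
import json
import pandas as pd

raw = '{"2019-01-01T13:00:00.000Z":15,"2019-01-01T11:00:00.000Z":88.352985054,"2019-01-01T12:00:00.000Z":90.091719896}'
parsed = json.loads(raw)
# pd.to_datetime parses the offset in each string; utc=True expresses the result in UTC.
s = pd.Series(list(parsed.values()), index=pd.to_datetime(list(parsed.keys()), utc=True))
print(s.index.tz)  # UTC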