Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I keep the timezone of my index when serializing/deserializing a Pandas DataFrame using JSON

Tags:

python

pandas

I need to serialize a Pandas DataFrame to JSON using the to_json method. Here is an example of how I am doing that:

import pandas
import numpy as np
dr = pandas.date_range('2016-01-01T12:30:00Z', '2016-02-01T12:30:00Z')
data = np.random.rand(len(dr), 2)
df = pandas.DataFrame(data, index=dr, columns=['a', 'b'])

# NOTE: The index for df has the following properties in pandas 0.19.2
#       dtype='datetime64[ns, UTC]', freq='D'

# Save to JSON
df.to_json('/tmp/test_data_01.json', date_unit='s', date_format='iso')

Using the code above I see that my DataFrame has been saved to disk and that the indices look like: [2016-01-01T12:30:00Z, 2016-01-02T12:30:00Z, ...] in the file /tmp/test_data_01.json.

The problem is that when I do the following:

df2 = pandas.read_json('/tmp/test_data_01.json')

the index for df2 has no timezone.

df2.index.tz
# Returns None

Is there anyway to keep the timezone property of a DataFrame that is serialized to JSON and deserialized back?

like image 735
aquil.abdullah Avatar asked Jan 05 '17 19:01

aquil.abdullah


People also ask

What is Unstacking in Pandas?

Pandas DataFrame: unstack() functionPivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

What does LOC in Pandas do?

The loc property is used to access a group of rows and columns by label(s) or a boolean array.

Is time series supported in Pandas?

Pandas Time Series Data StructuresFor time stamps, Pandas provides the Timestamp type. As mentioned before, it is essentially a replacement for Python's native datetime , but is based on the more efficient numpy. datetime64 data type. The associated Index structure is DatetimeIndex .


3 Answers

Pandas will convert everything to UTC when using to_json.

See this example where I change it to Europe/Paris which is UTC+1:

In [1]:
dr = pd.date_range('2016-01-01T12:30:00Z', '2016-02-01T12:30:00Z')
dr = dr.tz_convert('Europe/Paris')
data = np.random.rand(len(dr), 2)
df = pd.DataFrame(data, index=dr, columns=['a', 'b'])

In [2]: df.index[0]
Out[2]: Timestamp('2016-01-01 13:30:00+0100', tz='Europe/Paris', freq='D')

In [3]: df.to_json('test_data_01.json', date_unit='s', date_format='iso')

If I open the test_data_01.json, the first one is "2016-01-01T12:30:00Z".

So when you load the json, localize it to UTC. There's no way to know what tz was used beforehand though:

In [4]:
df2 = pd.read_json('test_data_01.json')
df2.index = df2.index.tz_localize('UTC')
like image 149
Julien Marrec Avatar answered Oct 21 '22 09:10

Julien Marrec


As of PR #35973, (version 1.2.0 I think) timezones are now supported when using the orient='table' argument.

import pandas as pd
import numpy as np

dr = pd.date_range("2020-08-30", freq="d", periods=4, tz="Europe/London")
data = np.random.rand(len(dr), 2)
df = pd.DataFrame(data, index=dr, columns=['a', 'b'])

print(df)
print(pd.read_json(df.to_json(orient='table'), orient='table'))  # same output!
like image 24
Attack68 Avatar answered Oct 21 '22 09:10

Attack68


I'm not agree with the solution of @julien-marrec, because it force the timezone to be UTC, and when calling read_json the timezone could be anything else. I had implemented the following workaround that parse date while analyzing timezone.

import pandas._libs.json as json
loads = json.loads
result = loads('{"2019-01-01T13:00:00.000Z":15,"2019-01-01T11:00:00.000Z":88.352985054,"2019-01-01T12:00:00.000Z":90.091719896}',
          dtype=None, numpy=True, labelled=True )
pd.Series(result[0], pd.DatetimeIndex(result[1])).index

And filled a bug about that https://github.com/pandas-dev/pandas/issues/25546

like image 1
Jérôme B Avatar answered Oct 21 '22 07:10

Jérôme B