
Idiomatic way to parse POSIX timestamps in pandas?

I have a CSV file with a time column containing POSIX timestamps in milliseconds. When I read it with pandas, the column is correctly read as Int64, but I would like to convert it to a DatetimeIndex. Right now I first convert the values to datetime objects and then build a DatetimeIndex from them.

In [20]: df.time.head()

Out[20]: 
0    1283346000062
1    1283346000062
2    1283346000062
3    1283346000062
4    1283346000300
Name: time

In [21]: map(datetime.fromtimestamp, df.time.head()/1000.)
Out[21]: 
[datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 300000)]

In [22]: pandas.DatetimeIndex(map(datetime.fromtimestamp, df.time.head()/1000.))
Out[22]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-09-01 09:00:00.062000, ..., 2010-09-01 09:00:00.300000]
Length: 5, Freq: None, Timezone: None

Is there an idiomatic way of doing this? And, more importantly, is this the recommended way of storing non-unique timestamps in pandas?

signalseeker asked Sep 03 '12 16:09




2 Answers

You can use a converter in combination with read_csv.

In [423]: d = """\
timestamp data
1283346000062 a
1283346000062 b
1283346000062 c
1283346000062 d
1283346000300 e
"""

In [424]: fromtimestamp = lambda x:datetime.fromtimestamp(int(x) / 1000.)

In [425]: df = pandas.read_csv(StringIO(d), sep=r'\s+', converters={'timestamp': fromtimestamp}).set_index('timestamp')

In [426]: df.index
Out[426]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-09-01 15:00:00.062000, ..., 2010-09-01 15:00:00.300000]
Length: 5, Freq: None, Timezone: None

In [427]: df
Out[427]:
                           data
timestamp
2010-09-01 15:00:00.062000    a
2010-09-01 15:00:00.062000    b
2010-09-01 15:00:00.062000    c
2010-09-01 15:00:00.062000    d
2010-09-01 15:00:00.300000    e
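
As a possible alternative to the converter, here is a minimal sketch that assumes a pandas version in which pd.to_datetime accepts a unit argument. It reads the column as plain integers and converts it in one vectorized call. The names raw and df are just for illustration. Unlike datetime.fromtimestamp, which uses the machine's local timezone, this yields timezone-naive values in UTC, so the hour differs from the outputs above.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the answer above.
raw = """timestamp data
1283346000062 a
1283346000062 b
1283346000062 c
1283346000062 d
1283346000300 e
"""

# Read the timestamps as integers first, then convert the whole column at once.
df = pd.read_csv(StringIO(raw), sep=r"\s+")
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")  # ms since the epoch, UTC
df = df.set_index("timestamp")

print(df.index)
```

The resulting index is a DatetimeIndex and keeps the duplicate timestamps as they are.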
Wouter Overmeire answered Sep 20 '22 12:09


Internally, Timestamps are stored as 64-bit integers representing nanoseconds since the epoch, using numpy's datetime64/timedelta64 types. The issue with your timestamps is that they are in millisecond precision, which you already know since you're dividing by 1000. In this case, it's easier to use astype('M8[ms]'), which essentially says: view these ints as datetime ints with ms precision.

In [21]: int_arr
Out[21]: 
array([1283346000062, 1283346000062, 1283346000062, 1283346000062,
       1283346000300])

In [22]: int_arr.astype('M8[ms]')
Out[22]: 
array(['2010-09-01T09:00:00.062-0400', '2010-09-01T09:00:00.062-0400',
       '2010-09-01T09:00:00.062-0400', '2010-09-01T09:00:00.062-0400',
       '2010-09-01T09:00:00.300-0400'], dtype='datetime64[ms]')

Pandas will assume any regular int array is in M8[ns]. An array with a datetime64 dtype will be correctly interpreted. You can view the M8[ns] representation of a DatetimeIndex by accessing its asi8 attribute.
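
The points above can be sketched roughly as follows, assuming current numpy and pandas. The variable names are illustrative only. The explicit cast to M8[ns] is there so that asi8 reports nanoseconds regardless of whether the installed pandas keeps millisecond resolution.

```python
import numpy as np
import pandas as pd

int_arr = np.array(
    [1283346000062, 1283346000062, 1283346000062, 1283346000062, 1283346000300],
    dtype="int64",
)

# Plain ints are taken as nanoseconds, so these land shortly after the 1970 epoch.
misread = pd.to_datetime(int_arr)

# Reinterpret the ints as milliseconds, then cast to nanosecond resolution.
ns_arr = int_arr.astype("M8[ms]").astype("M8[ns]")
idx = pd.DatetimeIndex(ns_arr)

# asi8 exposes the underlying int64 values (nanoseconds here).
print(misread[0], idx[0], idx.asi8[0])
```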

[EDIT] I realize that this won't help you directly with read_csv. Just thought I'd throw out how to quickly convert between timestamp arrays.

Dale Jung answered Sep 21 '22 12:09