I need to process a large number of CSV files in which the timestamp is always a string representing the Unix timestamp in milliseconds. I have not yet found a way to convert these columns efficiently.
This is what I came up with. However, it only produces a converted copy of the column, which I then have to put back into the original dataset somehow. I'm sure this can be done when creating the DataFrame?
    import datetime
    import sys

    if sys.version_info[0] < 3:
        from StringIO import StringIO
    else:
        from io import StringIO
    import pandas as pd

    data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
    df = pd.read_csv(StringIO(data))

    # fromtimestamp() expects seconds, so divide the millisecond value by 1000
    convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
    converted_df = df['UNIXTIME'].apply(convert)
This will pick the column 'UNIXTIME' and change it from
    0    1447160702320
    1    1447160702364
    2    1447160722364
    Name: UNIXTIME, dtype: int64
into this
    0   2015-11-10 14:05:02.320
    1   2015-11-10 14:05:02.364
    2   2015-11-10 14:05:22.364
    Name: UNIXTIME, dtype: datetime64[ns]
However, I would like to use something like pd.apply() to get the whole dataset returned with the converted column or, as I already wrote, simply create the datetimes when generating the DataFrame from the CSV.
To convert a timestamp into a date in plain Python, import the datetime module, store the timestamp value in a variable, pass it to the fromtimestamp() method (which expects seconds), and print the resulting date.
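The steps above can be sketched for a single value as follows. This is a minimal illustration, not part of the original posts: the variable names are made up, and passing tz=timezone.utc is an added assumption so the result does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone

# First UNIXTIME value from the sample data, in milliseconds.
unix_ms = 1447160702320

# fromtimestamp() expects seconds, so divide the millisecond value by 1000.
# tz=timezone.utc (an assumption for this sketch) yields an aware UTC datetime
# instead of one expressed in the local time zone.
converted = datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)

print(converted)
```

Without the tz argument, fromtimestamp() returns local time, which is why the question's output shows 14:05 while the pandas answer shows 13:05.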
Timestamp is the pandas equivalent of Python's datetime.datetime and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas.
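A short sketch of that interchangeability, using the first sample value; the variable names are illustrative only.

```python
import datetime

import pandas as pd

# Build a Timestamp directly from epoch milliseconds.
ts = pd.Timestamp(1447160702320, unit='ms')

# Timestamp is a subclass of datetime.datetime, so it can be passed
# wherever a plain datetime is expected.
is_datetime = isinstance(ts, datetime.datetime)

# Convert explicitly when a plain standard-library object is needed.
as_py = ts.to_pydatetime()

print(ts, is_datetime, type(as_py))
```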
In Pandas, you can convert a column (string/object or integer type) to datetime using the to_datetime() and astype() methods.
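As a hedged illustration of both methods: the sketch below assumes the epoch values arrive as strings (as described in the question), so it first casts them to integers with astype() before calling to_datetime(); the second series uses astype() directly on date strings. The sample series are made up for this example.

```python
import pandas as pd

# Epoch milliseconds stored as strings: cast to int64, then convert.
raw = pd.Series(['1447160702320', '1447160702364'])
from_epoch = pd.to_datetime(raw.astype('int64'), unit='ms')

# Date strings can be cast directly with astype().
iso = pd.Series(['2015-11-10 13:05:02.320', '2015-11-10 13:05:02.364'])
from_iso = iso.astype('datetime64[ns]')

print(from_epoch)
print(from_iso)
```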
You can do this as a post-processing step using to_datetime and passing the argument unit='ms':
    In [5]: df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
            df
    Out[5]:
       RUN                UNIXTIME  VALUE
    0    1 2015-11-10 13:05:02.320     10
    1    2 2015-11-10 13:05:02.364     20
    2    3 2015-11-10 13:05:22.364     42
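Since the question involves many files, the read and the conversion can be wrapped in one step that returns the whole dataset. The helper below is a hypothetical sketch (read_with_timestamps is not a pandas function), applied to the sample data from the question.

```python
from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'


def read_with_timestamps(source, column='UNIXTIME'):
    """Hypothetical helper: read a CSV and convert an epoch-ms column."""
    frame = pd.read_csv(source)
    # assign() returns the full frame with the column replaced, so nothing
    # has to be copied back by hand.
    return frame.assign(**{column: pd.to_datetime(frame[column], unit='ms')})


df = read_with_timestamps(StringIO(data))
print(df.dtypes)
```

The conversion is vectorized over the whole column, which should scale better than applying a Python function to each value.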
I use @EdChum's solution, but I add time zone handling:
    df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
                       .tz_localize('UTC')\
                       .tz_convert('America/New_York')
tz_localize indicates that the timestamp should be interpreted as UTC; tz_convert then actually moves the date/time to the correct time zone (in this case 'America/New_York').
Note that the column has been converted to a DatetimeIndex because the tz_ methods work only on the index of a Series. Since pandas 0.15 one can use .dt:
    df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
                       .dt.tz_localize('UTC')\
                       .dt.tz_convert('America/New_York')
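A possible variant, not from the original answer: to_datetime also accepts utc=True, which returns UTC-aware values directly, so only the tz_convert step remains. The small frame below is built inline from the sample values for illustration.

```python
import pandas as pd

df = pd.DataFrame({'UNIXTIME': [1447160702320, 1447160702364, 1447160722364]})

# utc=True marks the parsed values as UTC, replacing the separate
# tz_localize('UTC') call; tz_convert then shifts to the target zone.
df['UNIXTIME'] = (
    pd.to_datetime(df['UNIXTIME'], unit='ms', utc=True)
      .dt.tz_convert('America/New_York')
)

print(df)
```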