My raw data has a column with timestamps in ISO8601 format like this:
'2017-07-25T06:00:02+02:00'
Since the data is in CSV format, the column is read as object/string, so I'm converting it to datetime like this:
import pandas as pd
df['time'] = pd.to_datetime(df['time'], utc=False)
#df['time'][0]
df['time'][0].isoformat()
Unfortunately this results in UTC timestamps and the timezone is lost. For instance df['time'][0].tzinfo is not set.
Timestamp('2017-07-25 04:00:02')
'2017-07-25T04:00:02'
I'm looking for a way to keep the timezone info in each of the timestamp objects, but without re-setting it to CEST (Central European Summer Time) afterwards, since this information is already included in the ISO8601 timezone offset in the raw data. Any idea how to do this?
Let’s start by simply converting a string column to a datetime. We can load the Pandas DataFrame below and print out its data types using the info() method. While the data looks like dates, it’s actually formatted as strings. Let’s see how we can use the Pandas to_datetime function to convert the string column to a datetime.
Pandas was able to infer the datetime format and correctly convert the string to a datetime data type. In the next section, you’ll learn how to specify specific formats. There will be many times when you receive a date column in a format that’s not immediately inferred by Pandas.
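The two cases described above (letting Pandas infer the format, and stating it explicitly) can be sketched roughly as follows. The sample values and the column names "date" and "inferred" are made up for illustration and are not taken from the original tutorial.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings.
df = pd.DataFrame({"date": ["2022-01-31", "2022-02-28", "2022-03-31"]})

# Case 1: let pandas infer the format of the strings.
df["inferred"] = pd.to_datetime(df["date"])

# Case 2: the strings use day/month/year order, so state the format
# explicitly with strftime-style directives instead of relying on inference.
raw = pd.Series(["31/01/2022", "28/02/2022"])
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

print(df["inferred"].dtype)
print(explicit.iloc[0])
```

Passing format is usually the safer choice for ambiguous strings such as 01/02/2022, where inference has to guess between day-first and month-first order.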
Python | Pandas.to_datetime()
When a CSV file is imported and a DataFrame is created, the datetime values in the file are read as strings rather than as datetime objects, which makes operations such as computing a time difference very hard. The Pandas to_datetime() method helps to convert a string date-time ...
How do I tell pandas to use the 'IST' timezone, or just add 5 hrs 30 mins to the time it currently shows me? E.g. 7 hrs should become 12:30 hrs, and so on.
You can use tz_localize to set the timezone to UTC/+0000, and then tz_convert to convert to the timezone you want.
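A minimal sketch of that localize-then-convert approach, assuming the times are timezone-naive values that actually represent UTC, and assuming 'IST' means India Standard Time (IANA name "Asia/Kolkata"). The sample values are made up.

```python
import pandas as pd

# Hypothetical naive timestamps that are known to be in UTC.
naive = pd.Series(pd.to_datetime(["2017-07-25 07:00:00", "2017-07-25 09:15:00"]))

# Step 1: attach UTC to the naive values (the wall-clock time is unchanged).
as_utc = naive.dt.tz_localize("UTC")

# Step 2: convert to the target zone (the wall-clock time shifts by +05:30).
as_ist = as_utc.dt.tz_convert("Asia/Kolkata")

print(as_ist.iloc[0])
```

tz_localize only labels the existing values with a zone, while tz_convert changes the displayed wall-clock time and keeps the instant the same, which is why both steps are needed.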
So here's how I solved it.
There's a great article about timezones and Python, which helped me arrive at a solution. It relies on the iso8601 Python package.
import iso8601
import pandas as pd
times = ['2017-07-25 06:00:02+02:00',
'2017-07-25 08:15:08+02:00',
'2017-07-25 12:08:00+02:00',
'2017-07-25 13:10:12+02:00',
'2017-07-25 15:11:55+02:00',
'2017-07-25 16:00:00+02:00'
]
df = pd.DataFrame(times, columns=['time'])
df['time'] = df['time'].apply(iso8601.parse_date)
df['time'][0]
This produces the following output and keeps the timezone information:
Timestamp('2017-07-25 06:00:02+0200', tz='+02:00')
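If installing an extra package is not an option, the same idea can probably be sketched with the standard library alone. This is an alternative to the answer's method and not the method itself, and it assumes Python 3.7 or later, where datetime.fromisoformat accepts strings with a +HH:MM offset.

```python
from datetime import datetime

import pandas as pd

# Same kind of input as above: ISO8601 strings with a +02:00 offset.
times = ["2017-07-25T06:00:02+02:00", "2017-07-25T08:15:08+02:00"]
df = pd.DataFrame(times, columns=["time"])

# Parse each string with the standard library; the offset is kept as tzinfo.
df["time"] = df["time"].apply(datetime.fromisoformat)

first = df["time"][0]
print(first.isoformat())
print(first.utcoffset())
```

The local hour (06) and the +02:00 offset should both survive the round trip through isoformat().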