Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec

Tags:

I have several CSV files with the format:

Year,Day,Hour,Min,Sec.,P1'S1
 2003,  1, 0, 0,12.22, 0.541
 2003,  1, 1, 0,20.69, 0.708
 2003,  1, 2, 0, 4.95, 0.520
 2003,  1, 3, 0,13.42, 0.539
...

(where day, is the day of the year) and I'm trying to read them using the pandas library (seems a fantastic lib so far).

There is a built-in function to read CSV in pandas, and even better, that function supposedly checks the columns for a date type. and automatically uses that as an index (which would be exactly perfect for what I'm doing).

The thing is, I cannot get it to work with date data in this format.

I tried:

data = pd.read_csv("csvFile.csv", index_col=[0, 1],  , index_col=[0, 1, 2, 3, 4] parse_dates=True)

but it only gets the year correctly:

In [36]: data.index
Out[36]: 
MultiIndex
[(<Timestamp: 2003-09-04 00:00:00>, 1, 0, 0, 12.22)
 (<Timestamp: 2003-09-04 00:00:00>, 1, 1, 0, 20.69)
 (<Timestamp: 2003-09-04 00:00:00>, 1, 2, 0, 4.95) ...,
 (<Timestamp: 2003-09-04 00:00:00>, 365, 21, 0, 3.77)
 (<Timestamp: 2003-09-04 00:00:00>, 365, 22, 0, 14.6)
 (<Timestamp: 2003-09-04 00:00:00>, 365, 23, 0, 13.36)]

From the documentation, I see that you can specify the "date_parser" attribute in the read_csv function of pandas. But the documentation doesn't show how and I'm not being able to figure it out. Anyone with experience in the subject that can give a hand.

Cheers, Bruno

595

asked Sep 04 '12 18:09

jbssm

1 Answers

In order to parse a multi-column date, you need to tell pandas which columns should be combined into a single date, so you need to say parse_dates=['Year','Day','Hour','Min','Sec']

You also need to define your own parser that takes a element from each column you specified in parse_dates:

In [1]: import pandas as pd

In [2]: from datetime import datetime, timedelta

In [3]: from cStringIO import StringIO

In [4]: data = """\
Year,Day,Hour,Min,Sec.,P1'S1
 2003,  1, 0, 0,12.22, 0.541
 2003,  1, 1, 0,20.69, 0.708
 2003,  1, 2, 0, 4.95, 0.520
 2003,  1, 3, 0,13.42, 0.539
"""

In [5]: def parse(yr, doy, hr, min, sec):
    yr, doy, hr, min = [int(x) for x in [yr, doy, hr, min]]
    sec = float(sec)
    mu_sec = int((sec - int(sec)) * 1e6)
    sec = int(sec)
    dt = datetime(yr - 1, 12, 31)
    delta = timedelta(days=doy, hours=hr, minutes=min, seconds=sec,
                      microseconds=mu_sec)
    return dt + delta
   ...: 

In [6]: pd.read_csv(StringIO(data), parse_dates={'datetime':      
           ['Year','Day','Hour','Min','Sec.']}, 
           date_parser=parse, index_col='datetime')
Out[6]: 
                            P1'S1
datetime                         
2003-01-01 00:00:12.220000  0.541
2003-01-01 01:00:20.690000  0.708
2003-01-01 02:00:04.950000  0.520
2003-01-01 03:00:13.419999  0.539

143

answered Oct 07 '22 03:10

Chang She

Related questions
                            
                                itertools.product() return value
                            
                                pass a string variable into a gql query
                            
                                Aggregating multiple test cases from multiple modules to run in PyDev TestRunner
                            
                                What is wrong with this simple python metaclass?
                            
                                Django celery retrieve task status using Ajax
                            
                                Can you apply an operation directly to arguments within map/reduce/filter?
                            
                                Use method other than __unicode__ in ModelChoiceField Django
                            
                                Cannot connect to a specific Ethernet port using Python and PyVisa
                            
                                Keep cmd.exe open
                            
                                Can API hooking in python be OS agnostic? [closed]
                            
                                Write multiple NumPy arrays into CSV file in separate columns?
                            
                                How to get all text between just two specified tags using BeautifulSoup?
                            
                                pandas dataframe - change values based on column heading
                            
                                Amazon Elastic MapReduce - SIGTERM
                            
                                Create a halo around text in Python using PIL?
                            
                                Using absolute_import and handling relative module name conflicts in python
                            
                                PyQt: updating GUI from a callback
                            
                                How to store objects in back-end sessions in Django just like Django stores User
                            
                                How to unit test factory in python?
                            
                                What's the pythonic way to wrap several functions in the same with statements

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec

Tags:

python

pandas

python-dateutil

jbssm

People also ask

1 Answers

Chang She

Recent Activity

Donate For Us