Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading data from csv into pandas when date and time are in separate columns

I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtley wrong.

I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge in them into one column, "Datetime", containing datetime objects. The csv looks like this:

    Note about the data
    blank line
    Site Id,Date,Time,WTEQ.I-1...
    2069, 2008-01-19, 06:00, -99.9...
    2069, 2008-01-19, 07:00, -99.9...
    ...

I'm trying to read it using this line of code:

   read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back out to a csv, it looks exactly the same (except that the -99.9s are changed to NA, like I specified with the na_values argument). Date and time are in two separate columns. As I understand it, this should be creating a new column called Datetime that is composed of columns 1 and 2, parsed using the date_parser. I have also tried using parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried using date_parser=parse, where parse is defined as:

    parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?

like image 594
seaotternerd Avatar asked Jul 05 '13 16:07

seaotternerd


People also ask

How do I separate a date and time in a CSV file in Python?

Select Text to Columns and choose Space for the Separated By field. By default, the Tab option will be enabled for the Separated By field, so you'll need to uncheck that after choosing Space. Choose the Collection Time column and then select Date (MDY) from the Column type drop-down. Once you're done, click OK.

How do I work with dates and times in pandas?

Pandas has a built-in function called to_datetime()that converts date and time in string format to a DateTime object. As you can see, the 'date' column in the DataFrame is currently of a string-type object. Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.

How do you combine date and time columns in Python?

Pandas Combine() Function combine() function which allows us to take a date and time string values and combine them to a single Pandas timestamp object. The function accepts two main parameters: Date – refers to the datetime.


1 Answers

You should update your pandas, I recommend the latest stable version for the latest features and bug fixes.

This specific feature was introduced in 0.8.0, and works on pandas version 0.11:

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
Out[11]:
             Datetime  Site Id  WTEQ.I-1
0 2008-01-19 06:00:00     2069       NaN
1 2008-01-19 07:00:00     2069       NaN

without the date_parser=True (since this should be a parsing function, see docstring).

Note that in the provided example the resulting "Datetime" column is a Series of its own and not the index values of the DataFrame. If you'd rather want to have the datetime values as index column rather than the integer value pass the index_col argument specifying the desired column, in this case 0 since the resulting "Datetime" column is the first one.

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
like image 70
Andy Hayden Avatar answered Oct 19 '22 18:10

Andy Hayden