
Pandas: converting dates that contain a string month

I'm starting out with Python, pandas and matplotlib. I'm working with data that has over a million entries, and I'm trying to change the date format. In the CSV file the dates look like 23-JUN-11. I would like to use the dates later to plot the amount of donations for each candidate. How do I convert these dates to a format that pandas can read?

Here is the link to a cut-down file (149 entries)

My code:

%matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

First candidate

reader_bachmann = pd.read_csv('P00000001-ALL.csv' ,converters={'cand_id': lambda x: str(x)[1:]},parse_dates=True, squeeze=True, low_memory=False, nrows=411 )

date_frame = pd.DataFrame(reader_bachmann, columns = ['contb_receipt_dt'])

Data slice

s = date_frame.iloc[:,0]
date_slice = pd.Series([s])
date_strip = date_slice.str.replace('JUN','6')

Trying to convert to new date format

date = pd.to_datetime(s, format='%d%b%Y')
print(date_slice)

Here is the error message

ValueError: could not convert string to float: '05-JUL-11'
pooh098 asked Jan 05 '23 03:01



2 Answers

You need to use a different date format string:

format='%d-%b-%y'

Why?

The error message gives a clue as to what is wrong:

ValueError: could not convert string to float: '05-JUL-11'

The format string controls the conversion, and is currently:

format='%d%b%Y'

And the fields needed are:

%y - year without a century (range 00 to 99)
%b - abbreviated month name
%d - day of the month (01 to 31)

What is missing are the - characters that separate the fields in your data string, and the lowercase y for a two-digit year instead of the current uppercase Y for a four-digit year.
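To see the corrected format string at work, here is a minimal sketch. The two sample values and the variable names raw and dates are made up for illustration; only the column name and the date layout come from the question.

```python
import pandas as pd

# Made-up sample values in the same layout as the CSV column.
raw = pd.Series(['23-JUN-11', '05-JUL-11'], name='contb_receipt_dt')

# %d = day, %b = abbreviated month name, %y = two-digit year,
# with the literal hyphens included in the format string.
dates = pd.to_datetime(raw, format='%d-%b-%y')

print(dates)
```

In the question's code the equivalent change would be pd.to_datetime(s, format='%d-%b-%y').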

Stephen Rauch answered Jan 13 '23 23:01



As an alternative, you can use dateutil.parser to parse the date strings directly, without specifying a format. I have created a sample dataframe for the demo.

import pandas as pd

# Build a 100-row demo column of identical date strings
B = pd.DataFrame({'Date': ['23-JUN-11'] * 100})

Now let's import dateutil.parser and apply it to our date column:

import dateutil.parser
B['Date2'] = B['Date'].apply(lambda x : dateutil.parser.parse(x))
B.head()
Out[106]: 
        Date      Date2
0  23-JUN-11 2011-06-23
1  23-JUN-11 2011-06-23
2  23-JUN-11 2011-06-23
3  23-JUN-11 2011-06-23
4  23-JUN-11 2011-06-23
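Because dateutil guesses the layout of each string separately, it may be worth checking on a small sample that it agrees with an explicit format before running it over the full file. A minimal sketch, assuming made-up sample values and the illustrative names sample, guessed and explicit:

```python
import dateutil.parser
import pandas as pd

# Made-up sample values in the same layout as the question's data.
sample = pd.Series(['23-JUN-11', '05-JUL-11', '01-JAN-12'])

# dateutil is called once per row and infers the layout each time.
guessed = sample.apply(dateutil.parser.parse)

# pd.to_datetime applies one fixed layout to every row.
explicit = pd.to_datetime(sample, format='%d-%b-%y')

# True when both approaches produce the same dates.
print((guessed == explicit).all())
```

With over a million rows, the explicit format is likely to be the faster of the two, since apply runs a Python function call for every row.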
Mohammad Akhtar answered Jan 13 '23 22:01
