Finding columns which contain dates in Pandas

Tags: python, pandas

I'm trying to identify columns which contain dates as strings in order to then convert them to a better type (DateTime or something numeric like UTC). The date format used is 27/11/2012 09:17 which I can search for using a regex of \d{2}/\d{2}/\d{4} \d{2}:\d{2}.

My current code is:

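import re

date_pattern = re.compile(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}')

def find_date_columns(df):
    # Collect each column that has at least one value matching the pattern
    date_cols = []
    for column in df:
        for item in df[column]:
            if date_pattern.search(str(item)):
                date_cols.append(column)
                break
    return date_cols

date_cols = find_date_columns(cleaned_data)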

I'm sure this is not taking advantage of the capabilities of pandas. Is there a better way, either to identify the columns, or to convert them to DateTime or UTC timestamps directly?

Jamie Bull asked Sep 13 '13 01:09



2 Answers

If you are looking to convert entire columns, you can use convert_objects:

df.convert_objects(convert_dates=True)
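convert_objects has since been removed from pandas, so on a current install the same idea can be sketched with pd.to_datetime. The helper name detect_date_columns and the sample frame below are hypothetical, and the sketch assumes every non-null value in a date column uses the 27/11/2012 09:17 layout from the question.

```python
import pandas as pd

DATE_FORMAT = "%d/%m/%Y %H:%M"  # matches strings such as 27/11/2012 09:17


def detect_date_columns(df, fmt=DATE_FORMAT):
    """Return names of string columns whose non-null values all parse with fmt."""
    date_cols = []
    for column in df.columns:
        values = df[column].dropna()
        # Only consider non-empty columns made up entirely of strings
        if values.empty or not values.map(lambda v: isinstance(v, str)).all():
            continue
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        if parsed.notna().all():
            date_cols.append(column)
    return date_cols


# Hypothetical sample data for illustration
df = pd.DataFrame({
    "created": ["27/11/2012 09:17", "28/11/2012 10:30"],
    "name": ["alpha", "beta"],
    "count": [1, 2],
})

date_cols = detect_date_columns(df)
for column in date_cols:
    df[column] = pd.to_datetime(df[column], format=DATE_FORMAT)
```

Requiring every value to parse is a deliberately strict choice; a column with a few malformed entries would be skipped, so a threshold on the parsed fraction may suit messier data better.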

To extract dates contained in columns/Series you could use findall:

In [11]: s = pd.Series(['1', '10/11/2011 11:11'])

In [12]: s.str.findall(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}')
Out[12]:
0                    []
1    [10/11/2011 11:11]
dtype: object

In [13]: s.str.findall(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}').apply(pd.Series)
Out[13]:
                  0
0               NaN
1  10/11/2011 11:11

*and then convert to Timestamps using convert_objects...*
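On recent pandas versions, where convert_objects is no longer available, the extract-then-convert step might look like the sketch below. It swaps findall for Series.str.extract, which returns the first match directly, and the format string is an assumption based on the day-first example in the question.

```python
import pandas as pd

s = pd.Series(['1', '10/11/2011 11:11'])

# Pull the first date-like substring out of each value; non-matches become NaN
extracted = s.str.extract(r'(\d{2}/\d{2}/\d{4} \d{2}:\d{2})', expand=False)

# Parse with an explicit day-first format; missing values become NaT
timestamps = pd.to_datetime(extracted, format='%d/%m/%Y %H:%M', errors='coerce')
```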

Andy Hayden answered Sep 30 '22 00:09


Depending on how overzealous you want to be, to_datetime will coerce anything it thinks is a datetime into a datetime. That includes ints, which it reads by default as nanoseconds since the UNIX epoch.

to_datetime gives you a lot of control over how to interpret the datetimes it finds too.

pandas.to_datetime(arg, errors='ignore', dayfirst=False, utc=None,
                   box=True, format=None, coerce=False, unit='ns')
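
A short sketch of those controls, written against the keywords that current pandas accepts (the coerce flag in the older signature is now spelled errors='coerce'). The sample values are made up for illustration.

```python
import pandas as pd

strings = pd.Series(['27/11/2012 09:17', 'not a date'])

# Explicit format: unparseable values become NaT instead of raising
parsed = pd.to_datetime(strings, format='%d/%m/%Y %H:%M', errors='coerce')

# dayfirst=True reads an ambiguous 10/11/2011 as 10 November, not 11 October
day_first = pd.to_datetime('10/11/2011 11:11', dayfirst=True)

# Integers are read as offsets from the UNIX epoch; unit picks the scale
from_seconds = pd.to_datetime(1354007820, unit='s')

# utc=True returns timezone-aware UTC timestamps
as_utc = pd.to_datetime(strings, format='%d/%m/%Y %H:%M', errors='coerce', utc=True)
```

Passing an explicit format is usually the safer option for data like the question's, since it avoids any guessing about day/month order.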

Kyle Kelley answered Sep 30 '22 01:09