Finding columns which contain dates in Pandas

Tags: python, pandas

I'm trying to identify columns which contain dates as strings in order to then convert them to a better type (DateTime or something numeric like UTC). The date format used is 27/11/2012 09:17 which I can search for using a regex of \d{2}/\d{2}/\d{4} \d{2}:\d{2}.

My current code is:

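import re

def find_date_columns(df):
    # Match dates such as 27/11/2012 09:17
    date_pattern = re.compile(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}')
    date_cols = []
    for column in df:
        # Flag the column if any value, viewed as a string, contains a date
        if any(date_pattern.search(str(item)) for item in df[column]):
            date_cols.append(column)
    return date_cols

date_cols = find_date_columns(cleaned_data)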

I'm sure this is not taking advantage of the capabilities of pandas. Is there a better way, either to identify the columns, or to convert them to DateTime or UTC timestamps directly?

779

asked Sep 13 '13 01:09

Jamie Bull

2 Answers

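If you are looking to convert entire columns, you can use convert_objects (available in older pandas only; it was later deprecated and is gone as of pandas 1.0, where pd.to_datetime is the usual replacement):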

df.convert_objects(convert_dates=True)
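In pandas versions where convert_objects is no longer available, a rough equivalent for a single known format might look like the sketch below. The column names and sample values are made up for illustration, and the format string is an assumption derived from the 27/11/2012 09:17 example in the question.

```python
import pandas as pd

# Hypothetical sample data; "created" holds dates as strings.
df = pd.DataFrame({
    "created": ["27/11/2012 09:17", "01/02/2013 18:30"],
    "name": ["a", "b"],
})

# An explicit format avoids day/month ambiguity (01/02 is 1 February here).
df["created"] = pd.to_datetime(df["created"], format="%d/%m/%Y %H:%M")

print(df.dtypes)
```

Passing format explicitly is a design choice rather than a requirement; without it, pandas has to guess whether the day or the month comes first.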

To extract dates contained in columns/Series you could use findall:

In [11]: s = pd.Series(['1', '10/11/2011 11:11'])

In [12]: s.str.findall('\d{2}/\d{2}/\d{4} \d{2}:\d{2}')
Out[12]:
0                    []
1    [10/11/2011 11:11]
dtype: object

In [13]: s.str.findall('\d{2}/\d{2}/\d{4} \d{2}:\d{2}').apply(pd.Series)
Out[13]:
                  0
0               NaN
1  10/11/2011 11:11

and then convert to Timestamps using convert_objects...
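To identify whole columns rather than extract individual matches, the same regex can be applied column by column. The following is only a sketch: find_date_columns and the sample frame are hypothetical names, it treats a column as a date column only when every non-null value matches the pattern in full, and it assumes a pandas version that provides Series.str.fullmatch (1.1 or later).

```python
import pandas as pd

DATE_REGEX = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}"

def find_date_columns(df, pattern=DATE_REGEX):
    """Return names of string columns whose non-null values all match pattern."""
    date_cols = []
    for column in df.columns:
        values = df[column].dropna()
        # Skip empty columns and columns that hold non-string values.
        if values.empty or not values.map(lambda v: isinstance(v, str)).all():
            continue
        if values.str.fullmatch(pattern).all():
            date_cols.append(column)
    return date_cols

# Hypothetical data: only "created" consists entirely of date strings.
df = pd.DataFrame({
    "created": ["27/11/2012 09:17", "01/02/2013 18:30"],
    "note": ["seen 27/11/2012 09:17", "n/a"],
    "count": [1, 2],
})

date_cols = find_date_columns(df)
for column in date_cols:
    df[column] = pd.to_datetime(df[column], format="%d/%m/%Y %H:%M")

print(date_cols)
```

Requiring every value to match is stricter than the search-based loop in the question; swapping .all() for .any() would flag columns that contain at least one date.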

126

answered Sep 30 '22 00:09

Andy Hayden

Depending on how overzealous you want to be, to_datetime will coerce anything it thinks is a datetime into a datetime, including ints → datetimes (defaults to ns since UNIX epoch).

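to_datetime also gives you a lot of control over how it interprets the datetimes it finds. Its signature at the time of writing was as follows (later pandas releases dropped box and coerce, the latter in favour of errors='coerce'):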

pandas.to_datetime(arg, errors='ignore', dayfirst=False, utc=None,
                   box=True, format=None, coerce=False, unit='ns')
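A few of these options can be illustrated with a short sketch. It assumes a reasonably recent pandas, where errors='coerce' plays the role of the older coerce flag; the sample values are invented, apart from the date format taken from the question.

```python
import pandas as pd

# errors="coerce" turns unparseable entries into NaT instead of raising.
mixed = pd.Series(["27/11/2012 09:17", "not a date"])
parsed = pd.to_datetime(mixed, format="%d/%m/%Y %H:%M", errors="coerce")

# Integers are read as offsets from the UNIX epoch; unit picks the resolution.
from_seconds = pd.to_datetime(1354007820, unit="s")

# utc=True returns timezone-aware values in UTC.
aware = pd.to_datetime(mixed.iloc[:1], format="%d/%m/%Y %H:%M", utc=True)

print(parsed)
print(from_seconds)
print(aware)
```

Coercing to NaT is what makes to_datetime "overzealous" in a controlled way: bad values are kept as missing data rather than stopping the conversion.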

answered Sep 30 '22 01:09

Kyle Kelley
