
Strip time from an object date in pandas

Tags: python, pandas

I am having trouble with some dates that come from zipped xlsx files. These files are loaded into a SQLite database and then exported as .csv. Each daily file is about 40,000 rows. The issue I run into is that pd.to_datetime does not seem to work on these values (I think the Excel date format is causing the issue; pure .csv files work fine with this command). That is actually fine, because I do not need them to be in datetime format.

What I am trying to achieve is a column called ShortDate in the format %m/%d/%Y. How can I do this on a datetime object (the format from Excel is mm/dd/yyyy hh:mm:ss)? I will then create a new column called RosterID, which combines the EmployeeID and ShortDate fields into a unique ID.

I am very new to pandas and I am currently only using it to process .csv files (rename and select certain columns, create unique IDs to use in filters in Tableau, etc).

rep = pd.read_csv(r'C:\Users\Desktop\test.csv.gz', dtype = 'str', compression = 'gzip', usecols = ['etc','etc2'])
print('Read successfully.')
rep['Total']=1
rep['UniqueID']= rep['EmployeeID'] + rep['InteractionID']
rep['ShortDate'] = ??? #what do I do here to get what I am looking for?
rep['RosterID']= rep['EmployeeID'] + rep['ShortDate'] # this is my goal
print('Modified successfully.')

Here is some of the raw data from the .csv. The first line contains the column names:

InteractionID, Created Date, EmployeeID, Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
asked Oct 15 '14 16:10 by trench



2 Answers

Create a new column with pd.to_datetime, then format it with apply and a lambda that calls the usual datetime methods.

In [14]: df['Short Date']= pd.to_datetime(df['Created Date'])

In [15]: df
Out[15]: 
   InteractionID    Created Date EmployeeID     Repeat Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06   

           Short Date  
0 2014-04-01 14:05:00  
1 2014-04-01 13:44:00  
2 2014-04-01 17:28:00  

In [16]: df['Short Date'] = df['Short Date'].apply(lambda x:x.date().strftime('%m%d%y'))

In [17]: df
Out[17]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114
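
The format string above produces 040114 (two-digit year, no separators), while the question asks for %m/%d/%Y. A minimal sketch of the difference, assuming the raw values look like the sample rows in the question; the column name compact is made up for illustration:

```python
import pandas as pd

# Two sample values in the layout shown in the question (mm/dd/yyyy hh:mm:ss).
df = pd.DataFrame({"Created Date": ["04/01/2014 14:05:10", "04/01/2014 13:44:05"]})

# An explicit format avoids any day-first / month-first guessing.
created = pd.to_datetime(df["Created Date"], format="%m/%d/%Y %H:%M:%S")

# %y is the two-digit year, %Y the four-digit year; slashes are literal text.
df["compact"] = created.apply(lambda x: x.strftime("%m%d%y"))
df["ShortDate"] = created.apply(lambda x: x.strftime("%m/%d/%Y"))
```

Either form works as part of an ID; the slashed version only matters if the ShortDate column should also be readable on its own.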

Then just concatenate the two columns. After strftime the Short Date column already holds strings, so .map(str) is only a safeguard against errors from concatenating strings with non-string values.

In [32]: df['Roster ID'] = df['EmployeeID'] + df['Short Date'].map(str)

In [33]: df
Out[33]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114   

    Roster ID  
0  912a040114  
1  172r040114  
2  217y040114 
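
Since the question says the values do not need to be real datetimes, a string-only route may also be enough. This is a sketch under the assumption that every value has exactly the layout "mm/dd/yyyy hh:mm:ss" (zero-padded, one space between date and time) and that the file was read with dtype=str as in the question:

```python
import pandas as pd

# Stand-in for the frame read with dtype=str; values copied from the question.
rep = pd.DataFrame({
    "EmployeeID": ["912a", "172r", "217y"],
    "Created Date": ["04/01/2014 14:05:10", "04/01/2014 13:44:05", "04/01/2014 17:28:36"],
})

# Keep everything before the first space, i.e. the date part of the string.
rep["ShortDate"] = rep["Created Date"].str.split(" ").str[0]

# Plain string concatenation, no datetime conversion involved.
rep["RosterID"] = rep["EmployeeID"] + rep["ShortDate"]
```

This never validates the dates, so a malformed value passes straight through into the ID.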
answered Sep 22 '22 22:09 by NullDev


You can apply a post-processing step that first converts the string to a datetime and then applies a lambda to keep just the date portion:

In [29]:

df['Created Date'] = pd.to_datetime(df['Created Date']).apply(lambda x: x.date())
df['Repeat Date'] = pd.to_datetime(df['Repeat Date']).apply(lambda x: x.date())
df


Out[29]:
   InteractionID Created Date EmployeeID Repeat Date
0           7927   2014-04-01       912a  2014-04-01
1           2158   2014-04-01       172r  2014-04-04
2          44279   2014-04-01       217y  2014-04-07
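
If pd.to_datetime fails on the Excel-derived values, as the question describes, it may help to find out which values are the problem. A rough sketch, assuming the expected layout is mm/dd/yyyy hh:mm:ss; the sample series and the name bad_rows are invented for illustration:

```python
import pandas as pd

# One good value, one malformed value, one missing value.
raw = pd.Series(["04/01/2014 14:05:10", "not a date", None])

# errors="coerce" turns anything that does not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S", errors="coerce")

# Rows that had a value but could not be parsed.
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows)
```

Inspecting bad_rows should show whether the trouble is a different date layout, stray whitespace, or something else left over from the Excel export.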

EDIT

After looking at this again: if your version of pandas is 0.15.0 or later, you can access just the date component with dt.date. The column must already have a datetime dtype (for example after pd.to_datetime):

In [18]:
df['just_date'] = df['Repeat Date'].dt.date
df

Out[18]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date  
0  2014-04-01  
1  2014-04-04  
2  2014-04-07  
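
One thing to be aware of: dt.date gives back plain Python date objects, so the resulting column is no longer a datetime column. If the time should be dropped but the column should stay datetime-typed, dt.normalize() is an alternative. A small sketch with two of the sample values; the variable names are only for illustration:

```python
import datetime

import pandas as pd

s = pd.to_datetime(
    pd.Series(["04/01/2014 14:50:03", "04/04/2014 17:47:29"]),
    format="%m/%d/%Y %H:%M:%S",
)

# Python datetime.date objects; the .dt accessor is no longer available on this.
as_dates = s.dt.date

# Still a datetime64 column, with every time set to midnight.
midnight = s.dt.normalize()
```

For building a string ID either is fine; normalize() is mainly useful when the column will be used for further date arithmetic or grouping.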

Additionally, you can now use dt.strftime (pandas 0.17.0 or later) rather than apply to get the result you want:

In [28]:
df['short_date'] = df['Repeat Date'].dt.strftime('%m%d%Y')
df

Out[28]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date  
0  2014-04-01   04012014  
1  2014-04-04   04042014  
2  2014-04-07   04072014  

So generating the Roster IDs is now a trivial exercise of concatenating the EmployeeID and short_date columns:

In [30]:
df['Roster ID'] = df['EmployeeID'] + df['short_date']
df

Out[30]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date     Roster ID  
0  2014-04-01   04012014  912a04012014  
1  2014-04-04   04042014  172r04042014  
2  2014-04-07   04072014  217y04072014  
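
Putting the pieces together for the workflow in the question, here is an end-to-end sketch. It assumes the header has no spaces after the commas and reads the sample rows from an in-memory string instead of the gzipped file, so the details will differ for the real data:

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example is self-contained.
csv_text = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""

# dtype=str keeps IDs such as 07927 intact (no leading zero is lost).
rep = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Parse with an explicit format, then format back to mm/dd/yyyy.
created = pd.to_datetime(rep["Created Date"], format="%m/%d/%Y %H:%M:%S")
rep["ShortDate"] = created.dt.strftime("%m/%d/%Y")

rep["UniqueID"] = rep["EmployeeID"] + rep["InteractionID"]
rep["RosterID"] = rep["EmployeeID"] + rep["ShortDate"]
print(rep[["EmployeeID", "ShortDate", "UniqueID", "RosterID"]])
```

Note that the outputs shown above list InteractionID as 7927 rather than 07927, which is what happens when the column is read as a number; reading everything as strings, as the question already does, avoids that.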
answered Sep 23 '22 22:09 by EdChum