Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas : pandas.to_datetime() is switching day & month when day is less than 13

I wrote a code that reads multiple files, however on some of my files datetime swaps day & month whenever the day is less than 13, and any day that is from day 13 or above i.e. 13/06/11 remains correct (DD/MM/YY). I tried to fix it by doing this,but it doesn't work.

My data frame looks like this: The actual datetime is from 12june2015 to 13june2015 when my I read my datetime column as a string the dates remain correct dd/mm/yyyy

tmp                     p1 p2 
11/06/2015 00:56:55.060  0  1
11/06/2015 04:16:38.060  0  1
12/06/2015 16:13:30.060  0  1
12/06/2015 21:24:03.060  0  1
13/06/2015 02:31:44.060  0  1
13/06/2015 02:37:49.060  0  1

but when I change the type of my column to datetime column it swaps my day and month for each day that is less than 13.

output:

print(df)
tmp                  p1 p2 
06/11/2015 00:56:55  0  1
06/11/2015 04:16:38  0  1
06/12/2015 16:13:30  0  1
06/12/2015 21:24:03  0  1
13/06/2015 02:31:44  0  1
13/06/2015 02:37:49  0  1

Here is my code :

I loop through files :

df = pd.read_csv(PATH+file, header = None,error_bad_lines=False , sep = '\t')

then when my code finish reading all my files I concatenat them, the problem is that my datetime column needs to be in a datetime type so when I change its type by pd_datetime() it swaps the day and month when the day is less than 13.

Post converting my datetime column the dates are correct (string type)

print(tmp) # as a result I get 11.06.2015 12:56:05 (11june2015)

But when I change the column type I get this:

tmp = pd.to_datetime(tmp, unit = "ns")
tmp = temps_absolu.apply(lambda x: x.replace(microsecond=0))
print(tmp) # I get 06-11-2016 12:56:05 (06november2015 its not the right date)

The question is : What command should i use or change in order to stop day and month swapping when the day is less than 13?

UPDATE This command swaps all the days and months of my column

tmp =  pd.to_datetime(tmp, unit='s').dt.strftime('%#m/%#d/%Y %H:%M:%S') 

So in order to swap only the incorrect dates, I wrote a condition:

for t in tmp:
        if (t.day < 13):
            t = datetime(year=t.year, month=t.day, day=t.month, hour=t.hour, minute=t.minute, second = t.second)

But it doesn't work either

like image 202
Oumab10 Avatar asked May 16 '18 09:05

Oumab10


People also ask

How can I change date in pandas?

You can convert DateTime to Date in pandas by using dt. date and dt. normalize() methods. In Pandas, DateTime is a collection of date and time in the format of “YYYY-MM-DD HH:MM:SS” where YYYY-MM-DD is referred to as the date and HH:MM:SS is referred to as Time.

How do I convert a string to a date in pandas?

Use pandas. to_datetime() to change String to “yyyymmdd” Format. If You have a date in "yymmdd" format in the DataFrame column, and to change it from a string to a date ('yyyy-mm-dd') format.


2 Answers

You can use the dayfirst parameter in pd.to_datetime.

pd.to_datetime(df.tmp, dayfirst=True)

Output:

0   2015-06-11 00:56:55
1   2015-06-11 04:16:38
2   2015-06-12 16:13:30
3   2015-06-12 21:24:03
4   2015-06-13 02:31:44
5   2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]
like image 163
Scott Boston Avatar answered Sep 17 '22 08:09

Scott Boston


Well I solved my problem but in a memory consuming method, I split my tmp column first to a date and time columns then I re-split my date column to day month and year, that way I could look for the days that are less than 13 and replace them with the correspondent month

df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))

df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')


#Loop to look for days less than 13 and then swap the day and month
for index, d in enumerate(df['day']):
        if(d <13): 
 df.loc[index,'day'],df.loc[index,'month']=df.loc[index,'month'],df.loc[index,'day'] 

# convert series to string type in order to merge them

 df['day'] = df['day'].astype(str)
 df['month'] = df['month'].astype(str)
 df['year'] = df['year'].astype(str)
 df['date']=  pd.to_datetime(df[['year', 'month', 'day']])
 df['date'] = df['date'].astype(str)
 df['time'] = df['time'].astype(str)

# merge time and date and place result in our column

df['tmp'] =pd.to_datetime(df['date']+ ' '+df['time'])

# drop the added columns

df.drop(df[['date','year', 'month', 'day','time']], axis=1, inplace = True)
like image 42
Oumab10 Avatar answered Sep 21 '22 08:09

Oumab10