I wrote a code that reads multiple files, however on some of my files datetime swaps day & month whenever the day is less than 13, and any day that is from day 13 or above i.e. 13/06/11 remains correct (DD/MM/YY). I tried to fix it by doing this,but it doesn't work.
My data frame looks like this: The actual datetime is from 12june2015 to 13june2015 when my I read my datetime column as a string the dates remain correct dd/mm/yyyy
tmp p1 p2
11/06/2015 00:56:55.060 0 1
11/06/2015 04:16:38.060 0 1
12/06/2015 16:13:30.060 0 1
12/06/2015 21:24:03.060 0 1
13/06/2015 02:31:44.060 0 1
13/06/2015 02:37:49.060 0 1
but when I change the type of my column to datetime column it swaps my day and month for each day that is less than 13.
output:
print(df)
tmp p1 p2
06/11/2015 00:56:55 0 1
06/11/2015 04:16:38 0 1
06/12/2015 16:13:30 0 1
06/12/2015 21:24:03 0 1
13/06/2015 02:31:44 0 1
13/06/2015 02:37:49 0 1
Here is my code :
I loop through files :
df = pd.read_csv(PATH+file, header = None,error_bad_lines=False , sep = '\t')
then when my code finish reading all my files I concatenat them, the problem is that my datetime column needs to be in a datetime type so when I change its type by pd_datetime() it swaps the day and month when the day is less than 13.
Post converting my datetime column the dates are correct (string type)
print(tmp) # as a result I get 11.06.2015 12:56:05 (11june2015)
But when I change the column type I get this:
tmp = pd.to_datetime(tmp, unit = "ns")
tmp = temps_absolu.apply(lambda x: x.replace(microsecond=0))
print(tmp) # I get 06-11-2016 12:56:05 (06november2015 its not the right date)
The question is : What command should i use or change in order to stop day and month swapping when the day is less than 13?
UPDATE This command swaps all the days and months of my column
tmp = pd.to_datetime(tmp, unit='s').dt.strftime('%#m/%#d/%Y %H:%M:%S')
So in order to swap only the incorrect dates, I wrote a condition:
for t in tmp:
if (t.day < 13):
t = datetime(year=t.year, month=t.day, day=t.month, hour=t.hour, minute=t.minute, second = t.second)
But it doesn't work either
You can convert DateTime to Date in pandas by using dt. date and dt. normalize() methods. In Pandas, DateTime is a collection of date and time in the format of “YYYY-MM-DD HH:MM:SS” where YYYY-MM-DD is referred to as the date and HH:MM:SS is referred to as Time.
Use pandas. to_datetime() to change String to “yyyymmdd” Format. If You have a date in "yymmdd" format in the DataFrame column, and to change it from a string to a date ('yyyy-mm-dd') format.
You can use the dayfirst
parameter in pd.to_datetime
.
pd.to_datetime(df.tmp, dayfirst=True)
Output:
0 2015-06-11 00:56:55
1 2015-06-11 04:16:38
2 2015-06-12 16:13:30
3 2015-06-12 21:24:03
4 2015-06-13 02:31:44
5 2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]
Well I solved my problem but in a memory consuming method, I split my tmp column first to a date and time columns then I re-split my date column to day month and year, that way I could look for the days that are less than 13 and replace them with the correspondent month
df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))
df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')
#Loop to look for days less than 13 and then swap the day and month
for index, d in enumerate(df['day']):
if(d <13):
df.loc[index,'day'],df.loc[index,'month']=df.loc[index,'month'],df.loc[index,'day']
# convert series to string type in order to merge them
df['day'] = df['day'].astype(str)
df['month'] = df['month'].astype(str)
df['year'] = df['year'].astype(str)
df['date']= pd.to_datetime(df[['year', 'month', 'day']])
df['date'] = df['date'].astype(str)
df['time'] = df['time'].astype(str)
# merge time and date and place result in our column
df['tmp'] =pd.to_datetime(df['date']+ ' '+df['time'])
# drop the added columns
df.drop(df[['date','year', 'month', 'day','time']], axis=1, inplace = True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With