I have imported data from an Excel file using pandas and cleaned it. The problem I am facing now is that all my column headings are in datetime format, like '2016-09-01 00:00:00', but I just need the format '2016-09-01'. This is the code I am using:
import pandas
df7 = pandas.read_excel("abc.XLSX", sheetname=0, header=0, index_col=[0], skiprows=[0, 1])
names = df7.columns.tolist()
So names contains all my column headings as a list. When I take one column heading at a time and call strftime on it, it works:
names[1].strftime('%Y-%m-%d')
I get the desired result, which is the format I want. But I want to run a loop so that all 31 columns can be converted in one go. As the first column heading is the string "time", I skip column 0 in the loop. I am using the following code:
for i in names[1:33]:
    i.strftime('%Y-%m-%d')
    print(i)
But nothing actually happens and the format stays the same. I also tried another approach:
for i in names:
    if i is type(datetime.datetime):
        i.strftime('%Y-%M-%d')
    else:
        pass
It is also not working, and I am not even getting any error. Any advice?
Note that strftime does not operate in place; it returns a new string. So you should be doing i = i.strftime('%Y-%m-%d') to see any effect when you print the outputs.
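For example, you could collect the converted strings into a new list (a minimal sketch, assuming names is the list of column labels from your question):
import datetime

new_names = []
for c in names:
    # strftime returns a new string; keep it instead of discarding it
    if isinstance(c, datetime.datetime):
        new_names.append(c.strftime('%Y-%m-%d'))
    else:
        new_names.append(c)  # leave the "time" label untouched
print(new_names)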
Here's an effective way to accomplish the same thing:
1) If all the columns are of type datetime64[ns], as you've mentioned, you can use strftime and remember to assign the result back to the .columns attribute:
df.columns = df.columns.strftime("%Y-%m-%d")
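A quick before/after check (a sketch; df is assumed to have the datetime column labels from the question, and the exact return type of DatetimeIndex.strftime varies slightly between pandas versions):
print(df.columns)
# DatetimeIndex(['2016-09-01 00:00:00', '2016-09-02 00:00:00', ...],
#               dtype='datetime64[ns]', freq=None)

df.columns = df.columns.strftime('%Y-%m-%d')
print(df.columns)
# Index(['2016-09-01', '2016-09-02', ...], dtype='object')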
Alternatively, you could make use of the .date attribute, which simply returns the date part and discards the time:
df.columns = df.columns.date
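Note that this produces datetime.date objects rather than strings (a small sketch of what to expect):
df.columns = df.columns.date
print(df.columns[0])        # 2016-09-01
print(type(df.columns[0]))  # <class 'datetime.date'>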
2) In case there are mixed dtypes present, you could filter the columns on their type with the help of a custom function passed to the map method:
import datetime
mapper = lambda x: x.strftime("%Y-%m-%d") if isinstance(x, datetime.datetime) else x
df.columns = df.columns.map(mapper)
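For instance, applied to an index that mixes a plain string label with datetime labels (a hypothetical illustration mirroring your "time" column; mapper is the function defined above):
import datetime
import pandas as pd

cols = pd.Index(['time', datetime.datetime(2016, 9, 1), datetime.datetime(2016, 9, 2)])
print(cols.map(mapper))
# Index(['time', '2016-09-01', '2016-09-02'], dtype='object')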
If you want to convert your columns dtype to string (object), use @Nickil's solution.
If you want to keep the columns in datetime64 dtype and just strip the time part, you can use the normalize() method:
In [79]: df
Out[79]:
2017-01-01 08:00:00 2017-01-02 09:00:00
0 1 2
1 3 4
In [80]: df.columns.dtype
Out[80]: dtype('<M8[ns]')
In [81]: df.columns.normalize()
Out[81]: DatetimeIndex(['2017-01-01', '2017-01-02'], dtype='datetime64[ns]', freq=None)
In [82]: df.columns = df.columns.normalize()
In [83]: df
Out[83]:
2017-01-01 2017-01-02
0 1 2
1 3 4
In [84]: df.columns.dtype
Out[84]: dtype('<M8[ns]')
UPDATE: a small demonstration of how the .normalize() method strips the time part:
In [91]: x = pd.DataFrame({'ts':pd.date_range('2017-01-01', freq='999999S', periods=5)})
In [92]: x
Out[92]:
ts
0 2017-01-01 00:00:00
1 2017-01-12 13:46:39
2 2017-01-24 03:33:18
3 2017-02-04 17:19:57
4 2017-02-16 07:06:36
In [93]: x.ts.dt.normalize()
Out[93]:
0 2017-01-01
1 2017-01-12
2 2017-01-24
3 2017-02-04
4 2017-02-16
Name: ts, dtype: datetime64[ns]
Setup:
"2017-01-01 08:00:00" "2017-01-02 09:00:00"
1 2
3 4
df = pd.read_clipboard()
df.columns = pd.to_datetime(df.columns)
In [114]: df.columns.dtype
Out[114]: dtype('<M8[ns]')
In [115]: df
Out[115]:
2017-01-01 08:00:00 2017-01-02 09:00:00
0 1 2
1 3 4
In [116]: df.columns = df.columns.normalize()
In [117]: df
Out[117]:
2017-01-01 2017-01-02
0 1 2
1 3 4