I have the following raw data,
TranID,TranDate,TranTime,TranAmt
A123456,20160427,02:18,9999.53
B123457,20160426,02:48,26070.33
C123458,20160425,03:18,13779.56
A123459,20160424,03:18,18157.26
B123460,20160423,04:18,215868.15
C123461,20160422,04:18,23695.25
A123462,20160421,05:18,57
B123463,20160420,05:18,64594.24
C123464,20160419,06:18,47890.91
A123465,20160427,06:18,14119.74
B123466,20160426,07:18,2649.6
C123467,20160425,07:18,16757.38
A123468,20160424,08:18,8864.78
B123469,20160423,08:18,26254.69
C123470,20160422,09:18,13206.98
A123471,20160421,09:18,15872.45
B123472,20160420,10:18,197621.18
C123473,20160419,10:18,21048.72
and I tried importing the raw data using pd.read_csv:
Try1
import numpy as np
import pandas as pd
df = pd.read_csv('MyTest.csv', sep=',', header=0, parse_dates=['TranDate'],
usecols=['TranID','TranDate','TranTime','TranAmt'],
engine='python')
print(df.dtypes)
df[:5]
Output1
TranID object
TranDate datetime64[ns]
TranTime object
TranAmt float64
dtype: object
Out[12]:
TranID TranDate TranTime TranAmt
0 A123456 2016-04-27 02:18 9999.53
1 B123457 2016-04-26 02:48 26070.33
2 C123458 2016-04-25 03:18 13779.56
3 A123459 2016-04-24 03:18 18157.26
4 B123460 2016-04-23 04:18 215868.15
Try2
import numpy as np
import pandas as pd
df = pd.read_csv('MyTest.csv', sep=',', header=0, parse_dates=['TranDate', 'TranTime'],
usecols=['TranID','TranDate','TranTime','TranAmt'],
engine='python')
print(df.dtypes)
df[:5]
Output2
TranID object
TranDate datetime64[ns]
TranTime datetime64[ns]
TranAmt float64
dtype: object
Out[13]:
TranID TranDate TranTime TranAmt
0 A123456 2016-04-27 2016-04-27 02:18:00 9999.53
1 B123457 2016-04-26 2016-04-27 02:48:00 26070.33
2 C123458 2016-04-25 2016-04-27 03:18:00 13779.56
3 A123459 2016-04-24 2016-04-27 03:18:00 18157.26
4 B123460 2016-04-23 2016-04-27 04:18:00 215868.15
My confusion is with the TranTime column. In Try1 it is displayed correctly, but its dtype is object. In Try2, pandas prepended the current date to the time, and the dtype is datetime64.
I want the TranTime column to be treated as a time, and I want to perform aggregations on it using pandas' groupby or pivot_table. If I use the Try1 method, does the object dtype affect my aggregations? If I use the Try2 method, do I need to strip out the date part in order to use the time part?
I am proficient in SAS, which has date, time and datetime informats and formats where the underlying data type is simply numeric; hence my confusion with Python's object and datetime dtypes.
Thanks, Lobbie
In Python, datetimes are generally represented as datetime.datetime objects. These are not very efficient for columnar work, which is why pandas uses its own Timestamp type (stored as datetime64[ns]), which is backed by a plain 64-bit integer.
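As a small illustration (not from the original answer), a Timestamp is just a thin wrapper over a nanosecond count since the Unix epoch:
>>> import pandas as pd
>>> ts = pd.Timestamp('2016-04-27 02:18:00')
>>> ts.value   # nanoseconds since 1970-01-01, stored as an int64
1461723480000000000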
To read the data (note the double brackets around the parse_dates arguments):
df = pd.read_csv(filename, parse_dates=[['TranDate', 'TranTime']])
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18 entries, 0 to 17
Data columns (total 3 columns):
TranDate_TranTime 18 non-null datetime64[ns]
TranID 18 non-null object
TranAmt 18 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
>>> df.head()
TranDate_TranTime TranID TranAmt
0 2016-04-27 02:18:00 A123456 9999.53
1 2016-04-26 02:48:00 B123457 26070.33
2 2016-04-25 03:18:00 C123458 13779.56
3 2016-04-24 03:18:00 A123459 18157.26
4 2016-04-23 04:18:00 B123460 215868.15
The date and time columns have been joined into a single column. Once you have this timestamp, it is easy to access its attributes using the dt accessor, e.g.
>>> df.groupby(df.TranDate_TranTime.dt.hour).TranAmt.sum().head()
TranDate_TranTime
2 36069.86
3 31936.82
4 239563.40
5 64651.24
6 62010.65
Name: TranAmt, dtype: float64
>>> df.groupby(df.TranDate_TranTime.dt.day).TranAmt.sum().head()
TranDate_TranTime
19 68939.63
20 262215.42
21 15929.45
22 36902.23
23 242122.84
Name: TranAmt, dtype: float64
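Since the question also mentions pivot_table, the same hourly totals can be computed that way as well (a minimal sketch; the Hour helper column is added only for this example):
>>> df['Hour'] = df.TranDate_TranTime.dt.hour
>>> df.pivot_table(values='TranAmt', index='Hour', aggfunc='sum').head()
        TranAmt
Hour
2      36069.86
3      31936.82
4     239563.40
5      64651.24
6      62010.65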
Refer to the Pandas docs for more information regarding Pandas date functionality.
- No, the object dtype does not affect the aggregation itself, but you will lose the time semantics: an object column holds plain strings, so you cannot use the .dt accessor on it.
- No, you do not need to strip out the date part; you can access just the time part through the .dt accessor (see the short example after this list).
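For instance, with the Try2 frame (a hypothetical snippet, not part of the original answer; it assumes TranTime was parsed as in Try2):
times = df['TranTime'].dt.time   # datetime.time values; the attached date is ignored
hours = df['TranTime'].dt.hour   # integer hour of day, convenient for groupby
That said, it is simpler to parse the date and time into one column up front: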
import pandas as pd
df = pd.read_csv('MyTest.csv', parse_dates=[['TranDate', 'TranTime']])
print(df)
TranDate_TranTime TranID TranAmt
0 2016-04-27 02:18:00 A123456 9999.53
1 2016-04-26 02:48:00 B123457 26070.33
2 2016-04-25 03:18:00 C123458 13779.56
3 2016-04-24 03:18:00 A123459 18157.26
4 2016-04-23 04:18:00 B123460 215868.15
5 2016-04-22 04:18:00 C123461 23695.25
6 2016-04-21 05:18:00 A123462 57.00
7 2016-04-20 05:18:00 B123463 64594.24
8 2016-04-19 06:18:00 C123464 47890.91
9 2016-04-27 06:18:00 A123465 14119.74
10 2016-04-26 07:18:00 B123466 2649.60
11 2016-04-25 07:18:00 C123467 16757.38
12 2016-04-24 08:18:00 A123468 8864.78
13 2016-04-23 08:18:00 B123469 26254.69
14 2016-04-22 09:18:00 C123470 13206.98
15 2016-04-21 09:18:00 A123471 15872.45
16 2016-04-20 10:18:00 B123472 197621.18
17 2016-04-19 10:18:00 C123473 21048.72
Parse and manage the date/time as a single column wherever possible, using the nested-bracket form parse_dates=[['TranDate', 'TranTime']].
print(df.groupby(df.TranDate_TranTime.dt.hour).sum())
TranAmt
2 36069.86
3 31936.82
4 239563.40
5 64651.24
6 62010.65
7 19406.98
8 35119.47
9 29079.43
10 218669.90
print(df.groupby(df.TranDate_TranTime.dt.minute).sum())
TranAmt
18 710437.42
48 26070.33
That gives you the breakdowns you asked about. And you can still group by after resampling, as below.
df2 = df.set_index('TranDate_TranTime').resample('60s').sum().dropna()
print(df2)
TranAmt
TranDate_TranTime
2016-04-19 06:18:00 47890.91
2016-04-19 10:18:00 21048.72
2016-04-20 05:18:00 64594.24
2016-04-20 10:18:00 197621.18
2016-04-21 05:18:00 57.00
2016-04-21 09:18:00 15872.45
2016-04-22 04:18:00 23695.25
2016-04-22 09:18:00 13206.98
2016-04-23 04:18:00 215868.15
2016-04-23 08:18:00 26254.69
2016-04-24 03:18:00 18157.26
2016-04-24 08:18:00 8864.78
2016-04-25 03:18:00 13779.56
2016-04-25 07:18:00 16757.38
2016-04-26 02:48:00 26070.33
2016-04-26 07:18:00 2649.60
2016-04-27 02:18:00 9999.53
2016-04-27 06:18:00 14119.74
print(df2.groupby(df2.index.day).sum())
TranAmt
19 68939.63
20 262215.42
21 15929.45
22 36902.23
23 242122.84
24 27022.04
25 30536.94
26 28719.93
27 24119.27