I'm very new to Python and pandas. Any guidance, comments, and suggestions are appreciated!
Here is my issue: it takes a couple of minutes to return a result after I call df.shape or df.dtypes. The DataFrame has 1,610,658 rows and 5 columns. Three columns are stored as int64, one as float64, and one as datetime64.
I used the following code to practice loading and transforming data in Python. Both the load and the transform steps perform well, but I ran into this issue when I checked the output.
Update 1:
After setting some columns as the index, df.shape drops from 80+ s to 1.7 s, but df.dtypes still takes 80+ s.
import pandas as pd
###############
# Load
###############
raw = pd.read_csv("data.zip", compression='zip')
###############
# Transform
###############
payment_method = {
    "Cash": 1,
    "Card": 2,
}
df = raw.assign(
    # Encode site ids to int. Only two sites in this data
    site=(raw.site == "A").astype(int),
    # Encode payment types to int
    payment=[payment_method.get(k, 0) for k in raw.payment],
    # Rescale values
    amount=raw.amount / 1e6,
    # Convert integer date key to datetime
    sold_date=pd.to_datetime(
        [str(dt) for dt in raw.sold_date],
        format="%Y%m%d")
)
###############
# Check point
###############
df.shape # pain point I. Took minutes to return
# Out[9]: (1610658, 5)
df.dtypes # pain point II
# Out[10]:
# site int64
# acct_key int64
# sold_date datetime64[ns]
# amount float64
# payment int64
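For comparison, here is a minimal sketch of the same transform written with vectorized pandas operations (Series.map and Series.astype(str)) instead of Python list comprehensions. The rows in `raw` are made up to stand in for data.zip; only the column names come from the question. This is not shown to fix the slow df.shape call; it only illustrates a column-wise way to express the transform step.

```python
import pandas as pd

# Made-up sample rows standing in for data.zip (column names follow the question)
raw = pd.DataFrame({
    "site": ["A", "B", "A"],
    "acct_key": [101, 102, 103],
    "sold_date": [20170101, 20170215, 20170630],
    "amount": [2500000, 1000000, 750000],
    "payment": ["Cash", "Card", "Cheque"],
})

payment_method = {"Cash": 1, "Card": 2}

df = raw.assign(
    # 1 for site "A", 0 for the other site
    site=(raw["site"] == "A").astype("int64"),
    # Look up payment codes; types missing from the dict become 0,
    # matching payment_method.get(k, 0) in the original list comprehension
    payment=raw["payment"].map(payment_method).fillna(0).astype("int64"),
    # Rescale values
    amount=raw["amount"] / 1e6,
    # Parse the YYYYMMDD integer key via its string form, column-wise
    sold_date=pd.to_datetime(raw["sold_date"].astype(str), format="%Y%m%d"),
)

print(df.dtypes)
```

DataFrame.assign overwrites existing columns in place, so the column order stays site, acct_key, sold_date, amount, payment.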
If I convert the DataFrame to a numpy.ndarray, I get the result instantly. I think I must be missing something. Please point me in the right direction.
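One way to narrow this down is to time the two attributes on a synthetic DataFrame of the same size and dtypes. df.shape only reads the lengths of the index and columns, and df.dtypes only collects the per-column dtypes, so both normally return almost immediately. If they are fast here but slow on the real data, the delay probably comes from something outside these attributes. A diagnostic sketch; the data and the helper `timed` are hypothetical.

```python
import time

import numpy as np
import pandas as pd

# Synthetic frame with the question's shape and dtypes; the values are made up
n = 1610658
df = pd.DataFrame({
    "site": np.zeros(n, dtype="int64"),
    "acct_key": np.arange(n, dtype="int64"),
    "sold_date": pd.date_range("2017-01-01", periods=n, freq="min"),
    "amount": np.linspace(0.0, 1.0, n),
    "payment": np.ones(n, dtype="int64"),
})


def timed(label, fn):
    """Hypothetical helper: call fn once, print the elapsed time, return its result."""
    start = time.perf_counter()
    result = fn()
    print("{}: {:.6f} s".format(label, time.perf_counter() - start))
    return result


shape = timed("df.shape", lambda: df.shape)
dtypes = timed("df.dtypes", lambda: df.dtypes)
# Total memory held by the index and columns, in megabytes
print("memory: {:.1f} MB".format(df.memory_usage(deep=True).sum() / 1e6))
```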
Thanks a lot!
System: OS X 10.12
Python: 3.6.1
Numpy: 1.12
Pandas: 0.20.2
Jupyter console: 5.1.0
Try reducing the memory footprint of your DataFrame by downcasting its numeric columns:
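import numpy as np
# Downcast integer columns to the smallest unsigned type that holds their values
# (columns containing negative numbers are left unchanged)
int_columns = df.select_dtypes(include=[np.integer]).columns
df[int_columns] = df[int_columns].apply(pd.to_numeric, downcast='unsigned')
# Downcast float columns toward float32
float_columns = df.select_dtypes(include=[np.floating]).columns
df[float_columns] = df[float_columns].apply(pd.to_numeric, downcast='float')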
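To see what the downcasting does, the sketch below applies it to a small made-up frame with the question's column names and dtypes and compares column memory before and after. It assigns one column at a time, a slight variation on the snippet above, so each downcast column plainly replaces the original. Whether this resolves the slow df.shape on the real data is not something this example demonstrates.

```python
import numpy as np
import pandas as pd

# Small made-up frame with the question's column names and dtypes
df = pd.DataFrame({
    "site": np.array([1, 0, 1, 1], dtype="int64"),
    "acct_key": np.array([101, 202, 303, 404], dtype="int64"),
    "sold_date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]),
    "amount": np.array([2.5, 1.0, 0.75, 3.25], dtype="float64"),
    "payment": np.array([1, 2, 0, 1], dtype="int64"),
})

# Bytes used by the columns only (index excluded)
before = df.memory_usage(index=False).sum()

# Smallest unsigned integer type per column; a negative value would leave the column as is
for col in df.select_dtypes(include=[np.integer]).columns:
    df[col] = pd.to_numeric(df[col], downcast="unsigned")

# Floats are downcast toward float32, the smallest float type to_numeric targets
for col in df.select_dtypes(include=[np.floating]).columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

after = df.memory_usage(index=False).sum()
print(df.dtypes)
print("{} bytes -> {} bytes".format(before, after))
```

Here acct_key needs uint16 because 404 does not fit in uint8; the datetime column is untouched because it matches neither selection.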