My data is not perfectly clean, but pandas handles it without issue. The pandas library provides many extremely useful functions for EDA.
However, when I profile a large dataset, i.e. 100 million records with 10 columns read from a database table, the report never completes and my laptop runs out of memory. The data is around 6 GB as CSV; my machine has 14 GB of RAM, of which roughly 3-4 GB is already in use when idle.
import pandas as pd
import pandas_profiling

df = pd.read_sql_query("select * from table", conn_params)
profile = pandas_profiling.ProfileReport(df)
profile.to_file(outputfile="myoutput.html")
I have also tried the check_recoded = False option, but it does not solve the problem.
Is there a way to read the data in chunks and still generate a single summary report for the whole dataset? Or is there any other way to use this function with a large dataset?
Version 2.4 of pandas-profiling introduced minimal mode, which disables expensive computations (such as correlations and dynamic binning):
from pandas_profiling import ProfileReport
profile = ProfileReport(df, minimal=True)
profile.to_file(output_file="output.html")
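Independently of the profiling options, you can also shrink the DataFrame itself before profiling, for example by selecting only the columns you need in the query and downcasting numeric columns. A minimal sketch (the column names are placeholders, and it assumes the values fit into smaller numeric types):

import pandas as pd

# Hypothetical: pull only the columns you actually want to profile
df = pd.read_sql_query("select col_a, col_b, col_c from table", conn_params)

# Downcast 64-bit numeric columns to the smallest type that holds their values
for col in df.select_dtypes(include="integer").columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")
for col in df.select_dtypes(include="float").columns:
    df[col] = pd.to_numeric(df[col], downcast="float")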
The syntax for disabling the calculation of correlations (which heavily reduces the work performed) changed considerably between pandas-profiling 1.4 and the current (beta) version, pandas-profiling 2.0, where it looks as follows:
profile = df.profile_report(correlations={
    "pearson": False,
    "spearman": False,
    "kendall": False,
    "phi_k": False,
    "cramers": False,
    "recoded": False,
})
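As with the minimal-mode example above, the resulting report is then written to disk with to_file:

profile.to_file(output_file="output.html")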
Additionally, you can reduce the work performed by disabling the calculation of bins for histogram plotting:
profile = df.profile_report(plot={'histogram': {'bins': None}})
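As for reading the data in chunks: pandas-profiling needs the whole DataFrame in memory, but pd.read_sql_query accepts a chunksize argument, so one option is to stream the table and profile a random sample rather than all 100 million rows. A sketch, assuming a 10% sample is representative enough (the chunk size and sampling fraction are illustrative, not prescriptive):

import pandas as pd
from pandas_profiling import ProfileReport

# Stream the query result in chunks of 1,000,000 rows instead of loading it all at once
chunks = pd.read_sql_query("select * from table", conn_params, chunksize=1_000_000)

# Keep a 10% random sample of each chunk; only the accumulated sample is held in memory
sample = pd.concat(chunk.sample(frac=0.1, random_state=42) for chunk in chunks)

profile = ProfileReport(sample, minimal=True)
profile.to_file(output_file="sample_output.html")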