
Speeding up pandas profiling analysis using check_correlation?

I'm using pandas profiling to generate a report. The dataset is very large, so to speed up processing I'm trying to turn off correlations. I tried the check_correlation parameter, which I saw in another post, in this line:

a = prof.ProfileReport(df, title='Downloads', check_correlation=False)

which raises this error:

ValueError: Config parameter "check_correlation" does not exist.

OCTAVIAN asked Oct 09 '19 07:10





1 Answer

Since the configuration options changed in version 2, you could use:

import pandas_profiling

profile = df.profile_report(check_correlation_pearson=False,
                            correlations={'pearson': False,
                                          'spearman': False,
                                          'kendall': False,
                                          'phi_k': False,
                                          'cramers': False,
                                          'recoded': False})

to turn off correlations. However, this is still not as fast as version 1.4. You could also investigate the other configuration options here.
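If you disable correlations in more than one place, it may help to build the keyword arguments once and reuse them. The following is a minimal sketch, assuming the pandas-profiling 2.x config keys shown above; the helper names no_correlation_kwargs and build_report are hypothetical and not part of the library, and other versions may reject these keys with the same ValueError.

```python
# Correlation names as listed in the answer above (pandas-profiling 2.x).
CORRELATION_NAMES = ("pearson", "spearman", "kendall", "phi_k", "cramers", "recoded")


def no_correlation_kwargs(names=CORRELATION_NAMES):
    """Hypothetical helper: build kwargs that switch off every listed correlation."""
    return {
        "check_correlation_pearson": False,
        "correlations": {name: False for name in names},
    }


def build_report(df, title="Downloads"):
    """Hypothetical helper: profile `df` with all correlations disabled."""
    # Imported lazily so the kwargs helper above works without the library installed.
    import pandas_profiling

    return pandas_profiling.ProfileReport(df, title=title, **no_correlation_kwargs())
```

Calling build_report(df) is then equivalent to passing the settings inline, and the list of correlation names only has to be kept up to date in one place.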

Levent answered Nov 12 '22 08:11
