Using mca package in Python

Tags: python-3.x, pandas, scikit-learn, pca

I am trying to use the mca package to do multiple correspondence analysis in Python.

I am a bit confused as to how to use it. With PCA I would expect to fit some data (i.e. find principal components for those data) and then later I would be able to use the principal components that I found to transform unseen data.

Based on the mca documentation, I cannot work out how to do this last step. I also don't understand what the cryptically named properties and methods do (e.g. .E, .L, .K, .k).

So far, if I have a DataFrame with a single column containing strings, I would do something like:

import pandas as pd  # needed for pd.get_dummies
import mca
ca = mca.MCA(pd.get_dummies(df, drop_first=True))

From what I can gather,

ca.fs_r(1)

is the transformation of the data in df, and

ca.L

is supposed to hold the eigenvalues (although I get a vector of 1s that is one element shorter than my number of features?).

Now, if I had some more data with the same features, say df_new, and assuming I've already converted it correctly to dummy variables, how do I find the equivalent of ca.fs_r(1) for the new data?
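
On the vector of 1s: a small NumPy check, independent of the mca package, suggests this is what correspondence analysis gives for a single categorical variable. With K levels, the full indicator matrix has K - 1 non-zero eigenvalues, all equal to 1. The column name and toy values below are made up for illustration, and this sketch does not reproduce the package's own computation.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column data with K = 3 levels.
df = pd.DataFrame({"fruit": ["apple", "pear", "plum", "apple", "plum", "pear", "apple"]})

# Full indicator matrix (no drop_first): exactly one 1 per row.
Z = pd.get_dummies(df, dtype=float).to_numpy()

P = Z / Z.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

# Eigenvalues (principal inertias) are the squared singular values of S.
eigenvalues = np.linalg.svd(S, compute_uv=False) ** 2
print(np.round(eigenvalues, 6))
```

Here two eigenvalues equal 1 and the third is 0, so a single column carries no structure for MCA to summarise. Note also that with drop_first=True every row belonging to the dropped level becomes all zeros, which gives that row a mass of zero in the arithmetic above.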


asked Jan 30 '18 12:01

Dan

2 Answers

Another option is the prince library, which provides tools such as:

Multiple correspondence analysis (MCA)
Principal component analysis (PCA)
Multiple factor analysis (MFA)

Start by installing it:

pip install --user prince

Using MCA is fairly simple and takes only a couple of steps (much like PCA in scikit-learn). First, build the DataFrame.

import pandas as pd 
import prince

X = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/balloons/adult+stretch.data')
X.columns = ['Color', 'Size', 'Action', 'Age', 'Inflated']

print(X.head())

mca = prince.MCA()

# outputs
>>     Color   Size   Action    Age Inflated
   0  YELLOW  SMALL  STRETCH  ADULT        T
   1  YELLOW  SMALL  STRETCH  CHILD        F
   2  YELLOW  SMALL      DIP  ADULT        F
   3  YELLOW  SMALL      DIP  CHILD        F
   4  YELLOW  LARGE  STRETCH  ADULT        T

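Then call the fit and transform methods. Keep the fitted model and the transformed coordinates in separate variables so the model can be reused later.

mca = mca.fit(X)           # learn the factors from X
coords = mca.transform(X)  # row coordinates, comparable to ca.fs_r(1); pass *another* data set here to get the equivalent of ca.fs_r_sup(df_new)
print(coords)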

# outputs
>>         0             1
0   0.705387  8.373126e-15
1  -0.386586  8.336230e-15
2  -0.386586  6.335675e-15
3  -0.852014  6.726393e-15
4   0.783539 -6.333333e-01
5   0.783539 -6.333333e-01
6  -0.308434 -6.333333e-01
7  -0.308434 -6.333333e-01
8  -0.773862 -6.333333e-01
9   0.783539  6.333333e-01
10  0.783539  6.333333e-01
11 -0.308434  6.333333e-01
12 -0.308434  6.333333e-01
13 -0.773862  6.333333e-01
14  0.861691 -5.893240e-15
15  0.861691 -5.893240e-15
16 -0.230282 -5.930136e-15
17 -0.230282 -7.930691e-15
18 -0.695710 -7.539973e-15

You can also plot the row and column coordinates, since prince builds on matplotlib for this. The call below assumes mca is still the fitted model, not the transformed output.

ax = mca.plot_coordinates(
     X=X,
     ax=None,
     figsize=(6, 6),
     show_row_points=True,
     row_points_size=10,
     show_row_labels=False,
     show_column_points=True,
     column_points_size=30,
     show_column_labels=False,
     legend_n_cols=1
     )

ax.get_figure().savefig('images/mca_coordinates.svg')


answered Sep 21 '22 20:09

Axois

The documentation of the mca package is not very clear in that regard. However, there are a few cues suggesting that ca.fs_r_sup(df_new) should be used to project new (unseen) data onto the factors obtained in the analysis:

The package author refers to new data as supplementary data, which is the terminology used in the following paper: Abdi, H., & Valentin, D. (2007). Multiple correspondence analysis. Encyclopedia of measurement and statistics, 651-657.
The package has only two functions that accept new data as the parameter DF: fs_r_sup(self, DF, N=None) and fs_c_sup(self, DF, N=None). The latter finds the column factor scores.
The usage guide demonstrates this with a new data frame that was not used in the component analysis itself.
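
To make "projecting supplementary rows" concrete, here is a minimal NumPy sketch of the underlying correspondence-analysis arithmetic. It is not the mca package's code: the helper names fit_mca and project_rows and the toy data are made up for illustration, and the package may scale or correct its scores differently. New rows are turned into row profiles and multiplied by the column standard coordinates found during fitting.

```python
import numpy as np
import pandas as pd


def fit_mca(Z, n_components=2):
    """Correspondence analysis of an indicator matrix Z (illustrative helper)."""
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    row_coords = (U * s) / np.sqrt(r)[:, None]  # principal coordinates of rows
    col_std = Vt.T / np.sqrt(c)[:, None]        # standard coordinates of columns
    return row_coords, col_std, s ** 2


def project_rows(Z_new, col_std):
    """Project supplementary rows: row profiles times column standard coordinates."""
    Z_new = np.asarray(Z_new, dtype=float)
    profiles = Z_new / Z_new.sum(axis=1, keepdims=True)
    return profiles @ col_std


# Hypothetical training data with three categorical columns.
train = pd.DataFrame({
    "Color": ["YELLOW", "YELLOW", "PURPLE", "PURPLE", "YELLOW", "PURPLE"],
    "Size": ["SMALL", "LARGE", "SMALL", "LARGE", "LARGE", "SMALL"],
    "Age": ["ADULT", "CHILD", "CHILD", "ADULT", "ADULT", "CHILD"],
})
Z = pd.get_dummies(train, dtype=float)     # full indicator matrix, no drop_first
row_coords, col_std, eigenvalues = fit_mca(Z)

# Unseen data must use exactly the same dummy columns, in the same order.
new = pd.DataFrame({"Color": ["PURPLE"], "Size": ["LARGE"], "Age": ["CHILD"]})
Z_new = pd.get_dummies(new, dtype=float).reindex(columns=Z.columns, fill_value=0.0)
new_coords = project_rows(Z_new, col_std)
print(new_coords)
```

As a sanity check, projecting the training rows themselves with project_rows(Z, col_std) reproduces row_coords, which is the behaviour one would want from a fit-then-transform workflow.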

answered Sep 20 '22 20:09

Jan Trienes
