Can we generate a contingency table for a chi-square test using Python?

Tags: python, statistics, scipy, statsmodels, chi-squared

I am using scipy.stats.chi2_contingency to get the chi-square statistic. It requires a frequency table (i.e. a contingency table) as its argument, but I have a feature vector and want to generate the frequency table automatically. Is there a function for this? This is how I am doing it currently:

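import numpy as np
from scipy.stats import chi2_contingency

def contigency_matrix_categorical(data_series,target_series,target_val,indicator_val):
  # Count the rows for every (target value, indicator value) pair.
  observed_freq={}
  for targets in target_val:
      observed_freq[targets]={}
      for indicators in indicator_val:
          observed_freq[targets][indicators['val']]=data_series[((target_series==targets)&(data_series==indicators['val']))].count()
  # Flatten the counts row by row while tracking the table shape.
  f_obs=[]
  var1=0  # number of rows (target values)
  var2=0  # number of columns (indicator values)
  for i in observed_freq:
      var1=var1+1
      var2=0
      for j in observed_freq[i]:
          # Note: 5 is added to every cell count here.
          f_obs.append(observed_freq[i][j]+5)
          var2=var2+1
  arr=np.array(f_obs).reshape(var1,var2)
  c,p,dof,expected=chi2_contingency(arr)
  return {'score':c,'pval':p,'dof':dof}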

Here data_series and target_series are the column values, and the other two arguments hold the target values and the indicator values to count. Can anyone help? Thanks.

asked Jul 15 '14 20:07

icm

1 Answer

You can use pandas.crosstab to generate a contingency table from a DataFrame. From the documentation:

Compute a simple cross-tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.
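As a quick illustration of that default behaviour, here is a minimal sketch using two small hand-made Series. The names colour and label and their values are made up for this example and are not taken from the question. Passing margins=True adds an 'All' row and column holding the totals.

```python
import pandas as pd

# Two hypothetical categorical columns of equal length.
colour = pd.Series(['red', 'red', 'blue', 'blue', 'red'], name='colour')
label = pd.Series(['yes', 'no', 'yes', 'yes', 'yes'], name='label')

# Frequency (contingency) table: rows are colour values, columns are label values.
table = pd.crosstab(colour, label)
print(table)

# Same table with row and column totals under the 'All' key.
totals = pd.crosstab(colour, label, margins=True)
print(totals)
```

The two Series do not have to live in the same DataFrame; they only need the same length and a compatible index.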

Below is a usage example:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Some fake data.
n = 5  # Number of samples.
d = 3  # Dimensionality.
c = 2  # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])

# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])

# Chi-square test of independence.
# The statistic is named chi2 so that it does not overwrite c (the number of categories).
chi2, p, dof, expected = chi2_contingency(contingency)

For one random draw, the original answer showed the data table and the contingency table generated from it as images (not reproduced here).

Then, scipy.stats.chi2_contingency(contingency) returns (0.052, 0.819, 1, array([[1.6, 0.4],[2.4, 0.6]])).
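To connect this back to the function in the question, the same steps can be wrapped in a small helper that takes the two columns directly and returns a dictionary in the same spirit. This is only a sketch: the name chi2_from_series, the sample data, and the extra 'table' and 'expected' keys are assumptions made for illustration. Fixed data is used instead of random data so the table is reproducible.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_from_series(feature, target, correction=True):
    """Cross-tabulate two categorical Series and run the chi-square test.

    Hypothetical helper; `correction` is passed through to
    chi2_contingency (Yates' continuity correction, used for 2x2 tables).
    """
    # Rows are target values, columns are feature values.
    table = pd.crosstab(target, feature)
    chi2, pval, dof, expected = chi2_contingency(table, correction=correction)
    return {'score': chi2, 'pval': pval, 'dof': dof,
            'table': table, 'expected': expected}


# Made-up example data: one categorical feature and a binary target.
feature = pd.Series(['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'a', 'b'],
                    name='feature')
target = pd.Series([0, 0, 1, 1, 1, 1, 0, 1, 0, 1], name='target')

result = chi2_from_series(feature, target, correction=False)
print(result['table'])
print(result['score'], result['pval'], result['dof'])
```

Unlike the function in the question, this sketch passes the raw counts to chi2_contingency without adding a constant to each cell.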

answered Sep 20 '22 22:09

mdeff
