
pandas.DataFrame corrwith() method

Tags: python, pandas, dataframe

I recently started working with pandas. Can anyone explain the difference in behaviour of the .corrwith() method when it is given a Series versus a DataFrame?

Suppose I have one DataFrame:

import pandas as pd
frame = pd.DataFrame(data={'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

I want to calculate the correlation between feature 'a' and all the other features. I can do it in the following way:

frame.drop(labels='a', axis=1).corrwith(frame['a'])

And the result will be:

b   -1.0
c    0.0

But this very similar code:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

generates a completely different and unacceptable table:

a   NaN
b   NaN
c   NaN

So, my question is: why do we get such strange output when a DataFrame is passed as the second argument?

asked Jul 17 '16 13:07

Nikita Sivukhin

1 Answer

What I think you're looking for:

Let's say your frame is:

import numpy as np
frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)

[Image of the output: https://i.stack.imgur.com/WGNCQ.png]
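The same idea can be tried on the small frame from the question, where the numbers are easy to check by eye. This is only a sketch: using 'a' as the single focus column is my assumption for illustration, not something taken from the random frame above.

```python
import pandas as pd

# The three-column frame from the question.
frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
focus_cols = ['a']

# Build the full correlation matrix, keep only the focus columns,
# then remove the rows that belong to the focus columns themselves.
table = frame.corr().filter(focus_cols).drop(focus_cols)
print(table)
```

The result is a DataFrame with one column per focus column and one row for each remaining feature, so it scales to several focus columns without any loop.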

Answering what you asked:

Compute pairwise correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise

drop : boolean, default False
    Drop missing indices from result, default returns union of all

Returns:

correls : Series

corrwith behaves similarly to add, sub, mul, and div in that it accepts either a DataFrame or a Series as other, even though the documentation mentions only DataFrame.

When other is a Series, corrwith broadcasts that Series and matches it along the axis specified by axis (the default is 0). This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b   -1.0
c    0.0
dtype: float64
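One way to convince yourself of the Series behaviour is to compute the same correlations one column at a time with Series.corr and compare. A minimal sketch, assuming the frame from the question; the names others, via_corrwith, and by_hand are mine:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(labels='a', axis=1)

# corrwith with a Series: every column of `others` is correlated with frame['a'].
via_corrwith = others.corrwith(frame['a'])

# The same numbers computed column by column with Series.corr.
by_hand = pd.Series({col: others[col].corr(frame['a']) for col in others.columns})

print(via_corrwith)
print(by_hand)
```

Both results are indexed by the column names of the left-hand frame (b and c), because the Series has no column labels of its own that would need to match.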

When other is a DataFrame, corrwith aligns the two frames along the axis specified by axis and correlates each pair of identically labelled columns (or rows) on the other axis. If we did:

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a    NaN
b    NaN
c    1.0
dtype: float64

Only c was in common and only c had its correlation calculated.
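The drop parameter from the documentation quoted above controls whether those unmatched labels appear in the result at all. A small sketch, assuming the frame from the question; left, right, with_nan, and only_common are names I made up for illustration:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
left = frame.drop('a', axis=1)    # columns: b, c
right = frame.drop('b', axis=1)   # columns: a, c

# Default drop=False: the result covers the union of both column sets,
# and labels present in only one frame come back as NaN.
with_nan = left.corrwith(right)

# drop=True: labels without a partner in the other frame are left out.
only_common = left.corrwith(right, drop=True)

print(with_nan)
print(only_common)
```

With drop=True only c should remain, since it is the only label both frames share.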

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the double brackets [['a']], so it now plays by the DataFrame rules, in which its columns must match up with the columns of the frame it's being correlated with. But you explicitly dropped a from the first frame and then correlated it with a DataFrame that has nothing but a. No column label is common to both frames, so the result is NaN for every column.
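If you end up holding a one-column DataFrame anyway, one possible workaround is to turn it back into a Series before calling corrwith. This is a sketch under that assumption, not the only way to do it; single_col and as_series are hypothetical names:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(labels='a', axis=1)

single_col = frame[['a']]      # one-column DataFrame, as in the question
as_series = single_col['a']    # select the column to get a Series again
# single_col.iloc[:, 0] would also work if the column name is not known.

result = others.corrwith(as_series)
print(type(single_col).__name__, type(as_series).__name__)
print(result)
```

The result should match the Series case from the question: b close to -1 and c close to 0.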

answered Sep 22 '22 13:09

piRSquared
