Compare a matrix against a column vector

Q: Is a column vector a matrix?

A column vector is a matrix with 1 column. Let's take a look at a formal definition of a column vector. A column vector is a m × 1 matrix consisting of a single column with m elements.

Q: Is a 1x1 matrix a column vector?

If a matrix has only one column, then it is a column vector. If a matrix has only one row, then it is a row vector. A 1x1 matrix is a scalar.

Q: How do you express a vector as a column vector?

2.1. One way to create a column vector is to explicitly put the values in square brackets, separated by semicolons (rather than commas or spaces): >> c = [1; 2; 3; 4] c = 1.

Tags:

numpy-ndarray

Arrays 'A' and vector 'B' below are part of pandas dataframe.

I have a large array A of form:

I have a vector B of form:

How do I compare pythonically each column of A against B. I am trying to get True/False values for A < B comparison to get the following result:

TRUE    FALSE   FALSE
FALSE   FALSE   FALSE
TRUE    TRUE    TRUE
TRUE    FALSE   FALSE

I can do list comprehension syntax but is there a better way to pull this off. My array A and B are very large.

265

asked Mar 14 '17 16:03

1 Answers

Consider the pd.DataFrame and pd.Series, A and B

A = pd.DataFrame([
        [28, 39, 52],
        [77, 80, 66],
        [7, 18, 24],
        [9, 97, 68]
    ])

B = pd.Series([32, 5, 42, 17])

`pandas`

By default, when you compare a pd.DataFrame with a pd.Series, pandas aligns each index value from the series with the column names of the dataframe. This is what happens when you use A < B. In this case, you have 4 rows in your dataframe and 4 elements in your series, so I'm going to assume you want to align the index values of the series with the index values of the dataframe. In order to specify the axis you want to align with, you need to use the comparison method rather than the operator. That's because when you use the method, you can use the axis parameter and specify that you want axis=0 rather than the default axis=1.

A.lt(B, axis=0)

       0      1      2
0   True  False  False
1  False  False  False
2   True   True   True
3   True  False  False

I often just write this as A.lt(B, 0)

`numpy`

In numpy, you also have to pay attention to the dimensionality of the arrays and you are assuming that the positions are already lined up. The positions will be taken care of if they come from the same dataframe.

print(A.values)

[[28 39 52]
 [77 80 66]
 [ 7 18 24]
 [ 9 97 68]]

print(B.values)

[32  5 42 17]

Notice that B is a 1 dimensional array while A is a 2 dimensional array. In order to compare B along the rows of A we need to reshape B into a 2 dimensional array. The most obvious way to do this is with reshape

print(A.values < B.values.reshape(4, 1))

[[ True False False]
 [False False False]
 [ True  True  True]
 [ True False False]]

However, these are ways in which you will commonly see others do the same reshaping

A.values < B.values.reshape(-1, 1)

A.values < B.values[:, None]

timed back test

To get a handle of how fast these comparisons are, I've constructed the following back test.

def pd_cmp(df, s):
    return df.lt(s, 0)

def np_cmp_a2a(df, s):
    """To get an apples to apples comparison
    I return the same thing in both functions"""
    return pd.DataFrame(
        df.values < s.values[:, None],
        df.index, df.columns
    )

def np_cmp_a2o(df, s):
    """To get an apples to oranges comparison
    I return a numpy array"""
    return df.values < s.values[:, None]


results = pd.DataFrame(
    index=pd.Index([10, 1000, 100000], name='group size'),
    columns=pd.Index(['pd_cmp', 'np_cmp_a2a', 'np_cmp_a2o'], name='method'),
)

from timeit import timeit

for i in results.index:
    df = pd.concat([A] * i, ignore_index=True)
    s = pd.concat([B] * i, ignore_index=True)
    for j in results.columns:
        results.set_value(
            i, j,
            timeit(
                '{}(df, s)'.format(j),
                'from __main__ import {}, df, s'.format(j),
                number=100
            )
        )

results.plot()

enter image description here

I can conclude that the numpy based solutions are faster but not all that much. They all scale the same.

answered Nov 15 '22 19:11

piRSquared

Related questions
                            
                                Convert Numpy array of ASCII codes to string
                            
                                SyntaxError "no viable alternative at input 'self'"
                            
                                Vectorised Haversine formula with a pandas dataframe
                            
                                django queryset return single value
                            
                                How do I use reportlab's drawImage with an image url?
                            
                                dynamically set database based on request in django
                            
                                Django return count when date stored <= last 24 hours
                            
                                Using program variables in python logging configuration file
                            
                                Understanding parent and controller in Tkinter __init__
                            
                                Unable to stop Streaming in tweepy after one minute
                            
                                How to read binary files as hex in Python?
                            
                                How to use virtualenv with Python? [closed]
                            
                                AppRegistryNotReady: Apps aren't loaded yet. Django
                            
                                does python have a list constructor?
                            
                                Travel directory tree with limited recursion depth
                            
                                Find first item with alphabetical precedence in list with numbers
                            
                                Pytest cov does not generate any report
                            
                                how to get the first day and last day of current month in python [duplicate]
                            
                                How to delete row/rows from a qtableview in pyqt?
                            
                                Seaborn, violin plot with one data per column

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Compare a matrix against a column vector

Tags:

python

multidimensional-array

pandas

numpy