Pandas DataFrame change a value based on column, index values comparison

Tags: python, pandas, dataframe

Suppose that you have a pandas DataFrame with arbitrary data in the body and numeric column and index labels.

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
>>> columns = [2, 4, 8]
>>> index = [10, 4, 2]
>>> df = pd.DataFrame(data, columns=columns, index=index)
>>> df
    2  4  8
10  a  b  c
4   d  e  f
2   g  h  i

Now suppose we want to manipulate our data frame by comparing each cell's index label with its column label. Consider the following.

Where index is greater than column replace letter with 'k':

    2  4  8
10  k  k  k
4   k  e  f
2   g  h  i

Where index is equal to column replace letter with 'U':

    2  4  8
10  k  k  k
4   k  U  f
2   U  h  i

Where column is greater than index replace letter with 'Y':

    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
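
For reference, the three rules above can be spelled out as an explicit loop over index and column labels. This is only a minimal sketch for checking results on small frames, not a fast solution; the helper name `replace_by_label_comparison` is made up for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])

def replace_by_label_comparison(frame):
    # Hypothetical helper: compares each row label with each column label
    # and writes 'k', 'U' or 'Y' into a copy of the frame.
    out = frame.copy()
    for i, row_label in enumerate(frame.index):
        for j, col_label in enumerate(frame.columns):
            if row_label > col_label:
                out.iat[i, j] = 'k'   # index greater than column
            elif row_label == col_label:
                out.iat[i, j] = 'U'   # index equal to column
            else:
                out.iat[i, j] = 'Y'   # column greater than index
    return out

result = replace_by_label_comparison(df)
print(result)
```

The nested loop makes one Python-level assignment per cell, so it is only practical for small frames.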

To keep the question useful to all:

What is a fast way to do this replacement?
What is the simplest way to do this replacement?

Speed Results from minimal example

jezrael: 556 µs ± 66.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
user3471881: 329 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
thunderwood: 4.65 ms ± 252 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Is this a duplicate? I searched Google for "pandas replace compare index column" and the top results are:

Pandas - Compare two dataframes and replace values matching condition

Python pandas: replace values based on location not index value

Pandas DataFrame: replace all values in a column, based on condition

However, I don't feel any of these touch on a) whether this is possible or b) how to compare in such a way.


asked Nov 02 '18 12:11

akozi

1 Answer

I think you need numpy.select with broadcasting:

m1 = df.index.values[:, None] > df.columns.values
m2 = df.index.values[:, None] == df.columns.values


df = pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)
print(df)
    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
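
To make the broadcasting step easier to follow, here is a small self-contained sketch that rebuilds the example frame from the question and prints the two masks. The intermediate names `idx`, `cols` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])

idx = df.index.values[:, None]   # column vector, shape (3, 1)
cols = df.columns.values         # row vector, shape (3,)

# Comparing (3, 1) against (3,) broadcasts to a (3, 3) boolean array:
# one cell for every (index label, column label) pair.
m1 = idx > cols
m2 = idx == cols
print(m1.shape)
print(m1)
print(m2)

# np.select takes the first matching condition per cell; cells matching
# neither mask fall back to the default 'Y'.
result = pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'),
                      columns=df.columns, index=df.index)
print(result)
```

Because every cell is either greater, equal or less, two masks plus the default cover all three cases.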

Performance:

import numpy as np
import pandas as pd

np.random.seed(1000)

N = 1000
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)

df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)
#print (df)

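def us(df):
    # Difference of every index label with every column label, shape (N, N);
    # object dtype so the cells can later hold strings.
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    # Compute all three masks before overwriting any cell.
    greater = values > 0
    less = values < 0
    same = values == 0

    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'

    return pd.DataFrame(values, columns=df.columns, index=df.index)

def jez(df):
    # Broadcast the index (as a column vector) against the column labels.
    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)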

In [236]: %timeit us(df)
107 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [237]: %timeit jez(df)
64 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
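
The `%timeit` lines above only work in IPython/Jupyter. Outside of it, a rough equivalent can be sketched with the standard-library `timeit` module. The version below also checks that both functions return the same values before timing them. It assumes a smaller `N = 200` so it runs quickly, which means its numbers are not comparable to the ones above.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1000)
N = 200  # assumption: smaller than the N = 1000 used above, for a quick run
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)
df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)

def us(df):
    # Subtraction-based approach from the comparison above.
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    greater = values > 0
    less = values < 0
    same = values == 0
    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'
    return pd.DataFrame(values, columns=df.columns, index=df.index)

def jez(df):
    # numpy.select with broadcasting.
    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'), columns=df.columns, index=df.index)

# Sanity check: compare the cell values as plain strings.
same_output = np.array_equal(us(df).to_numpy().astype(str),
                             jez(df).to_numpy().astype(str))
print("identical results:", same_output)

runs = 10
for func in (us, jez):
    seconds = timeit.timeit(lambda: func(df), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

Absolute timings depend on the machine and library versions, so only the relative difference is meaningful.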


answered Oct 06 '22 18:10

jezrael

