Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Highlight a column value based off another column value in pandas

Tags:

python

pandas

I have a function like this:

def highlight_otls(df):
    return ['background-color: yellow']

And a DataFrame like this:

price   outlier 
1.99       F,C
1.49       L,C
1.99         F
1.39         N

What I want to do is highlight a certain column in my df based off of this condition of another column:

data['outlier'].str.split(',').str.len() >= 2

So if the column values df['outlier'] >= 2, I want to highlight the corresponding column df['price']. (So the first 2 prices should be highlighted in my dataframe above).

I attempted to do this by doing the following which gives me an error:

data['price'].apply(lambda x: highlight_otls(x) if (x['outlier'].str.split(',').str.len()) >= 2, axis=1)

Any idea on how to do this the proper way?

like image 952
Hana Avatar asked Jun 06 '18 15:06

Hana


People also ask

How do you see if values in one column are in another pandas?

You can check if a column contains/exists a particular value (string/int), list of multiple values in pandas DataFrame by using pd. series() , in operator, pandas. series. isin() , str.

How do I compare two column values in pandas?

To find the positions of two matching columns, we first initialize a pandas dataframe with two columns of city names. Then we use where() of numpy to compare the values of two columns. This returns an array that represents the indices where the two columns have the same value.

What is ILOC () in Python?

The iloc() function in python is defined in the Pandas module, which helps us select a specific row or column from the data set. Using the iloc method in python, we can easily retrieve any particular value from a row or column by using index values.

How do you get a specific value of a column from a DataFrame in Python?

You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.


1 Answers

Use Styler.apply. (To output to xlsx format, use to_excel function.)

Suppose one's dataset is

other   price   outlier
0   X   1.99    F,C
1   X   1.49    L,C
2   X   1.99    F
3   X   1.39    N

def hightlight_price(row):
    ret = ["" for _ in row.index]
    if len(row.outlier.split(",")) >= 2:
        ret[row.index.get_loc("price")] = "background-color: yellow"
    return ret
       
df.style.\
    apply(hightlight_price, axis=1).\
    to_excel('styled.xlsx', engine='openpyxl')

From the documentation, "DataFrame.style attribute is a property that returns a Styler object."

We pass our styling function, hightlight_price, into Styler.apply and demand a row-wise nature of the function with axis=1. (Recall that we want to color the price cell in each row based on the outlier information in the same row.)

Our function hightlight_price will generate the visual styling for each row. For each row row, we first generate styling for other, price, and outlier column to be ["", "", ""]. We can obtain the right index to modify only the price part in the list with row.index.get_loc("price") as in

ret[row.index.get_loc("price")] = "background-color: yellow"
# ret becomes ["", "background-color: yellow", ""]

Results

enter image description here

like image 108
Tai Avatar answered Sep 18 '22 16:09

Tai