I've been trying to print out a Pandas dataframe to html and have specific entire rows highlighted if the value of one specific column's value for that row is over a threshold. I've looked through the Pandas Styler Slicing and tried to vary the highlight_max function for such a use, but seem to be failing miserably; if I try, say, to replace the is_max with a check for whether a given row's value is above said threshold (e.g., something like
is_x = df['column_name'] >= threshold
), it isn't apparent how to properly pass such a thing or what to return.
I've also tried to simply define it elsewhere using df.loc, but that hasn't worked too well either.
Another concern also came up: If I drop that column (currently the criterion) afterwards, will the styling still hold? I am wondering if a df.loc would prevent such a thing from being a problem.
To highlight a particular cell of a DataFrame, use the DataFrame's style. apply(~) method.
Fill Data in an Empty Pandas DataFrame by Appending Rows First, create an empty DataFrame with column names and then append rows one by one. The append() method can also append rows. When creating an empty DataFrame with column names and row indices, we can fill data in rows using the loc() method.
This solution allows for you to pass a column label or a list of column labels to highlight the entire row if that value in the column(s) exceeds the threshold.
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
def highlight_greaterthan(s, threshold, column):
is_max = pd.Series(data=False, index=s.index)
is_max[column] = s.loc[column] >= threshold
return ['background-color: yellow' if is_max.any() else '' for v in is_max]
df.style.apply(highlight_greaterthan, threshold=1.0, column=['C', 'B'], axis=1)
Output:
Or for one column
df.style.apply(highlight_greaterthan, threshold=1.0, column='E', axis=1)
Here is a simpler approach:
Assume you have a 100 x 10 dataframe, df. Also assume you want to highlight all the rows corresponding to a column, say "duration", greater than 5.
You first need to define a function that highlights the cells. The real trick is that you need to return a row, not a single cell. For example:
def highlight(s):
if s.duration > 5:
return ['background-color: yellow'] * len(s)
else:
return ['background-color: white'] * len(s)
**Note that the return part should be a list of 10 (corresponding to the number of columns). This is the key part.
Now you can apply this to the dataframe style as:
df.style.apply(highlight, axis=1)
Assume you have the following dataframe and you want to highlight the rows where id
is greater than 3
to red
id char date
0 0 s 2022-01-01
1 1 t 2022-02-01
2 2 y 2022-03-01
3 3 l 2022-04-01
4 4 e 2022-05-01
5 5 r 2022-06-01
You can try Styler.set_properties
with pandas.IndexSlice
# Subset your original dataframe with condition
df_ = df[df['id'].gt(3)]
# Pass the subset dataframe index and column to pd.IndexSlice
slice_ = pd.IndexSlice[df_.index, df_.columns]
s = df.style.set_properties(**{'background-color': 'red'}, subset=slice_)
s.to_html('test.html')
You can also try Styler.apply
with axis=None
which passes the whole dataframe.
def styler(df):
color = 'background-color: {}'.format
mask = pd.concat([df['id'].gt(3)] * df.shape[1], axis=1)
style = np.where(mask, color('red'), color('green'))
return style
s = df.style.apply(styler, axis=None)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With