
Pandas: assign category based on where value falls in range

I have the following ranges and a pandas DataFrame:

x >= 0        # success
-10 <= x < 0  # warning
x < -10       # danger

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

I'd like to categorize the values in the DataFrame based on where they fall within the defined ranges. So I'd like the final DF to look something like this:

    x    y    z    x_cat    y_cat    z_cat
0   2   -7  -30  success  warning   danger
1   1   -5  -20  success  warning   danger

I've tried using the category datatype but it doesn't appear I can define a range anywhere.

for category_column, value_column in zip(['x_cat', 'y_cat', 'z_cat'], ['x', 'y', 'z']):
    df[category_column] = df[value_column].astype('category')

Can I use the category datatype? If not, what can I do here?

Johnny Metz asked Jun 20 '17 16:06


4 Answers

pandas.cut

c = pd.cut(
    df.stack(),
    [-np.inf, -10, 0, np.inf],
    labels=['danger', 'warning', 'success'],
    right=False  # close bins on the left so -10 is 'warning' and 0 is 'success'
)
df.join(c.unstack().add_suffix('_cat'))

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
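
The ranges in the question include their lower bound (x >= 0, -10 <= x < 0), while pd.cut closes each bin on the right unless told otherwise. The following is a minimal sketch of how the two settings differ at the bin edges; the names edge_values, right_closed and left_closed are made up for illustration.

```python
import numpy as np
import pandas as pd

bins = [-np.inf, -10, 0, np.inf]
labels = ['danger', 'warning', 'success']

# Values sitting on and around the two bin edges
edge_values = pd.Series([-10.5, -10, -0.1, 0, 5])

# Default (right=True): bins are (-inf, -10], (-10, 0], (0, inf)
right_closed = pd.cut(edge_values, bins, labels=labels)

# right=False: bins are [-inf, -10), [-10, 0), [0, inf)
left_closed = pd.cut(edge_values, bins, labels=labels, right=False)

print(pd.DataFrame({'value': edge_values,
                    'right_closed': right_closed,
                    'left_closed': left_closed}))
```

Only the rows for -10 and 0 differ between the two columns, so the sample data in the question gives the same result either way.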

numpy

v = df.values
cats = np.array(['danger', 'warning', 'success'])
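# side='right' puts -10 in 'warning' and 0 in 'success', matching the ranges in the question
code = np.searchsorted([-10, 0], v.ravel(), side='right').reshape(v.shape)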
cdf = pd.DataFrame(cats[code], df.index, df.columns)
df.join(cdf.add_suffix('_cat'))

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
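
np.searchsorted has the same kind of edge behaviour, controlled by its side argument. A small sketch, assuming the question's boundaries of -10 and 0; the array names here are illustrative only.

```python
import numpy as np

cats = np.array(['danger', 'warning', 'success'])
boundaries = [-10, 0]

# Values sitting on and around the two boundaries
edge_values = np.array([-10.5, -10, -0.1, 0, 5])

# side='left' (the default): a value equal to a boundary falls in the lower group
left_codes = np.searchsorted(boundaries, edge_values)

# side='right': a value equal to a boundary falls in the upper group
right_codes = np.searchsorted(boundaries, edge_values, side='right')

print(cats[left_codes])
print(cats[right_codes])
```

With side='right', -10 maps to 'warning' and 0 maps to 'success', which is what -10 <= x < 0 and x >= 0 ask for.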
piRSquared answered Oct 10 '22 06:10


You can use pd.cut, but you need to apply it column by column (just because the function operates on 1-d input). Passing right=False closes each bin on the left, which matches the ranges in the question:

labels = df.apply(lambda x: pd.cut(x, [-np.inf, -10, 0, np.inf], labels=['danger', 'warning', 'success'], right=False))

          x        y       z
0  success  warning  danger
1  success  warning  danger

So you can do:

pd.concat([df, labels.add_prefix('cat_')], axis=1)

   x  y   z     cat_x     cat_y    cat_z
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
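
On the question of whether the category datatype can be used: the Series that pd.cut returns already has a categorical dtype, and the categories are ordered. Here is a rough sketch of that, using a dict comprehension and the x_cat naming from the question rather than the apply call; cat_columns and out are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})
bins = [-np.inf, -10, 0, np.inf]
labels = ['danger', 'warning', 'success']

# One categorical column per value column; right=False matches the question's ranges
cat_columns = {
    f'{col}_cat': pd.cut(df[col], bins, labels=labels, right=False)
    for col in df.columns
}
out = df.assign(**cat_columns)

print(out.dtypes)                 # the *_cat columns have dtype 'category'
print(out['x_cat'].cat.ordered)   # the categories carry an order
print(out['z_cat'] < 'warning')   # so order comparisons against a label work
```

Because the categories are ordered from 'danger' up to 'success', filters such as out[out['z_cat'] < 'warning'] can be written without mapping the labels back to numbers.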
FLab answered Oct 10 '22 04:10


You could use assign to make new columns. For each new column, use apply to label the values in that series. Note that each column is tested against a single range here, so any value outside that range gets None.

df.assign(x_cat=lambda v: v.x.apply(lambda x: 'success' if x >= 0 else None),
          y_cat=lambda v: v.y.apply(lambda x: 'warning' if -10 <= x < 0 else None),
          z_cat=lambda v: v.z.apply(lambda x: 'danger' if x < -10 else None))

This will result in:

    x   y   z   x_cat   y_cat   z_cat
0   2   -7  -30 success warning danger
1   1   -5  -20 success warning danger
plasmon360 answered Oct 10 '22 04:10


You could write a little function and then pass each series to the function using apply:

import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

def cat(x):
    if x < -10:
        return "danger"
    if x < 0:
        return "warning"
    return "success"

# Write the labels to new *_cat columns so the original values are kept
for col in list(df.columns):
    df[col + '_cat'] = df[col].apply(cat)
Woody Pride answered Oct 10 '22 06:10