
Replacing values greater than a number in pandas dataframe

I have a large dataframe which looks like this:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the above example is:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to use the numpy.minimum function on each row? I assume it would be faster than the list comprehension method.

Zanam asked May 03 '17 10:05




2 Answers

Very simply, if each cell holds a single number rather than a list: df[df > 9] = 11
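
A minimal sketch of the case where boolean-mask assignment applies, namely a frame whose cells are scalar numbers. The frame `df` and its columns `A` and `B` below are made-up illustration data, not the asker's list-valued column.

```python
import pandas as pd

# Hypothetical frame with scalar cells (not lists), for illustration only.
df = pd.DataFrame(
    {"A": [33, 3, 9], "B": [34, 43, 10]},
    index=pd.to_datetime(
        ["2017-01-01 01:00", "2017-01-01 02:00", "2017-01-01 03:00"]
    ),
)

# Boolean-mask assignment: every cell greater than 9 becomes 11.
df[df > 9] = 11
print(df)
```

When the cells are lists, as in the question, the comparison `df > 9` is not meaningful element by element, so one of the list-aware approaches in the other answer is needed.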

Edouard Cuny answered Sep 21 '22 08:09



You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

A faster solution is to first convert the column to a NumPy array and then use numpy.where (this requires all lists to have the same length):

a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
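
For reference, here is a self-contained sketch of the numpy.where approach that can be run as-is. The sample frame is rebuilt from the two rows shown in the question. The name `replaced` and the explicit `pd.Series` wrapper are choices made for this sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; all lists must share one length
# so that they stack into a regular 2-D array.
df1 = pd.DataFrame(
    {"A": [[33, 34, 39], [3, 43, 9]]},
    index=pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"]),
)

# Stack the per-row lists into an array of shape (n_rows, list_length).
a = np.array(df1["A"].tolist())

# Vectorized replacement: values above 9 become 11, others are kept.
replaced = np.where(a > 9, 11, a)

# Convert back to one Python list per row, aligned on the original index.
df1["A"] = pd.Series(replaced.tolist(), index=df1.index)
print(df1)
```

Regarding the numpy.minimum idea from the question: `np.minimum(a, 9)` would cap every value at 9 rather than set the larger ones to 11, so it is not a drop-in replacement for this task.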
jezrael answered Sep 23 '22 08:09
