I have a large dataframe that looks like this:
df1['A'].ix[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00      [3, 43, 9]
I want to replace each element greater than 9 with 11.
So, the desired output for the above example is:
df1['A'].ix[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00      [3, 11, 9]
Edit:
My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.
Is there a way to use the numpy.minimum function for each row? I assume it would be faster than the list comprehension method.
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can be used both to read and to modify the values of a DataFrame.
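As a minimal sketch of the loc[] approach described above, assuming a frame whose cells hold plain numbers rather than lists (the frame `scalar_df` and its columns are hypothetical and not taken from the question):

```python
import pandas as pd

# Hypothetical frame with scalar cells, for illustration only.
scalar_df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# The boolean mask picks the rows, the label "A" picks the column.
scalar_df.loc[scalar_df["A"] > 9, "A"] = 11

print(scalar_df)
```

Only column A is modified here; column B keeps its original values because it is not named in the loc[] indexer.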
To select pandas rows whose column values are greater than or smaller than a specific value, use comparison operators such as >, <=, and >= when building masks or queries.
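A short sketch of both styles, a boolean mask and a query string, might look like the following. The frame `readings` and its values are made up for illustration, and the cells are assumed to be scalars:

```python
import pandas as pd

# Hypothetical frame with scalar cells, for illustration only.
readings = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# Boolean mask: keep the rows where column A is greater than 9.
large_rows = readings[readings["A"] > 9]

# Query string: keep the rows where column A is at most 9.
small_rows = readings.query("A <= 9")

print(large_rows)
print(small_rows)
```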
Pandas DataFrame: ge() function. The ge() function returns the element-wise "greater than or equal to" comparison of a DataFrame and other. It belongs to a family of flexible wrappers (eq, ne, le, lt, ge, gt) that are equivalent to ==, !=, <=, <, >=, > with support for choosing an axis (rows or columns) and level for the comparison. The other argument can be any single- or multiple-element data structure, or a list-like object.
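To illustrate, here is a small sketch that uses ge() to build the mask and DataFrame.mask() to replace the matching cells. It assumes integer scalar cells, so "greater than 9" can be written as "greater than or equal to 10"; the frame `scores` is hypothetical:

```python
import pandas as pd

# Hypothetical frame with integer scalar cells, for illustration only.
scores = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# ge(10) is the method form of scores >= 10.
ge_mask = scores.ge(10)
same_as_operator = ge_mask.equals(scores >= 10)

# mask() replaces the cells where the condition is True.
capped = scores.mask(ge_mask, 11)

print(capped)
```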
Very simply, if the cells hold scalars rather than lists: df[df > 9] = 11
You can use apply with a list comprehension:
df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
A faster solution is to first convert the column to a numpy array and then use numpy.where:
import numpy as np

a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
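On the numpy.minimum idea from the question: the sketch below rebuilds the two-row example and compares numpy.where with numpy.minimum. It assumes every list has the same length, so the column converts to a regular 2-D array. The value 10 in `edge` is a hypothetical input added to show where the two calls differ: numpy.minimum(x, 11) only caps values above 11, so it leaves 10 unchanged.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question.
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=idx)

# Assumption: all lists have equal length, giving a 2-D integer array.
a = np.array(df1["A"].tolist())

# Replace every element greater than 9 with 11 in one vectorized step.
replaced = np.where(a > 9, 11, a)

# Write the rows back as Python lists, one list per row.
df1["A"] = pd.Series(replaced.tolist(), index=df1.index)

# Hypothetical edge value: 10 is greater than 9 but smaller than 11.
edge = np.array([10])
with_where = np.where(edge > 9, 11, edge)  # 10 is replaced by 11
with_minimum = np.minimum(edge, 11)        # 10 stays 10

print(df1)
print(with_where, with_minimum)
```

For the sample data both calls happen to give the same result, since no value falls between 9 and 11, but numpy.where matches the stated rule in general.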