I have two columns, age and sex, in a pandas DataFrame:
sex = ['m', 'f' , 'm', 'f', 'f', 'f', 'f']
age = [16 , 15 , 14 , 9 , 8 , 2 , 56 ]
Now I want to derive a third column: if age <= 9, output 'child'; if age > 9, output the respective gender. Like this:
sex = ['m', 'f' , 'm','f' ,'f' ,'f' , 'f']
age = [16 , 15 , 14 , 9 , 8 , 2 , 56 ]
yes = ['m', 'f' ,'m' ,'child','child','child','f' ]
Please help. P.S. I am still working on it; if I find anything, I will update immediately.
Use numpy.where:
df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])
The resulting output:
   age sex   col3
0   16   m      m
1   15   f      f
2   14   m      m
3    9   f  child
4    8   f  child
5    2   f  child
6   56   f      f
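For completeness, here is a minimal self-contained sketch of the same approach, assuming the DataFrame is built from the two lists in the question (the name `col3` is just the illustrative column name used above):

```python
import numpy as np
import pandas as pd

# Data taken from the question
df = pd.DataFrame({
    "age": [16, 15, 14, 9, 8, 2, 56],
    "sex": ["m", "f", "m", "f", "f", "f", "f"],
})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise:
# rows with age <= 9 get 'child', all other rows keep their 'sex' value
df["col3"] = np.where(df["age"] <= 9, "child", df["sex"])

print(df)
```

Note that the boundary value 9 counts as 'child' because the condition uses `<=`.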
Timings
Using the following setup to get a larger sample DataFrame:
import numpy as np
import pandas as pd

np.random.seed([3,1415])
n = 10**5
df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n), 'age': np.random.randint(0, 100, size=n)})
I get the following timings:
%timeit np.where(df['age'] <= 9, 'child', df['sex'])
1000 loops, best of 3: 1.26 ms per loop
%timeit df['sex'].where(df['age'] > 9, 'child')
100 loops, best of 3: 3.25 ms per loop
%timeit df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
100 loops, best of 3: 3.92 ms per loop
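The `%timeit` lines above need IPython. As a rough sketch, a similar comparison can be run in plain Python with the standard `timeit` module. The smaller sample size, the seed, and the `candidates` dictionary are assumptions made for illustration, and the numbers will vary by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Smaller sample than above so the row-wise apply finishes quickly (assumption)
rng = np.random.default_rng(0)
n = 10**4
df = pd.DataFrame({
    "sex": rng.choice(["m", "f"], size=n),
    "age": rng.integers(0, 100, size=n),
})

# The three approaches being compared, wrapped as zero-argument callables
candidates = {
    "np.where": lambda: np.where(df["age"] <= 9, "child", df["sex"]),
    "Series.where": lambda: df["sex"].where(df["age"] > 9, "child"),
    "apply": lambda: df.apply(
        lambda x: "child" if x["age"] <= 9 else x["sex"], axis=1
    ),
}

# Collect each result so the approaches can be checked against one another
results = {name: list(fn()) for name, fn in candidates.items()}

# Average wall-clock time per call over a few runs
timings = {
    name: timeit.timeit(fn, number=3) / 3 for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```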
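You could use pandas.Series.where, which keeps the values where the condition is True and replaces the rest. For example:
df['col3'] = df['sex'].where(df['age'] > 9, 'child')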
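A minimal runnable sketch of the where-based approach on the question's data, assuming the DataFrame is named `df` and the new column `col3`. `Series.where` keeps values where the condition holds, so the condition is the opposite of the np.where version. `Series.mask` is shown as the mirror-image alternative:

```python
import pandas as pd

# Data taken from the question
df = pd.DataFrame({
    "age": [16, 15, 14, 9, 8, 2, 56],
    "sex": ["m", "f", "m", "f", "f", "f", "f"],
})

# Keep 'sex' where age > 9; everywhere else substitute 'child'
df["col3"] = df["sex"].where(df["age"] > 9, "child")

# Equivalent with mask: replace values where the condition is True
col3_mask = df["sex"].mask(df["age"] <= 9, "child")

print(df)
```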