
How to compare two columns in pandas to make a third column?

Tags: python, pandas

I have two columns, age and sex, in a pandas DataFrame:

sex = ['m', 'f' , 'm', 'f', 'f', 'f', 'f']
age = [16 ,  15 , 14 , 9  , 8   , 2   , 56 ]

Now I want to derive a third column like this: if age <= 9, output 'child', and if age > 9, output the respective gender:

sex = ['m', 'f'  , 'm','f'    ,'f'    ,'f'    , 'f']
age = [16 ,  15  , 14 , 9     , 8     , 2     , 56 ]
yes = ['m', 'f'  ,'m' ,'child','child','child','f' ]

Please help. P.S. I am still working on it; if I find anything, I will update immediately.

Anurag Pandey asked Aug 12 '16 19:08



2 Answers

Use numpy.where:

df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])

The resulting output:

   age sex   col3
0   16   m      m
1   15   f      f
2   14   m      m
3    9   f  child
4    8   f  child
5    2   f  child
6   56   f      f
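
For completeness, here is a minimal self-contained sketch of the one-liner above, assuming the DataFrame is built directly from the two lists in the question. The column name col3 is just the one used in this answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
    'age': [16, 15, 14, 9, 8, 2, 56],
})

# Where age <= 9 take 'child', otherwise take that row's value from 'sex'
df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])

print(df)
```

np.where evaluates the condition for the whole column at once, so no explicit loop over rows is needed.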

Timings

Using the following setup to get a larger sample DataFrame:

np.random.seed([3,1415])
n = 10**5
df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n), 'age': np.random.randint(0, 100, size=n)})

I get the following timings:

%timeit np.where(df['age'] <= 9, 'child', df['sex'])
1000 loops, best of 3: 1.26 ms per loop

%timeit df['sex'].where(df['age'] > 9, 'child')
100 loops, best of 3: 3.25 ms per loop

%timeit df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
100 loops, best of 3: 3.92 ms per loop
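
Outside IPython, where %timeit is not available, a comparison along these lines could be run with the standard timeit module. This is only a rough sketch under a few assumptions of mine: it uses a smaller sample (10**4 rows) and np.random.default_rng(0) rather than the seed above, so the numbers it prints will not match the timings quoted here. It also checks that the three approaches agree before timing them.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a smaller sample than 10**5 rows so the row-wise apply finishes quickly
rng = np.random.default_rng(0)
n = 10**4
df = pd.DataFrame({
    'sex': rng.choice(['m', 'f'], size=n),
    'age': rng.integers(0, 100, size=n),
})

# The three approaches being compared
candidates = {
    'np.where': lambda: np.where(df['age'] <= 9, 'child', df['sex']),
    'Series.where': lambda: df['sex'].where(df['age'] > 9, 'child'),
    'apply': lambda: df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1),
}

# All three approaches should produce the same values before speed is compared
results = {name: list(fn()) for name, fn in candidates.items()}
assert results['np.where'] == results['Series.where'] == results['apply']

# Average wall-clock time per call over a few runs
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```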
root answered Oct 03 '22 02:10


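You could use pandas.Series.where (the Series counterpart of pandas.DataFrame.where), which keeps the values where the condition is True and replaces the others. For example:

df['col3'] = df['sex'].where(df['age'] > 9, 'child')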
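
Spelled out against the question's data, the where approach might look like the sketch below. It assumes a DataFrame named df with the sex and age columns, and the output column name col3 is only illustrative. Because where keeps values where the condition is True and substitutes the replacement elsewhere, the condition is age > 9 rather than age <= 9.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
    'age': [16, 15, 14, 9, 8, 2, 56],
})

# Keep 'sex' where age > 9; everywhere else substitute 'child'
df['col3'] = df['sex'].where(df['age'] > 9, 'child')

print(df['col3'].tolist())
```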
Tim Fuchs answered Oct 03 '22 04:10