I have a pandas data frame as follows:
coname1    coname2
Apple      [Microsoft, Apple, Google]
Yahoo      [American Express, Jet Blue]
Gap Inc    [American Eagle, Walmart, Gap Inc]
I want to create a new column that flags whether the string in coname1 is contained in the list in coname2. So, from the above example, the dataframe would now be:
coname1    coname2                               isin
Apple      [Microsoft, Apple, Google]            True
Yahoo      [American Express, Jet Blue]          False
Gap Inc    [American Eagle, Walmart, Gap Inc]    True
With NumPy's where() function, we pass a condition that compares the columns. If 'column1' is less than 'column2' and 'column1' is also less than 'column3', we take the value of 'column1'. Where the condition fails, we use NaN instead. The results are stored in a new column of the dataframe.
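A minimal sketch of that comparison, assuming a small made-up frame; the data and the name new_column are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names follow the description above.
df = pd.DataFrame({
    "column1": [1, 5, 2],
    "column2": [3, 4, 9],
    "column3": [2, 6, 1],
})

# True where column1 is smaller than both of the other columns.
cond = (df["column1"] < df["column2"]) & (df["column1"] < df["column3"])

# Keep column1 where the condition holds, NaN everywhere else.
df["new_column"] = np.where(cond, df["column1"], np.nan)
print(df)
```

Only the first row satisfies both comparisons here, so the other two rows get NaN. Because NaN is a float, the new column ends up with a float dtype.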
To find duplicate columns, we iterate over all columns of a DataFrame and, for each one, check whether any other column in the DataFrame has the same contents. If it does, that column's name is stored in a set of duplicate columns.
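One way that loop could look, as a sketch only: the helper name find_duplicate_columns and the sample data are made up for illustration, and the code assumes the column names are unique.

```python
import pandas as pd

def find_duplicate_columns(frame):
    """Return the names of columns whose contents repeat an earlier column."""
    duplicates = set()
    columns = list(frame.columns)
    for i, first in enumerate(columns):
        if first in duplicates:
            continue  # already known to be a copy of an earlier column
        for other in columns[i + 1:]:
            # Series.equals compares values and dtype, not the column name.
            if frame[first].equals(frame[other]):
                duplicates.add(other)
    return duplicates

# Hypothetical frame: 'c' repeats 'a' and 'd' repeats 'b'.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 2, 3], "d": [4, 5, 6]})
dupes = find_duplicate_columns(df)
print(dupes)
```

The duplicates can then be dropped with df.drop(columns=list(dupes)), which keeps the first occurrence of each repeated column.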
The compare method in pandas shows the differences between two DataFrames. It compares them element by element and presents the differing values side by side, labelled self and other. It can only compare identically labelled DataFrames: both must have the same shape and the same row and column labels.
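A short sketch of compare on two made-up frames that differ in two cells; the data is illustrative, and DataFrame.compare needs pandas 1.1 or later:

```python
import pandas as pd

# Two hypothetical frames with identical labels.
df1 = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [1.0, 2.0, 3.0]})
df2 = df1.copy()
df2.loc[0, "col1"] = "x"
df2.loc[2, "col2"] = 4.0

# Only rows and columns that contain a difference are kept; each column
# is split into 'self' (df1) and 'other' (df2).
diff = df1.compare(df2)
print(diff)
```

Cells that are equal in both frames show up as NaN in the result, and rows without any difference are dropped by default.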
Set up the frame:
import pandas as pd

df = pd.DataFrame({'coname1': ['Apple', 'Yahoo', 'Gap Inc'],
                   'coname2': [['Microsoft', 'Apple', 'Google'],
                               ['American Express', 'Jet Blue'],
                               ['American Eagle', 'Walmart', 'Gap Inc']]})
Try this:
# For each row, check whether the coname1 string is in that row's coname2 list.
df['isin'] = df.apply(lambda row: row['coname1'] in row['coname2'], axis=1)
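As an alternative sketch (not part of the original answer), the same membership test can be written as a list comprehension over the two columns, which avoids the per-row overhead of apply. It uses the same example frame:

```python
import pandas as pd

df = pd.DataFrame({'coname1': ['Apple', 'Yahoo', 'Gap Inc'],
                   'coname2': [['Microsoft', 'Apple', 'Google'],
                               ['American Express', 'Jet Blue'],
                               ['American Eagle', 'Walmart', 'Gap Inc']]})

# Pair each coname1 string with the list in the same row and test membership.
df['isin'] = [name in names for name, names in zip(df['coname1'], df['coname2'])]
print(df)
```

Both versions do an exact string match against the list elements, so 'Gap' would not match 'Gap Inc'.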