
Add multiple values to a row of a DataFrame while iterating through a list

I am building a recommender system of foods and I have a dataframe:

df:
            meat vegetables cheese ketchup egg...
hamburger     3      5        2       2     1   
    pasta     0      0        4       0     1    
     soup     0      2        0       0     0     
      ...

I also have a list containing the ingredients that a user does not like:

dislike = ["cheese", "egg"]

So what I am trying to do is create a function that adds a new row "user_name" with a 10 for each ingredient the user does not like and a 0 in all the other columns. The output should be:

            meat vegetables cheese ketchup egg...
hamburger     3      5        2       2     1   
    pasta     0      0        4       0     1    
     soup     0      2        0       0     0     
 new_user     0      0       10       0    10
...

I have simplified the dataframe and the list to make the example easier to follow, but they are actually much longer.

This is what I have written so far:

def user_pre(df):
    dislike=["cheese","egg"]
    for ing in dislike:
        df.loc["new_user"] = pd.Series({ing: 10})
    return df

It "works", but only for the last element in the dislike list. Besides, it does not put a 0 in the other cells but a NaN.

Thank you so much in advance!

gresell asked Nov 30 '25 02:11


2 Answers

I am not sure how "healthy" it is to mix users with dishes in a single pandas DataFrame, but a function like this should do the job:

def insert_user_dislikes(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = [10 if col in ingredients else 0 for col in df.columns]

insert_user_dislikes('new_user', df, ['meat', 'egg'])
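
For context, here is a minimal self-contained sketch of why the loop in the question ends up with NaN and how building the whole row at once avoids it. The frame is rebuilt from the question's sample, and `add_dislikes` is a hypothetical name used only for this illustration. The `reindex` call is meant to mimic, roughly, the label alignment pandas performs when a Series is assigned to a row.

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)

# A Series holding a single ingredient, aligned on the column labels:
# every column that is missing from the Series comes out as NaN.
partial = pd.Series({"egg": 10}).reindex(df.columns)
print(partial)


def add_dislikes(frame, user_name, ingredients, score=10):
    """Return a copy of frame with one extra row for user_name (hypothetical helper)."""
    out = frame.copy()
    # Build the complete row once: score for disliked columns, 0 elsewhere.
    out.loc[user_name] = [score if col in ingredients else 0 for col in out.columns]
    return out


fixed = add_dislikes(df, "new_user", ["cheese", "egg"])
print(fixed.loc["new_user"].tolist())
```

Because the full row is written in a single assignment, no earlier value gets overwritten and no cell is left empty.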

Edit 1: I like @Fred's Solution as well:

def insert_user_dislikes2(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = 0
    df.loc[user_name, ingredients] = 10

insert_user_dislikes2('new_user', df, ['meat', 'egg'])

Edit 2: Here is Shubham's solution, included for the performance comparison:

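def insert_user_dislikes3(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    s = pd.Series(
        np.where(df.columns.isin(ingredients), 10, 0),
        name=user_name, index=df.columns, dtype='int')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    # Unlike the two functions above, this returns a new frame instead of modifying df.
    return pd.concat([df, s.to_frame().T])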

In terms of performance (on a very small dataset), the list comprehension version looks like the fastest:

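import timeit

import numpy as np
import pandas as pd

# df must exist before the functions above are defined,
# because the default df=df is evaluated at definition time.
df = pd.DataFrame([[3, 5, 2, 2, 1],
   [0, 0, 4, 0, 1]],
   columns=['meat', 'vegetables', 'cheese','ketchup', 'egg'],
   index=['hamburger', 'pasta'])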

print(timeit.timeit(insert_user_dislikes, number=1000))
0.125

print(timeit.timeit(insert_user_dislikes2, number=1000))
0.547

print(timeit.timeit(insert_user_dislikes3, number=1000))
2.153
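
One caveat about the numbers above: the first two functions modify df in place, so after the first call they overwrite an existing row instead of inserting a new one. Below is a rough sketch of a fairer comparison, where every run starts from a fresh copy. The function names and the `DISLIKES` constant are made up for this example, and I am not quoting timings because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta"],
)
DISLIKES = ["meat", "egg"]  # hypothetical constant for this sketch


def with_list_comprehension(frame):
    # Build the full row in one go.
    frame.loc["new_user"] = [10 if col in DISLIKES else 0 for col in frame.columns]
    return frame


def with_two_assignments(frame):
    # Fill the row with 0, then set the disliked columns to 10.
    frame.loc["new_user"] = 0
    frame.loc["new_user", DISLIKES] = 10
    return frame


def with_concat(frame):
    # Build a Series for the row and concatenate it as a one-row frame.
    row = pd.Series(
        np.where(frame.columns.isin(DISLIKES), 10, 0),
        index=frame.columns,
        name="new_user",
    )
    return pd.concat([frame, row.to_frame().T])


for func in (with_list_comprehension, with_two_assignments, with_concat):
    # base.copy() is included in the measured time, but the cost is the
    # same for every candidate, so the ranking stays comparable.
    seconds = timeit.timeit(lambda: func(base.copy()), number=200)
    print(f"{func.__name__}: {seconds:.3f} s")
```

On a frame this small the differences are mostly fixed overhead, so it is worth rerunning this on data of a realistic size before picking one.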

Benoit Fgt answered Dec 02 '25 16:12


I'm not sure how efficient this approach is, but it should work:

dislikes = ["cheese","egg"]
new_user = "Tom"
df.loc[new_user] = 0
for dislike in dislikes:
    if dislike not in df.columns:
        df[dislike] = 0
    df.loc[new_user, dislike] = 10
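
If you would rather not modify df in place, the same idea can be wrapped in a function without the explicit loop. This is only a sketch: `add_user_row` and the extra "onion" ingredient are hypothetical, and I am assuming that an ingredient with no column yet should get a new column filled with 0 for the existing rows, as in the loop above.

```python
import pandas as pd


def add_user_row(frame, user_name, dislikes, score=10):
    """Return a new frame with a row for user_name; frame itself is not modified."""
    # Ingredients that have no column yet (dict.fromkeys drops duplicates, keeps order).
    missing = [ing for ing in dict.fromkeys(dislikes) if ing not in frame.columns]
    # reindex returns a new frame; new columns are filled with 0 for existing rows.
    out = frame.reindex(columns=[*frame.columns, *missing], fill_value=0)
    out.loc[user_name] = 0
    out.loc[user_name, list(dislikes)] = score
    return out


df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)

result = add_user_row(df, "Tom", ["cheese", "egg", "onion"])
print(result)
```

Returning a new frame keeps the original dishes table untouched, which makes it easier to add several users one after another or to discard a row again.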

Fred answered Dec 02 '25 16:12