When I run my code I get the error:
NameError: name 'df_test' is not defined
I don't get this error on my other computer, only on my new one. I suspect it has to do with global versus local variables, but that is strange: variables created in the second cell are used in the third without trouble, and the problem only occurs in the fourth cell.
I tried adding a global statement for the variables in the first cell, which does not work. Adding it in the third cell does work, but I don't want to keep doing this, because I know from my other computer that it should not be necessary.
### cell 1
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split,cross_val_score,ShuffleSplit
import os
import scipy
### cell 2
df=pd.read_csv("pandas2.txt",sep=';').drop('listened',axis=1).drop('Usercount',1)
temp_u=df['User'].unique()
temp_s=df['Song'].unique()
avg=df['rating'].mean()
### cell 3
lamda=0.05
gamma=0.04
m=128
splits=20
df_train,df_test=train_test_split(df,test_size=0.1, random_state=1)
beta_u=pd.DataFrame(temp_u,columns=['User'])
beta_s=pd.DataFrame(temp_s,columns=['Song'])
beta_u['beta_u']=0
beta_s['beta_s']=0
for chunk in np.array_split(df_train, splits):
    x=chunk.merge(beta_u, on='User',how='left').merge(beta_s,on='Song',how='left')
    x['pred']=avg+x['beta_u']+x['beta_s']+(x[pnames]*x[qnames]).sum(axis=1)
    x['gradu']=gamma*(x['rating']-x['pred']-lamda*x['beta_u'])
    beta_u=beta_u.merge(x[['User','gradu']].groupby('User').mean(),on='User',how="left").groupby('User').mean().fillna(0)
    beta_u['beta_u']+=beta_u['gradu']
    beta_u=beta_u.drop(['gradu'],axis=1)
    x['grads']=gamma*(x['rating']-x['pred']-lamda*x['beta_s'])
    beta_s=beta_s.merge(x[['Song','grads']].groupby('Song').mean(),on='Song',how="left").fillna(0)
    beta_s['beta_s']+=beta_s['grads']
    beta_s=beta_s.drop(['grads'],axis=1)
    x[pgrad]=(x[qnames].multiply(x['rating']-x['pred'], axis="index")+np.array(x[qnames]**2)*np.array(x[pnames]))#.divide((x[qnames]*x[qnames]).sum(axis=1),axis=0)
    beta_u=beta_u.merge(x[['User']+pgrad].groupby('User').mean(),on='User',how="left").fillna(0)
    beta_u[pnames]=beta_u[pgrad]#np.array(beta_u[pnames])+np.array(beta_u[pgrad])
    beta_u[pnames]=np.where(beta_u[pnames]>0,beta_u[pnames],10**(-6))
    beta_u=beta_u.drop(pgrad,1)
    x[qgrad]=(x[pnames].multiply(x['rating']-x['pred'], axis="index")+np.array(x[pnames]**2)*np.array(x[qnames]))#.divide((x[pnames]*x[pnames]).sum(axis=1),axis=0)
    beta_s=beta_s.merge(x[['Song']+qgrad].groupby('Song').mean(),on='Song',how="left").fillna(0)
    beta_s[qnames]=beta_s[qgrad]#np.array(beta_s[qnames])+np.array(beta_s[qgrad])
    beta_s[qnames]=np.where(beta_s[qnames]>0,beta_s[qnames],10**(-6))
    beta_s=beta_s.drop(qgrad,1)
x=df_test.merge(beta_u, on='User',how='left').merge(beta_s,on='Song',how='left').fillna(0)
x['pred']=x['beta_u']+x['beta_s']+avg+(np.array(x[pnames])*np.array(x[qnames])).sum(axis=1)
x['pred2']=np.where(x['pred']>0.5,1,0)
RMSE=np.mean((x['rating']-x['pred'])**2)
RMSE2=np.mean((x['rating']-x['pred2'])**2)
print(RMSE)
print(RMSE2)
### cell 4
t=len(df_test)
sim_Song=pd.DataFrame(scipy.sparse.load_npz('simUser.npz').todense())
sim_Song.index=pd.read_csv('Itemnames.csv',sep=';')['Song']
sim_Song.columns=pd.read_csv('Itemnames.csv',sep=';')['Song']
beta_s=beta_s.set_index('Song')
NameError: name 'df_test' is not defined
When global df_train, df_test, df, x, beta_s, beta_u is put at the top of cell 3, it works fine.
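One possible reading of this symptom is that the body of cell 3 is being executed with a local namespace that is separate from the notebook's global one, so its assignments never reach later cells unless they are declared global. The sketch below reproduces that behaviour with plain exec; the dictionaries notebook_globals and cell_locals are hypothetical stand-ins for the notebook's namespaces, and this is an assumption about the cause, not a confirmed description of what the notebook does internally.

```python
# Hypothetical stand-ins for the notebook namespace and a per-cell local scope.
notebook_globals = {}
cell_locals = {}

# Without a global statement, the assignment lands in the local dict only.
exec("df_test = [1, 2, 3]", notebook_globals, cell_locals)
print("df_test" in notebook_globals)  # False: a later cell would raise NameError
print("df_test" in cell_locals)       # True

# With a global statement, the same assignment is stored in the global dict.
exec("global df_test\ndf_test = [1, 2, 3]", notebook_globals, cell_locals)
print("df_test" in notebook_globals)  # True: a later cell can now see it
```

This would explain why declaring the names global in cell 3 helps while doing so in cell 1 does not: the declaration only affects the code block in which it appears.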
The problem somehow is the %%time cell magic. If I delete it, everything suddenly works perfectly fine.
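If the timing is still wanted, one workaround that does not rely on a cell magic is to time the block by hand with time.perf_counter. The sketch below is only an illustration: the helper name timed is hypothetical, and the list assigned to df_test is a stand-in for the real train_test_split result. Because a with block does not create a new scope, names assigned inside it remain ordinary top-level variables of the cell.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label="cell"):
    """Print the wall-clock time spent inside the with block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f} s")

# Stand-in for the body of cell 3; df_test is a placeholder value here.
with timed("cell 3"):
    df_test = [1, 2, 3]

# The name is still defined after the block, as a later cell would need.
print(len(df_test))
```

Comparing the IPython and notebook versions installed on the two computers may also help explain why only one of them shows the error.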