
"ValueError: labels ['timestamp'] not contained in axis" error

I have this code. I want to remove the column 'timestamp' from the file u.data but can't; it shows the error
"ValueError: labels ['timestamp'] not contained in axis". How can I correct it?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rc("font", size=14)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
# sklearn.cross_validation was removed in scikit-learn 0.20; these now live in model_selection
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split

data = pd.read_table('u.data')
data.columns = ['userID', 'itemID', 'rating', 'timestamp']
data = data.drop('timestamp', axis=1)  # drop() returns a new DataFrame, so keep the result

N = len(data)
print(data.shape)
print(list(data.columns))
print(data.head(10))
avaj asked Jun 11 '16 16:06



2 Answers

One of the biggest problems, and one that often goes unnoticed, is that when you insert headers into the u.data file, the separator between the header names must be exactly the same as the separator between the values in a data row. For example, if a tab separates the fields of a row, you should not use spaces in the header.

In your u.data file, add headers and separate them with exactly the same whitespace as is used between the items of a row. PS: Use Sublime Text; Notepad/Notepad++ sometimes does not work.
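
If you would rather not hand-edit the file, the same idea can be done in a few lines of Python. This is a minimal sketch, assuming the data rows are tab-separated (as the other answer observes for u.data); `prepend_header` is a hypothetical helper, the column names are the ones from the question, and the demo writes two sample rows to a temporary file instead of touching the real u.data:

```python
import os
import tempfile

import pandas as pd


def prepend_header(path, names, sep="\t"):
    """Hypothetical helper: write `names` as the first line of `path`.

    `sep` must match the separator used between values in the data rows;
    a tab is assumed here because u.data appears to be tab-separated.
    """
    with open(path) as f:
        body = f.read()
    with open(path, "w") as f:
        f.write(sep.join(names) + "\n" + body)


# Demo on a temporary file holding two sample rows (not the real u.data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "u.data")
    with open(path, "w") as f:
        f.write("196\t242\t3\t881250949\n186\t302\t3\t891717742\n")

    prepend_header(path, ["userID", "itemID", "rating", "timestamp"])
    data = pd.read_table(path)             # read_table splits on tabs by default
    data = data.drop("timestamp", axis=1)  # drop() returns a new DataFrame

print(list(data.columns))  # ['userID', 'itemID', 'rating']
```

Joining the names with the same `sep` as the rows is the whole point: a space-joined header line would not be split on tabs, so pandas would not see four separate column names.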

Aditya answered Sep 25 '22 04:09



"ValueError: labels ['timestamp'] not contained in axis"

You don't have headers in the file, so the way you loaded it you got a df whose column names are taken from the first row of the data. You then tried to access the column timestamp, which doesn't exist.
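
To see this without the real file, here is a small sketch that feeds the two sample rows shown below to `pd.read_table` through `io.StringIO` (an in-memory stand-in for u.data). The exact exception type and message depend on the pandas version: older releases raise the ValueError from the question, newer ones raise a KeyError, so the sketch catches both.

```python
import io

import pandas as pd

# Two tab-separated rows from the u.data excerpt; there is no header line.
raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Default read: pandas uses the first data row as the header.
df_default = pd.read_table(io.StringIO(raw))
print(list(df_default.columns))  # ['196', '242', '3', '881250949']
print(len(df_default))           # 1 (the first row was consumed as the header)

# 'timestamp' is not a column label, so drop() fails.
try:
    df_default.drop("timestamp", axis=1)
except (KeyError, ValueError) as exc:
    print(type(exc).__name__, exc)
```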

Your u.data doesn't have headers in it

$ head u.data
196 242 3   881250949
186 302 3   891717742

So working with column names isn't going to be possible unless you add the headers. You can add the headers to the file u.data; e.g. I opened it in a text editor and added the line a b c timestamp at the top of it (this seems to be a tab-separated file, so be careful not to use spaces when adding the header, or it breaks the format):
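
If editing the file is not an option, pandas can also attach the same labels at read time. A minimal sketch, assuming the `header=None` and `names=` parameters of `pd.read_table`, with the two sample rows standing in for u.data via `io.StringIO`:

```python
import io

import pandas as pd

raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# header=None: the file has no header line; names=: the labels to use instead.
data = pd.read_table(io.StringIO(raw), header=None,
                     names=["a", "b", "c", "timestamp"])
data = data.drop("timestamp", axis=1)
print(data.shape)  # (2, 3)
```

This leaves u.data untouched, so there is no risk of mixing spaces and tabs in a hand-typed header.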

$ head u.data
a   b   c   timestamp
196 242 3   881250949
186 302 3   891717742

Now your code works and data.columns returns

Index([u'a', u'b', u'c', u'timestamp'], dtype='object')

And the rest of the output of your working code is now:

(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
     a    b  c  timestamp # the df
0  196  242  3  881250949
1  186  302  3  891717742
2   22  377  1  878887116
3  244   51  2  880606923
4  166  346  1  886397596
5  298  474  4  884182806
6  115  265  2  881171488
7  253  465  5  891628467
8  305  451  3  886324817
9    6   86  3  883603013

If you don't want to add headers

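Or you can drop the column 'timestamp' using its position (presumably 3). We can do this using df.iloc (the older df.ix indexer was removed in pandas 1.0): the slice below selects all rows and the columns at positions 0 to 2, since the end of an iloc slice is exclusive, thus dropping the column at position 3.

data = data.iloc[:, 0:3]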
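
A runnable sketch of the positional approach, again using the two sample rows in place of u.data. It assumes the file is read with `header=None`, so that no data row is consumed as a header and the columns are labelled 0 to 3; the second variant looks up the label at position 3 and passes it to `drop`:

```python
import io

import pandas as pd

# Sample rows from the u.data excerpt above (tab-separated, no header line).
raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# header=None keeps every row as data and labels the columns 0, 1, 2, 3.
data = pd.read_table(io.StringIO(raw), header=None)

# Positional slice: .iloc excludes the end position, so 0:3 keeps positions 0, 1, 2.
by_position = data.iloc[:, 0:3]

# Equivalent: find the label of the fourth column and drop it by label.
by_label = data.drop(columns=data.columns[3])

print(by_position)
```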
bakkal answered Sep 22 '22 04:09
