
"Got 1 columns instead of ..." error in numpy

I'm working on the following code to perform Random Forest classification on train and test sets:

from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt

def main():
    dataset = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')   
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')

    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]

    savetxt('filepath', predicted_probs, delimiter=',', fmt='%d,%f', 
            header='Id,PredictedProbability', comments = '')

if __name__=="__main__":
    main()

However, I get the following error on execution:

---->      dataset = genfromtxt(open('C:/Users/user/Desktop/pgm/Cora/a_train.csv','r'), delimiter='', dtype='f8')

ValueError: Some errors were detected !
    Line #88 (got 1435 columns instead of 1434)
    Line #93 (got 1435 columns instead of 1434)
    Line #164 (got 1435 columns instead of 1434)
    Line #169 (got 1435 columns instead of 1434)
    Line #524 (got 1435 columns instead of 1434)
...
...
...

Any suggestions on how to avoid it? Thanks.

user3466132 asked Apr 29 '14 00:04


7 Answers

genfromtxt raises this error when the rows do not all have the same number of columns.

I can think of 3 ways around it:

1. Use the usecols parameter

np.genfromtxt('yourfile.txt',delimiter=',',usecols=np.arange(0,1434))

However, this may mean that you lose some data (where rows are longer than 1434 columns); whether that matters is up to you.

2. Adjust your input data file so that it has an equal number of columns.

3. Use something other than genfromtxt.
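As one possible alternative to genfromtxt, the file can be parsed by hand so that ragged rows are padded or truncated instead of rejected. This is only a sketch: load_ragged and its parameters are hypothetical names, and padding short rows with NaN is an assumption about what you want for missing values.

```python
import math


def load_ragged(lines, n_cols, delimiter=None):
    """Parse text lines into rows of exactly n_cols floats.

    Hypothetical helper: long rows are truncated, short rows are
    padded with NaN. delimiter=None splits on any run of whitespace.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [float(v) for v in line.split(delimiter)]
        fields = fields[:n_cols]                        # drop extra columns
        fields += [math.nan] * (n_cols - len(fields))   # pad missing columns
        rows.append(fields)
    return rows


sample = ["1 2 3 4", "1 2 3 4 5", "1 2 3"]
rows = load_ragged(sample, n_cols=4)
print(rows)
```

For a real file you could pass the open file object as lines. Note that truncating hides the underlying problem just as usecols does, so it is worth checking why the rows differ first.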

atomh33ls answered Nov 14 '22 03:11


An exception is raised if an inconsistency is detected in the number of columns. Several causes and solutions are possible.

  1. Add invalid_raise = False to skip the offending lines.

    dataset = genfromtxt(open('data.csv','r'), delimiter='', invalid_raise = False)

  2. If your data contains names, make sure that no field name contains a space or an invalid character, and that it does not match the name of a standard attribute (like size or shape), which would confuse the interpreter. Three parameters control how names are handled:

    • deletechars

      Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.

    • excludelist

      Gives a list of the names to exclude, such as return, file, print… If one of the input names is part of this list, an underscore character ('_') will be appended to it.

    • case_sensitive

      Whether the names should be case-sensitive (case_sensitive=True), converted to upper case (case_sensitive=False or case_sensitive='upper') or to lower case (case_sensitive='lower').

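import numpy as np
# names=True takes the field names from the first line; a raw string keeps the backslash literal
data = np.genfromtxt("data.txt", dtype=None, names=True,
       deletechars=r"~!@#$%^&*()-=+~\|]}[{';: /?.>,<.", case_sensitive=True)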

Reference: numpy.genfromtxt
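To make the first option concrete, here is a small sketch of invalid_raise=False on in-memory data. The sample text is made up for illustration. genfromtxt emits a warning for the lines it skips, which is silenced here.

```python
import warnings
from io import StringIO

import numpy as np

# Made-up sample: the second line has one column too many.
text = "1 2 3\n4 5 6 7\n8 9 10\n"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # genfromtxt warns about skipped lines
    data = np.genfromtxt(StringIO(text), invalid_raise=False)

print(data.shape)
print(data)
```

The offending line is dropped silently, so the result has fewer rows than the file. That is fine for a quick look at the data but probably not for a training set where every row matters.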

zeeshan khan answered Nov 14 '22 02:11


One of your rows has too many columns. For example:

>>> import numpy as np
>>> from StringIO import StringIO
>>> s = """
... 1 2 3 4
... 1 2 3 4 5
... """
>>> np.genfromtxt(StringIO(s),delimiter=" ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 1654, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)
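
The session above is from Python 2. A rough Python 3 equivalent, assuming io.StringIO in place of the old StringIO module, catches the exception so the message can be inspected:

```python
from io import StringIO

import numpy as np

# Two rows with different column counts, as in the session above.
text = "1 2 3 4\n1 2 3 4 5\n"

try:
    np.genfromtxt(StringIO(text), delimiter=" ")
    message = None
except ValueError as err:
    message = str(err)

print(message)
```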

user545424 answered Nov 14 '22 04:11


In my case, the error arose because a row contained a special character.

Cause of the error: special characters such as

  • '#' (hash), which genfromtxt treats as the start of a comment by default
  • ',' inside a field, when the delimiter is also ','

Example csv file

  • 1,hello,#this,fails
  • 1,hello,',this',fails

    -----CODE-----

    import numpy
    data = numpy.genfromtxt(file, delimiter=',')  # Error
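
If the data really does contain such characters, one option is to parse it with Python's csv module, which has no comment character and understands quoted fields. This is a sketch under assumptions: the sample text is made up, and quotechar="'" assumes the fields are quoted with single quotes as in the example rows.

```python
import csv
from io import StringIO

# Made-up sample based on the two example rows.
text = "1,hello,#this,works\n1,hello,',this',works\n"

# '#' is an ordinary character for csv.reader, and the quoted
# field ',this' keeps its comma instead of being split.
reader = csv.reader(StringIO(text), delimiter=",", quotechar="'")
rows = [row for row in reader]

for row in rows:
    print(len(row), row)
```

Both rows come back with four fields, which can then be converted to numbers column by column.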

Environment Note:

OS: Ubuntu

csv editor: LibreOffice

IDE: Pycharm

hemant c answered Nov 14 '22 02:11


None of the previous answers worked for me, so for future googlers here is another one:

The error was: "Line #88 (got 1435 columns instead of 1)"

I discovered that my csv file was a UTF-8 encoded text file with a BOM (a character at the start of the file that marks the encoding; most text editors hide it).

I simply opened it in Notepad on Windows, chose "Save As" and selected "ANSI" in the encoding box at the bottom of the dialog.

That fixed it for me.
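
An alternative that avoids re-saving the file is to let Python strip the BOM while reading, using the "utf-8-sig" codec. The sketch below writes a small temporary file to stand in for the real csv; the file contents are made up.

```python
import os
import tempfile

# Write a small UTF-8 file that starts with a BOM (stand-in for the real csv).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf1 2 3\n4 5 6\n")
    path = tmp.name

# Plain utf-8 keeps the BOM as an invisible first character...
with open(path, "r", encoding="utf-8") as f:
    raw_first = f.readline()

# ...while utf-8-sig removes it if present.
with open(path, "r", encoding="utf-8-sig") as f:
    clean_first = f.readline()

os.remove(path)
print(repr(raw_first), repr(clean_first))
```

A file object opened with encoding="utf-8-sig" could then be handed to genfromtxt in place of the plain open(...) call used in the question.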

Yahel answered Nov 14 '22 02:11


I had this error. The cause was a single entry in my data that contained a space, which was read as an extra column. Make sure the spacing is consistent throughout the data.
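
To track down such an entry, it can help to count the fields on every line and list the lines that differ from the majority. This is a minimal sketch; find_odd_lines is a hypothetical helper name, and the sample lines are made up.

```python
from collections import Counter


def find_odd_lines(lines, delimiter=None):
    """Return the most common field count and the lines that deviate from it.

    Hypothetical helper. delimiter=None splits on any run of whitespace.
    Line numbers in the result are 1-based.
    """
    counts = [len(line.split(delimiter)) for line in lines]
    expected = Counter(counts).most_common(1)[0][0]
    odd = [(i, n) for i, n in enumerate(counts, start=1) if n != expected]
    return expected, odd


sample = ["1 2 3", "4 5 6", "7 8 9 10", "11 12 13"]
expected, odd = find_odd_lines(sample)
print(expected, odd)
```

Passing an open file object as lines works the same way. Blank lines are reported with a count of 0.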

Jonathon D answered Nov 14 '22 04:11


It seems that the header containing the column names has one more column than the data itself (1435 columns in the header vs. 1434 in the data).

You could either:

1) Remove the one header column that has no corresponding data

OR

2) Use the skip_header parameter of genfromtxt(), for example np.genfromtxt('myfile', skip_header=*how many lines to skip*, delimiter=' '). More information can be found in the documentation.
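
A small sketch of the second option on in-memory data, assuming a single header line that has one more name than the data rows have values (the sample text is made up):

```python
from io import StringIO

import numpy as np

# Made-up sample: the header has 4 names, the data rows have 3 values.
text = "id a b c\n1 2 3\n4 5 6\n"

# skip_header=1 ignores the first line, so only the data rows are parsed.
data = np.genfromtxt(StringIO(text), skip_header=1, delimiter=" ")

print(data.shape)
```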

tadf2 answered Nov 14 '22 03:11