Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read CSV file with semicolon as delimiter

I have a numpy 2D array which is of the shape (4898, ) where elements in each row are separated by a semi-colon but are still stored in a single column and not multiple columns (the desired outcome). How do I create a split at each occurrence of a semi-colon in each array of the 2D array. I have written the following Python script to do so but it throws errors.

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            f_0.next()
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Errors

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'
like image 365
Abhijeet Mohanty Avatar asked May 26 '17 06:05

Abhijeet Mohanty


People also ask

How do I open a CSV file with a semicolon in Excel?

Navigate to the CSV file you wish to open and click Import. In the newly-opened window, choose Delimited. Then click Next. Check the box next to the type of delimiter: in most cases, this is either a semicolon or a comma.


2 Answers

If you're using semicolons (;) as your csv-file separator instead of commas (,), you can adjust that first line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names.

That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
print (numpy.shape(wine_data))
like image 57
Arya McCarthy Avatar answered Oct 04 '22 13:10

Arya McCarthy


Suppose your csv file is like this:

2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...

Then change your code like this:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = numpy.float)
wine_data

The wine_data will be:

array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])

Be more efficient:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
wine_data = numpy.array(wine_data,dtype = numpy.float)
like image 20
Tiny.D Avatar answered Oct 04 '22 14:10

Tiny.D