Read CSV file with semicolon as delimiter

Tags: python, arrays, numpy

I have a NumPy array of shape (4898,) in which the values of each row are separated by semicolons but are stored in a single column rather than in multiple columns (the desired outcome). How do I split each row at every semicolon? I have written the following Python script to do so, but it throws an error.

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            f_0.next()
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Errors

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'

asked May 26 '17 06:05

Abhijeet Mohanty

2 Answers

If you're using semicolons (;) as your csv-file separator instead of commas (,), you can adjust that first line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

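The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names, not the rows. With header = None those names are integers, which is why the error complains that a 'numpy.int64' object has no attribute 'split'.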

That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print (numpy.shape(wine_data))
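
As a minimal sketch of the point above, the snippet below uses a small in-memory sample (made-up values standing in for the wine file) to show that iterating over a DataFrame yields its column labels rather than its rows, and that sep = ';' gives one column per field. The names sample, one_column and split_columns are only for illustration.

```python
import io
import pandas

# Hypothetical sample standing in for winequality-white-updated.csv
sample = "7.0;0.27;6\n6.3;0.30;6\n8.1;0.28;5\n"

# Wrong separator: every line ends up as one string in a single column
one_column = pandas.read_csv(io.StringIO(sample), sep=',', header=None)
labels = list(one_column)   # iterating a DataFrame yields column labels
print(labels)               # a single integer label, not the row strings
print(one_column.shape)     # (3, 1)

# Matching separator: one numeric column per field
split_columns = pandas.read_csv(io.StringIO(sample), sep=';', header=None)
print(split_columns.shape)  # (3, 3)
```

With a real file, the path would be passed to read_csv in place of the StringIO object.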

answered Oct 04 '22 13:10

Arya McCarthy

Suppose your csv file is like this:

2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...

Then change your code like this:

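import pandas, numpy
# With sep = ',' each semicolon-separated line is read as one string in column 0
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
# Split each string on ';' and convert the pieces to floats
# (numpy.float was removed in NumPy 1.24; the builtin float is equivalent)
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = float)
wine_data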

The wine_data will be:

array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])

A more efficient approach is to let pandas split on the semicolons directly:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
wine_data = numpy.array(wine_data, dtype = float)
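
If the original file still has its header row, the intermediate copy that strips the first line may not be needed at all. The following is only a sketch under that assumption: the inline sample and its column names are made up for illustration, and the real file path would replace the StringIO object.

```python
import io
import pandas

# Hypothetical sample with a header row, standing in for winequality-white.csv
sample = '"fixed acidity";"volatile acidity";"quality"\n7.0;0.27;6\n6.3;0.30;6\n'

# header=0 treats the first line as column names, so it never reaches the data
frame = pandas.read_csv(io.StringIO(sample), sep=';', header=0)

# Convert to a plain float array for numerical work
wine_data = frame.to_numpy(dtype=float)
print(wine_data.shape)  # (2, 3)
```

Keeping the DataFrame around also preserves the column names, which can be handy when selecting features later.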

answered Oct 04 '22 14:10

Tiny.D
