I have a CSV file with 3 columns (emotion, pixels, Usage) consisting of 35000 rows, e.g. 0,70 23 45 178 455,Training.
I used pandas.read_csv to read the file as pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':np.int32, 'Usage':str}), but it raises ValueError: invalid literal for long() with base 10: '70 23 45 178 455'. How do I read the pixels column as a numpy array?
Try the code below instead. It reads pixels as a plain string (so read_csv doesn't try to parse it as an integer) and then converts each cell to a numpy array:
import numpy as np
import pandas as pd

df = pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':str, 'Usage':str})

def makeArray(text):
    return np.fromstring(text, sep=' ')

df['pixels'] = df['pixels'].apply(makeArray)
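A self-contained check of this approach, using an inline CSV that mirrors the question's example row (the file contents here are a stand-in for your actual file; np.array(s.split(), ...) is used as an equivalent of the np.fromstring call):

```python
import io
import numpy as np
import pandas as pd

# Inline CSV standing in for the question's 35000-row file
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training"""

# Read 'pixels' as a string so read_csv doesn't try to parse it as an int
df = pd.read_csv(io.StringIO(csv_text),
                 dtype={'emotion': np.int32, 'pixels': str, 'Usage': str})

# Convert each space-separated string into a numpy integer array
df['pixels'] = df['pixels'].apply(lambda s: np.array(s.split(), dtype=int))

print(df['pixels'].iloc[0])  # each cell is now an ndarray
```

Each cell of the pixels column now holds an ndarray rather than a string.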
It will be faster, I believe, to use the vectorised str methods to split the string, create the new pixel columns, and concat them to the original df:
In [175]:
# load the data
import pandas as pd
import io
t="""emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))
df
Out[175]:
emotion pixels Usage
0 0 70 23 45 178 455 Training
In [177]:
# now split the string and concat column-wise with the orig df
df = pd.concat([df, df['pixels'].str.split(expand=True).astype(int)], axis=1)
df
Out[177]:
emotion pixels Usage 0 1 2 3 4
0 0 70 23 45 178 455 Training 70 23 45 178 455
If you specifically want a flat np array you can just call the .values
attribute:
In [181]:
df['pixels'].str.split(expand=True).astype(int).values
Out[181]:
array([[ 70, 23, 45, 178, 455]])
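As a side note, newer pandas versions recommend .to_numpy() over the .values attribute; it produces the same array here. A runnable sketch using the same sample data as above:

```python
import io
import numpy as np
import pandas as pd

t = """emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))

# .to_numpy() is the modern equivalent of the .values attribute
arr = df['pixels'].str.split(expand=True).astype(int).to_numpy()
print(arr)          # 2-D array: one row per CSV row
print(arr.ravel())  # flatten to 1-D if you want a flat array
```

With many rows, arr keeps one row per CSV record, so arr.ravel() is only appropriate when you really want everything flattened into one array.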