Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python csv reader not reading all rows

So I've got about 5008 rows in a CSV file, a total of 5009 with the headers. I'm creating and writing this file all within the same script. But when i read it at the end, with either pandas pd.read_csv, or python3's csv module, and print the len, it outputs 4967. I checked the file for any weird characters that may be confusing python but don't see any. All the data is delimited by commas.

I also opened it in sublime and it shows 5009 rows not 4967.

I could try other methods from pandas like merge or concat, but if python wont read the csv correct, that's no use.

This is one method i tried.

df1=pd.read_csv('out.csv',quoting=csv.QUOTE_NONE, error_bad_lines=False)
df2=pd.read_excel(xlsfile)

print (len(df1))#4967
print (len(df2))#5008

df2['Location']=df1['Location']
df2['Sublocation']=df1['Sublocation']
df2['Zone']=df1['Zone']
df2['Subnet Type']=df1['Subnet Type']
df2['Description']=df1['Description']

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df2.to_csv(newfile, index=False)
print('Done.')

target.close()

Another way I tried is

dfcsv = pd.read_csv('out.csv')

wb = xlrd.open_workbook(xlsfile)
ws = wb.sheet_by_index(0)
xlsdata = []
for rx in range(ws.nrows):
    xlsdata.append(ws.row_values(rx))

print (len(dfcsv))#4967
print (len(xlsdata))#5009

df1 = pd.DataFrame(data=dfcsv)
df2 = pd.DataFrame(data=xlsdata)

df3 = pd.concat([df2,df1], axis=1)

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df3.to_csv(newfile, index=False)    
print('Done.')

target.close()

But not matter what way I try the CSV file is the actual issue, python is writing it correctly but not reading it correctly.

Edit: Weirdest part is that i'm getting absolutely no encoding errors or any errors when running the code...

Edit2: Tried testing it with nrows param in first code example, works up to 4000 rows. Soon as i specify 5000 rows, it reads only 4967.

Edit3: manually saved csv file with my data instead of using the one written by the program, and it read 5008 rows. Why is python not writing the csv file correctly?

like image 837
DaVinci Avatar asked Aug 09 '16 14:08

DaVinci


People also ask

How do I read all lines in a csv file in Python?

Step 1: In order to read rows in Python, First, we need to load the CSV file in one object. So to load the csv file into an object use open() method. Step 2: Create a reader object by passing the above-created file object to the reader function. Step 3: Use for loop on reader object to get each row.


Video Answer


1 Answers

I ran into this issue also. I realized that some of my lines had open-ended quotes, which was for some reason interfering with the reader.

So for example, some rows were written as:

GO:0000026  molecular_function  "alpha-1
GO:0000027  biological_process  ribosomal large subunit assembly
GO:0000033  molecular_function  "alpha-1

and this led to rows being read incorrectly. (Unfortunately I don't know enough about how csvreader works to tell you why. Hopefully someone can clarify the quote behavior!)

I just removed the quotes and it worked out.

Edited: This option works too, if you want to maintain the quotes:

quotechar=None
like image 59
Anna Pamela Calinawan Avatar answered Oct 12 '22 19:10

Anna Pamela Calinawan