
Effective way to get part of string until token

Tags: python, string

I'm parsing a very large CSV file (tens of gigabytes) in Python, and I need only the value in the first column of every line. I wrote the code below and am wondering whether there is a better way to do it:

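delimiter = ','
# the with-block closes the file even if int() raises
with open('big.csv', 'r') as f:
    for line in f:
        pos = line.find(delimiter)
        # named row_id so the built-in id() is not shadowed
        row_id = int(line[0:pos])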

Is there a more efficient way to get the part of the string before the first delimiter?

Edit: I do know about the csv module (and I have used it occasionally), but I do not need to load every line of this file into memory; I need only the first column. So let's focus on string parsing.

ddinchev asked Oct 12 '25 03:10


1 Answer

>>> a = '123456'
>>> print(a.split('2', 1)[0])
1
>>> print(a.split('4', 1)[0])
123
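
As a rough sketch of the same idea applied to a line of a file, the two helpers below return the text before the first delimiter. The function names and the sample line are hypothetical, chosen only for illustration. `str.partition` is shown next to `split` with `maxsplit=1` because both stop at the first delimiter and both return the whole string when the delimiter is missing, whereas `find` returns -1 in that case and the slice `line[0:-1]` would silently drop the last character.

```python
def first_field(line, delimiter=','):
    """Return the text before the first delimiter (hypothetical helper)."""
    # partition splits once and returns (head, separator, tail);
    # if the delimiter is absent, head is the whole string.
    head, _sep, _tail = line.partition(delimiter)
    return head


def first_field_split(line, delimiter=','):
    """Same result using split with maxsplit=1 (hypothetical helper)."""
    return line.split(delimiter, 1)[0]


sample = '42,foo,bar\n'  # made-up line for illustration
print(int(first_field(sample)))
print(int(first_field_split(sample)))
print(first_field('no delimiter here'))
```

Which of the two is faster on a given file is something to measure, for example with `timeit`, rather than assume.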

But if you're dealing with a CSV file, use the csv module:

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('some.csv', newline='') as fin:
    for row in csv.reader(fin):
        print(int(row[0]))

The csv module will also handle quoted columns that contain delimiters, quotes, and so on.
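
To illustrate the quoting point, here is a minimal sketch that pulls only the first column through `csv.reader`. The helper name `first_column` and the in-memory sample data are assumptions made for the example. `csv.reader` is an iterator, so it yields one row at a time rather than loading the whole file.

```python
import csv
import io


def first_column(lines):
    """Yield the first field of each CSV row (hypothetical helper)."""
    for row in csv.reader(lines):
        if row:  # csv.reader yields an empty list for a blank line
            yield row[0]


# Made-up sample: a quoted first field containing a comma, and a blank line.
sample = io.StringIO('"1,000",a,b\n\n2,c,d\n')
fields = list(first_column(sample))
print(fields)

# A plain string split would cut the quoted field in the wrong place.
print('"1,000",a,b'.split(',', 1)[0])
```

For a real file, the same helper could be fed `open('big.csv', newline='')` in place of the `StringIO` object.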

Jon Clements answered Oct 14 '25 15:10