
Access list of items with list of indices

Consider a large list of named items (the first line) read from a large CSV file (80 MB), in which blank columns may be interspersed between the named ones:

name_line = ['a','','b','','c' .... ,'','cb','cc']

I am reading the remainder of the data in line by line and I only need to process data with a corresponding name. Data might look like

data_line = ['10','','.5','','10289' .... ,'','16.7','0']

I tried it two ways. One is popping the empty columns from each line as it is read:

blnk_cols = [1,3, ... ,97]
while data:
    ...
    # pop from the highest index down so earlier pops don't shift later indices
    for index in reversed(blnk_cols):
        data_line.pop(index)

The other is collecting only the items that have a corresponding name in the first line:

good_cols = [0,2,4, ... ,98,99]   
while data:
    ...
    data_line = [data_line[index] for index in good_cols]

In the data I am using there will definitely be more good columns than blank ones, although the split might be as close as half and half.

I used the cProfile and pstats modules to find the slowest parts of my code, which suggested the pop was currently the slowest item. I switched to the list comprehension and the time almost doubled.
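For readers who want to reproduce this kind of measurement, here is a minimal sketch of profiling a column filter with cProfile and pstats. The names `process` and `profile_process` and the sample `good_cols` are made up for illustration and are not from the original program:

```python
import cProfile
import io
import pstats

good_cols = [0, 2, 4]  # hypothetical indices of the named columns


def process(lines):
    # Keep only the wanted columns from every already-split line.
    return [[line[i] for i in good_cols] for line in lines]


def profile_process(lines):
    # Run process() under the profiler and return its result together
    # with a text report sorted by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(process, lines)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Calling `profile_process(rows)` on a list of split rows returns the filtered rows plus a report string that can be printed to see where the time goes.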

I imagine one fast way would be to slice the array retrieving only good data, but this would be complicated for files with alternating blank and good data.
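For what it's worth, the slicing idea can be sketched as follows. This assumes a strictly alternating layout for the first case, and the slice boundaries in the second case are invented for illustration; real files would need boundaries derived from the name line:

```python
# Hypothetical row in which every other field is blank.
data_line = ['10', '', '.5', '', '10289', '']

# Extended slice: start at index 0 and take every second item.
good_items = data_line[0::2]

# Irregular layouts can be covered by a few slices joined together
# (the boundaries below are made up for this sample row).
row = ['a', 'b', '', '', 'c', 'd']
good_from_runs = row[0:2] + row[4:6]
```

Slices copy whole runs at once, so this tends to pay off only when the good columns come in a few contiguous or evenly spaced groups.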

What I really need is to be able to do

data_line = data_line[good_cols]

effectively passing a list of indices into a list to get back those items. Right now my program runs in about 2.3 seconds for a 10 MB file, and the pop accounts for about 0.3 seconds.

Is there a faster way to access certain locations in a list? In C it would just be dereferencing an array of pointers to the correct indices in the array.

Addition: name_line as it appears in the file before reading

a,b,c,d,e,f,g,,,,,h,i,j,k,,,,l,m,n,

name_line after read and split(",")

['a','b','c','d','e','f','g','','','','','h','i','j','k','','','','l','m','n','\n']
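Given a name line like the one above, the index lists could be derived once rather than written by hand. A minimal sketch, assuming that a column counts as blank when its name is empty or whitespace only (which also drops the trailing '\n' field):

```python
# The sample name line from the question, split the same way.
name_line = "a,b,c,d,e,f,g,,,,,h,i,j,k,,,,l,m,n,\n".split(",")

# Indices of columns that carry a name.
good_cols = [i for i, name in enumerate(name_line) if name.strip()]

# Indices of blank columns (including the trailing newline field).
blnk_cols = [i for i, name in enumerate(name_line) if not name.strip()]
```

For this sample, good_cols covers indices 0 to 6, 11 to 14, and 18 to 20.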
Paul Seeb asked Jan 25 '12 18:01



1 Answer

Try a generator expression:

data_line = (data_line[i] for i in good_cols)

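Also read the question "Generator Expressions vs. List Comprehension".

As the top answer there tells you: 'Basically, use a generator expression if all you're doing is iterating once'.

So you should benefit from this, provided each data_line is consumed only once. Note that a generator cannot be indexed or iterated a second time, and the lookups are deferred until it is consumed rather than skipped.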
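As a different technique from the generator expression above, the standard library's `operator.itemgetter` comes close to the `data_line[good_cols]` syntax asked for. A minimal sketch with made-up sample values; whether it is actually faster on the real file would need to be measured:

```python
from operator import itemgetter

good_cols = [0, 2, 4]  # hypothetical indices of the named columns

# Build the getter once, outside the read loop.
# With two or more indices it returns a tuple of the selected items;
# with a single index it would return the bare item instead.
pick_good = itemgetter(*good_cols)

data_line = ['10', '', '.5', '', '10289']
data_line = list(pick_good(data_line))
```

The `list(...)` call is only needed if later code mutates the row; otherwise the tuple can be used directly.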

Johan Lundberg answered Oct 14 '22 16:10