Using numpy to filter out multiple comment symbols

Tags: python, numpy

I am looking for a way to pull data from a file that has multiple comment symbols. The input file looks similar to:

# filename: sample.txt
# Comment 1
# Comment 2
$ Comment 3
1,10
2,20
3,30
4,40
# Comment 4

I can only seem to remove one comment type with the following code and can't find any documentation on how I might remove both.

import numpy as np
data = np.loadtxt('sample.txt', delimiter=',', comments='#')  # I also need to filter out '$'

Are there any alternative methods I could use to accomplish this?

asked Jun 18 '14 07:06 by tirefire


3 Answers

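Simply pass a list of markers to comments. Because the values are comma-separated, delimiter=',' is needed as well, for example:

import numpy as np
data = np.loadtxt('sample.txt', comments=['#', '$', '@'], delimiter=',')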
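A self-contained way to try the list-based comments approach, assuming a NumPy release recent enough to accept a sequence of strings for comments (older releases only take a single string). SAMPLE is a hypothetical in-memory stand-in for sample.txt, not part of the original answer:

```python
import io

import numpy as np

# Hypothetical in-memory copy of sample.txt so the example runs without a file
SAMPLE = """# Comment 1
# Comment 2
$ Comment 3
1,10
2,20
3,30
4,40
# Comment 4
"""

# comments takes a list of markers; text from any marker to end of line is dropped.
# delimiter=',' is required because the rows are comma-separated.
data = np.loadtxt(io.StringIO(SAMPLE), comments=['#', '$'], delimiter=',', dtype=int)
print(data)
```

The result is a 4x2 integer array; dropping dtype=int gives floats instead.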
answered Sep 23 '22 09:09 by Vladas O.


I would create a generator that will ignore the comments and then pass it to np.genfromtxt():

import numpy as np

with open('sample.txt') as f:
    # keep only lines that don't start with a comment symbol
    gen = (r for r in f if not r.startswith(('$', '#')))
    a = np.genfromtxt(gen, delimiter=',')
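If comment lines might be indented, or the file might contain blank lines, the same generator idea can be wrapped in a small reusable helper. This is a sketch only: skip_comment_lines is a hypothetical name, and SAMPLE is an assumed in-memory stand-in for sample.txt:

```python
import io

import numpy as np


def skip_comment_lines(lines, markers=('#', '$')):
    """Hypothetical helper: yield lines that are non-blank and don't start with a marker.

    Leading whitespace is ignored when checking for a marker.
    """
    for line in lines:
        stripped = line.lstrip()
        if stripped and not stripped.startswith(markers):
            yield line


# Assumed sample data with an indented comment and a blank line
SAMPLE = """# Comment 1
  $ indented Comment 3

1,10
2,20
3,30
4,40
# Comment 4
"""

# genfromtxt accepts a generator of strings as its input
a = np.genfromtxt(skip_comment_lines(io.StringIO(SAMPLE)), delimiter=',')
print(a)
```

Passing a different markers tuple lets the same helper handle other comment symbols, such as '@'.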
answered Sep 24 '22 09:09 by Saullo G. P. Castro


For this case, you can also fall back on standard Python looping over the input, for example:

data = []
with open("sample.txt") as fd:
    for line in fd:
        # skip blank lines and lines starting with either comment symbol
        if not line.strip() or line.startswith(('#', '$')):
            continue
        data.append([int(x) for x in line.strip().split(',')])

print(data)

output:

[[1, 10], [2, 20], [3, 30], [4, 40]]
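A variant of the loop above, sketched under the assumption that the rows should end up in a NumPy array like in the other answers. It swaps in the standard library's csv module for the manual split; read_rows and SAMPLE are hypothetical names used only for illustration:

```python
import csv
import io

import numpy as np

# Hypothetical in-memory copy of sample.txt
SAMPLE = """# Comment 1
# Comment 2
$ Comment 3
1,10
2,20
3,30
4,40
# Comment 4
"""


def read_rows(fd, markers=('#', '$')):
    """Parse comma-separated integer rows, skipping blank and comment lines."""
    data_lines = (line for line in fd if line.strip() and not line.startswith(markers))
    # csv.reader accepts any iterable of strings and splits each one on commas
    return [[int(field) for field in row] for row in csv.reader(data_lines)]


rows = read_rows(io.StringIO(SAMPLE))
arr = np.array(rows)  # convert the list of lists into a 2-D integer array
print(rows)
```

With a real file, read_rows(open("sample.txt")) inside a with block works the same way.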
answered Sep 26 '22 09:09 by Fredrik Pihl