I have a large table (numbers in text format) that I would like to load with numpy.genfromtxt(). I would like to ignore the first n columns, say 5. I do not know the size of the table (number of rows or columns) in advance.
I saw that genfromtxt() has a skip_header option that skips a specified number of header rows, but there seems to be no such option for columns. There is a usecols option, but it requires me to specify the columns I want to keep rather than those I want to discard, and I cannot list them because I do not know the total number of columns in advance.
Obviously I could just load the whole thing and then throw away the first n columns, but this is not elegant and is wasteful in terms of memory.
Alternatively, I could peek into the file, find the number of columns, and then construct the usecols argument, but that is rather messy.
Any ideas on how to solve this elegantly? Is there some hidden/undocumented argument that I can use?
genfromtxt. Load data from a text file, with missing values handled as specified. Each line past the first skip_header lines is split at the delimiter character, and characters following the comments character are discarded.
The only mandatory argument of genfromtxt is the source of the data. It can be a string, a list of strings, a generator or an open file-like object with a read method, for example, a file or io.StringIO object.
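To illustrate the kinds of sources described above, here is a minimal sketch, assuming a recent NumPy and whitespace-separated numbers. The sample text and variable names are made up for illustration. Note that a plain string argument is interpreted as a file path, so in-memory text has to be wrapped in a list, a generator, or io.StringIO.

```python
import io
import numpy as np

# Hypothetical sample data: two rows of whitespace-separated numbers.
text = "1 2 3\n4 5 6\n"

# A list of strings, one per line.
from_list = np.genfromtxt(text.splitlines())

# A file-like object with a read method.
from_stream = np.genfromtxt(io.StringIO(text))

# A generator that yields one line at a time.
from_generator = np.genfromtxt(line for line in text.splitlines())

print(from_list)
```

All three calls should produce the same 2x3 float array.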
delimiter : The string used to separate values. By default, this is any whitespace.
converters : A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}.
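As a small sketch of the delimiter option and the missing-value handling mentioned above, assuming comma-separated input (the sample text is invented for illustration): an empty field in a float column should come back as nan.

```python
import io
import numpy as np

# Hypothetical comma-separated sample; the second field of row one is empty.
csv_text = "1,,3\n4,5,6\n"

# delimiter="," splits on commas instead of the default whitespace.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(table)
```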
For older versions of numpy, peeking at the first line to discover the number of columns is not that hard:
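import numpy as np

# fname is the path to the data file
with open(fname, 'r') as f:
    # Count the columns on the first line, then rewind to the start
    num_cols = len(f.readline().split())
    f.seek(0)
    # Keep only columns 5 and onwards
    data = np.genfromtxt(f, usecols=range(5, num_cols))
print(data)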
In newer versions of NumPy, np.genfromtxt can take an iterable argument, so you can wrap the file you're reading in a generator that yields each line with its first N columns removed. If your numbers are space-separated, that's something like
np.genfromtxt(" ".join(ln.split()[N:]) for ln in f)
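The one-liner above can be wrapped into a small helper. This is only a sketch under a few assumptions: load_skipping_columns and its parameters are hypothetical names, the data lines contain nothing but delimited fields (comment and header lines are not treated specially), and io.StringIO stands in for a real open file.

```python
import io
import numpy as np

def load_skipping_columns(lines, n_skip, delimiter=None):
    """Load numeric text from an iterable of lines, dropping the first n_skip columns.

    delimiter=None means "any whitespace", matching genfromtxt's default.
    """
    joiner = " " if delimiter is None else delimiter
    # Lazily rebuild each line without its first n_skip fields, so the
    # discarded columns are never parsed or stored by genfromtxt.
    trimmed = (
        joiner.join(line.strip().split(delimiter)[n_skip:])
        for line in lines
    )
    return np.genfromtxt(trimmed, delimiter=delimiter)

# Hypothetical sample: two label columns followed by numeric data.
sample = io.StringIO("a b 1 2 3\nc d 4 5 6\n")
result = load_skipping_columns(sample, 2)
print(result)

# The same idea for comma-separated input.
csv_sample = io.StringIO("a,b,1,2\nc,d,3,4\n")
csv_result = load_skipping_columns(csv_sample, 2, delimiter=",")
```

With a real file you would pass the open file object, e.g. load_skipping_columns(f, 5) inside a with open(...) block. Because the generator is consumed line by line, the skipped columns never end up in the resulting array.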