Making np.loadtxt work with multiple possible delimiters

Tags: python, numpy

I have a program that reads data files and lets the user select which column to use. I want it to handle more input formats; sometimes the columns look like this:

10:34:24.58  8.284  6.121

And sometimes they can look like this:

10 34 24.58  8.284  6.121

I want the program to recognize this as 5 columns in BOTH cases, instead of 3 columns for the first and 5 for the second. Basically, I want it to treat both whitespace and : as delimiters.

Is there a simple way to do this? I know np.loadtxt takes a delimiter argument, but as far as I'm aware it accepts only one.
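A quick way to see where the 3-versus-5 mismatch comes from: loadtxt's default delimiter (None) splits on whitespace, which behaves like Python's str.split() with no arguments. This sketch only counts the fields per line under that assumption; it does not call loadtxt itself.

```python
# The two sample layouts from the question.
lines = ["10:34:24.58  8.284  6.121", "10 34 24.58  8.284  6.121"]

# str.split() with no arguments splits on runs of whitespace, so the
# colon-separated time in the first line stays a single field.
field_counts = [len(line.split()) for line in lines]
print(field_counts)
```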

asked Dec 04 '15 20:12 by uhurulol




1 Answer

np.loadtxt (and genfromtxt) accept any iterable as input, as long as it yields one line at a time. So the lines of your file can be passed through a function or generator that massages them in various ways. Here's a simple example:

Define a pair of lines that simulates your file:

In [7]: txt="""10:34:24.58  8.284  6.121
   ...: 10 34 24.58  8.284  6.121
   ...: """

In [8]: txt=txt.splitlines()

In [9]: txt
Out[9]: ['10:34:24.58  8.284  6.121', '10 34 24.58  8.284  6.121']

If it weren't for the ':', I could pass this list directly to loadtxt.
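To make that concrete, here is a small sketch (assuming Python 3 and a NumPy version that accepts a list of str lines) showing that the colon-free line parses as-is, while the full list fails because '10:34:24.58' cannot be converted to a float.

```python
import numpy as np

txt = ["10:34:24.58  8.284  6.121", "10 34 24.58  8.284  6.121"]

# A list of strings is a valid input; a single row comes back as a 1-D array.
row = np.loadtxt([txt[1]])
print(row.shape)

# The first line's '10:34:24.58' field is not a float, so loadtxt raises ValueError.
raised = False
try:
    np.loadtxt(txt)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```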

But let's pass the lines through a generator that replaces each ':' with a space. It could be a function (with yield); here I'm using a generator expression:

In [10]: np.loadtxt((x.replace(':', ' ') for x in txt))
Out[10]: 
array([[ 10.   ,  34.   ,  24.58 ,   8.284,   6.121],
       [ 10.   ,  34.   ,  24.58 ,   8.284,   6.121]])
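The same filter written as a function with yield, as mentioned above, might look like the sketch below. The name colons_to_spaces is a hypothetical helper, not part of NumPy; the lines are assumed to be str (Python 3), so the replacement uses str arguments.

```python
import numpy as np


def colons_to_spaces(lines):
    """Hypothetical helper: yield each line with every ':' replaced by a space."""
    for line in lines:
        yield line.replace(":", " ")


txt = ["10:34:24.58  8.284  6.121", "10 34 24.58  8.284  6.121"]

# loadtxt consumes the generator one line at a time; both rows now have 5 fields.
data = np.loadtxt(colons_to_spaces(txt))
print(data)
```

A named function is easier to extend than a one-line generator expression if more cleanup steps are needed later.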

With a file, this should work (iterating over an open file returns lines):

with open(filename) as f:
    # Text-mode lines are str, so replace with ':' (str), not b':'.
    A = np.loadtxt((x.replace(':', ' ') for x in f))
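Since the question also mentions letting the user pick a column, here is a self-contained sketch that combines the file approach with loadtxt's usecols parameter. The temporary file stands in for a real data file, and the column variable is a hypothetical user choice (0-based index).

```python
import os
import tempfile

import numpy as np

# Write both sample layouts to a temporary file to stand in for real data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("10:34:24.58  8.284  6.121\n")
    tmp.write("10 34 24.58  8.284  6.121\n")
    filename = tmp.name

try:
    column = 2  # hypothetical user choice: the third field (0-based)
    with open(filename) as f:
        # Normalize colons to spaces, then keep only the requested column.
        selected = np.loadtxt((line.replace(":", " ") for line in f), usecols=(column,))
finally:
    os.remove(filename)

print(selected)
```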

A regular expression (re.sub) would be useful for more elaborate replacements.
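As a sketch of the regex idea: the pattern below treats colons, semicolons, commas and any whitespace as interchangeable delimiters. That delimiter set is an assumption for illustration; adjust the character class to match your files.

```python
import re

import numpy as np

# Assumed delimiter set: ':', ';', ',' and whitespace all separate fields.
DELIMS = re.compile(r"[:;,\s]+")


def normalize(lines):
    """Collapse each run of delimiter characters into a single space."""
    for line in lines:
        yield DELIMS.sub(" ", line).strip()


txt = ["10:34:24.58  8.284  6.121", "10 34 24.58; 8.284, 6.121"]
data = np.loadtxt(normalize(txt))
print(data)
```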

answered Sep 23 '22 01:09 by hpaulj