Making np.loadtxt work with multiple possible delimiters

Tags:

python

numpy

I have a program that reads in data files and the user selects which column they want to use. I want it to be more universal with input files; sometimes, columns can look like this:

10:34:24.58  8.284  6.121

And sometimes they can look like this:

10 34 24.58  8.284  6.121

I want the program to recognize this as 5 columns in BOTH cases, instead of 5 columns for the first and 3 for the second. Basically, I want it to recognize white space as a delimiter and : as a delimiter as well.

Is there a simple way to do this? I know numpy takes a delimiter command, but as far as I'm aware it can only use one.

asked Dec 04 '15 20:12

uhurulol

1 Answer

np.loadtxt (and genfromtxt) accept any iterable as input, as long as it yields one line at a time. So the lines of your file can be passed through a function or generator that massages them in various ways. Here's a simple example.

Define a pair of lines that simulates your file:

In [7]: txt="""10:34:24.58  8.284  6.121
   ...: 10 34 24.58  8.284  6.121
   ...: """

In [8]: txt=txt.splitlines()

In [9]: txt
Out[9]: ['10:34:24.58  8.284  6.121', '10 34 24.58  8.284  6.121']

If it weren't for the : I could pass this directly to loadtxt.

But let's pass the lines through a generator that replaces each ':' with a space. It could be a generator function (with yield). Here I'm using one of those new-fangled generator expressions:

In [10]: np.loadtxt((x.replace(':', ' ') for x in txt))
Out[10]: 
array([[ 10.   ,  34.   ,  24.58 ,   8.284,   6.121],
       [ 10.   ,  34.   ,  24.58 ,   8.284,   6.121]])
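
The function-with-yield form mentioned above might look like the sketch below. It assumes a reasonably recent NumPy (1.14 or later) on Python 3, where loadtxt accepts str lines; the helper name colons_to_spaces is made up for illustration.

```python
import numpy as np

def colons_to_spaces(lines):
    """Hypothetical helper: yield each line with ':' replaced by a space."""
    for line in lines:
        yield line.replace(':', ' ')

# Same two sample lines as in the session above
txt = ['10:34:24.58  8.284  6.121', '10 34 24.58  8.284  6.121']

# loadtxt consumes the generator one line at a time
A = np.loadtxt(colons_to_spaces(txt))
print(A.shape)  # (2, 5)
```

A named function like this is easier to extend than the one-line generator expression, for example if more characters need replacing later.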

With a file, this should work (iterating over an open file yields its lines):

with open(filename) as f:
    A = np.loadtxt((x.replace(':', ' ') for x in f))

A regex would be useful for more elaborate replacements.
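
As a rough sketch of the regex route, the snippet below collapses any run of delimiter characters into a single space before the lines reach loadtxt. The set of delimiters (colon, comma, semicolon, whitespace), the pattern name DELIMS and the helper normalize are assumptions for illustration, not something the question specifies.

```python
import re
import numpy as np

# Assumed delimiter set: any run of ':', ',', ';' or whitespace
DELIMS = re.compile(r'[:,;\s]+')

def normalize(lines):
    """Hypothetical helper: replace every delimiter run with one space."""
    for line in lines:
        # strip() drops the leading/trailing space left by the substitution
        yield DELIMS.sub(' ', line).strip()

# Two sample lines using a mix of delimiters
txt = ['10:34:24.58  8.284  6.121\n', '10 34 24.58,8.284;6.121\n']

A = np.loadtxt(normalize(txt))
print(A.shape)  # (2, 5)
```

With an open file, the same helper can be used as np.loadtxt(normalize(f)).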

answered Sep 23 '22 01:09

hpaulj
