
Fastest way to generate delimited string from 1d numpy array

Tags:

python

numpy

I have a program which needs to turn many large one-dimensional numpy arrays of floats into delimited strings. I am finding this operation quite slow relative to the mathematical operations in my program and am wondering if there is a way to speed it up. For example, consider the following loop, which takes 100,000 random numbers in a numpy array and joins each array into a comma-delimited string.

import numpy as np

x = np.random.randn(100000)
for i in range(100):
    ",".join(map(str, x))

This loop takes about 20 seconds to complete (total, not per cycle). In contrast, 100 cycles of something like elementwise multiplication (x*x) would take less than 1/10 of a second to complete. Clearly the string join operation creates a large performance bottleneck; in my actual application it will dominate total runtime. This makes me wonder, is there a faster way than ",".join(map(str, x))? Since map() is where almost all the processing time occurs, this comes down to the question of whether there is a faster way to convert a very large number of numbers to strings.
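For reference, the comparison described above can be reproduced with a minimal timing harness like the following (a sketch using the standard-library timeit module; exact numbers will vary by machine and Python/NumPy version):

```python
import timeit
import numpy as np

x = np.random.randn(100000)

# Time one pass of the string join (the question runs 100 passes)
t_join = timeit.timeit(lambda: ",".join(map(str, x)), number=1)

# Time one pass of elementwise multiplication for comparison
t_mul = timeit.timeit(lambda: x * x, number=1)

print(f"join: {t_join:.4f}s  multiply: {t_mul:.6f}s")
```

On a typical machine the join is orders of magnitude slower than the multiplication, which is the bottleneck the question describes.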

Abiel asked Apr 27 '10 13:04



2 Answers

A little late, but this is faster for me:

# generate an array of strings
x_arrstr = np.char.mod('%f', x)
# combine into a single string
x_str = ",".join(x_arrstr)

The speed-up on my machine is about 1.5x.
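Put together as a runnable sketch (x is assumed to be the array from the question; note that '%f' formats with 6 decimal places, so the output differs from what str() produces):

```python
import numpy as np

x = np.random.randn(100000)

# np.char.mod applies the C-style format string elementwise,
# producing an array of strings in one vectorized call
x_arrstr = np.char.mod('%f', x)

# join the resulting array of strings as before
x_str = ",".join(x_arrstr)

print(x_str[:60])
```

If full precision matters, a different format string such as '%.17g' could be used instead of '%f'.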

Markus R answered Oct 16 '22 17:10


Very good writeup on the performance of various string concatenation techniques in Python: http://www.skymind.com/~ocrow/python_string/

I'm a little surprised that some of the latter approaches perform as well as they do, but it looks like you can certainly find something there that will work better than what you're currently doing.

The fastest method mentioned on the site:

Method 6: List comprehensions

def method6():
    return ''.join([`num` for num in xrange(loop_count)])

This method is the shortest. I'll spoil the surprise and tell you it's also the fastest. It's extremely compact, and also pretty understandable. Create a list of numbers using a list comprehension and then join them all together. Couldn't be simpler than that. This is really just an abbreviated version of Method 4, and it consumes pretty much the same amount of memory. It's faster though because we don't have to call the list.append() function each time round the loop.
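The quoted snippet is Python 2 code (backtick repr syntax and xrange). A Python 3 equivalent, as a sketch with an assumed loop_count, would be:

```python
loop_count = 100000

def method6():
    # repr(num) replaces the Python 2 backtick syntax; range replaces xrange
    return ''.join([repr(num) for num in range(loop_count)])

s = method6()
```

The same idea applies to the original question: build the list of strings in a comprehension and hand it to join in one call, rather than appending in a loop.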

sblom answered Oct 16 '22 15:10