I have a program which needs to turn many large one-dimensional numpy arrays of floats into delimited strings. I am finding this operation quite slow relative to the mathematical operations in my program and am wondering if there is a way to speed it up. For example, consider the following loop, which takes 100,000 random numbers in a numpy array and joins each array into a comma-delimited string.
import numpy as np

x = np.random.randn(100000)
for i in range(100):
    ",".join(map(str, x))
This loop takes about 20 seconds to complete (total, not per iteration). In contrast, 100 iterations of an elementwise multiplication (x*x) take less than 1/10 of a second. Clearly the string join operation is a large performance bottleneck; in my actual application it will dominate total runtime. This makes me wonder: is there a faster way than ",".join(map(str, x))? Since map() is where almost all the processing time is spent, this comes down to whether there is a faster way to convert a very large number of numbers to strings.
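For reference, the timing gap described above can be reproduced with the standard-library timeit module; a smaller array and fewer repetitions are used here so the sketch runs quickly, and exact numbers will of course vary by machine:

```python
import timeit
import numpy as np

x = np.random.randn(10000)  # smaller than the question's 100,000 so this runs fast

# time the string-join approach from the question
join_time = timeit.timeit(lambda: ",".join(map(str, x)), number=10)

# time an elementwise multiplication for comparison
mul_time = timeit.timeit(lambda: x * x, number=10)

print(f"join: {join_time:.4f}s  multiply: {mul_time:.4f}s")
```

On a typical machine the join is orders of magnitude slower than the multiplication, matching the question's observation.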
A little late, but this is faster for me:
# generate an array of strings
x_arrstr = np.char.mod('%f', x)
# combine into a single string
x_str = ",".join(x_arrstr)
The speed-up on my machine is about 1.5x.
Very good writeup on the performance of various string concatenation techniques in Python: http://www.skymind.com/~ocrow/python_string/
I'm a little surprised that some of the later approaches perform as well as they do, but it looks like you can certainly find something there that will work better for you than what you're doing now.
Fastest method mentioned on the site
Method 6: List comprehensions
def method6():
    return ''.join([str(num) for num in range(loop_count)])
This method is the shortest. I'll spoil the surprise and tell you it's also the fastest. It's extremely compact, and also pretty understandable. Create a list of numbers using a list comprehension and then join them all together. Couldn't be simpler than that. This is really just an abbreviated version of Method 4, and it consumes pretty much the same amount of memory. It's faster though because we don't have to call the list.append() function each time round the loop.
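Applied to the question's NumPy array, the same list-comprehension-plus-join pattern (written here in Python 3 syntax) produces a result identical to the map-based version:

```python
import numpy as np

x = np.random.randn(1000)

# list comprehension feeding join (Method 6 applied to the question's data)
s1 = ",".join([str(v) for v in x])

# the question's original map-based version
s2 = ",".join(map(str, x))

assert s1 == s2
```

Both convert each element with str() and join once, so the results are byte-for-byte identical; any difference between them is purely in iteration overhead.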