I'm trying to do an unbelievably simple thing: load parts of an Excel worksheet into a Numpy array. I've found a kludge that works, but it is embarrassingly unpythonic: say my worksheet was loaded as "ws", the code:
A = np.zeros((37,3))
for i in range(2,39):
for j in range(1,4):
A[i-2,j-1]= ws.cell(row = i, column = j).value
loads the contents of "ws" into array A.
There MUST be a more elegant way to do this. For instance, csvread allows to do this much more naturally, and while I could well convert the .xlsx file into a csv one, the whole purpose of working with openpyxl was to avoid that conversion. So there we are, Collective Wisdom of the Mighty Intertubes: what's a more pythonic way to perform this conceptually trivial operation?
Thank you in advance for your answers.
PS: I operate Python 2.7.5 on a Mac via Spyder, and yes, I did read the openpyxl tutorial, which is the only reason I got this far.
Python NumPy read CSV into 2d NumPy arraytxt() and open() functions to load a CSV file into a 2Dimension NumPy Array. Call open file to open the CSV text file Use numpy. loadtxt( CSV file, delimiter) with the file as the result of the previous step and delimiter as “,” to return the data in a two-dimensional NumPy.
You could do
A = np.array([[i.value for i in j] for j in ws['C1':'E38']])
EDIT - further explanation. (firstly thanks for introducing me to openpyxl, I suspect I will use it quite a bit from time to time)
list(ws['C1':'E38'])
or a list comprehension as aboveThe advantages of doing it my way are: no need to work out the dimension of the array and make an empty one to start with, no need to work out the corrected index number of the np array, list comprehensions faster. Disadvantage is that it needs the "corners" defining in "A1" format. If the range isn't know then you would have to use iter_rows, rows or columns
A = np.array([[i.value for i in j[2:5]] for j in ws.rows])
if you don't know how many columns then you will have to loop and check values more like your original idea
If you don't need to load data from multiple files in an automated manner, the package tableconvert
I recently wrote may help. Just copy and paste the relevant cells from the excel file into a multiline string and use the convert()
function.
import numpy as np
from tableconvert.converter import convert
array = convert("""
123 456 3.14159
SOMETEXT 2,71828 0
""")
print(type(array))
print(array)
Output:
<class 'numpy.ndarray'>
[[ 123. 456. 3.14159]
[ nan 2.71828 0. ]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With