In Python's csv module, there is a function called csv.reader
that lets you iterate over the rows of a file. It returns a reader object whose rows can be collected into a container such as a list.
So when that list is assigned to a variable and printed, i.e.:
csv_rows = list(csv.reader(csvfile, delimiter=',', quotechar='|'))
print(csv_rows)
[['First Name', 'Last Name', 'Zodicac', 'Date of birth', 'Sex']]  # example output showing only the header row
So far, I haven't found a similar function in openpyxl. I could be mistaken, so I'm wondering if any of you can help me out.
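For reference, the csv behaviour described above can be reproduced without a file on disk. This is only a minimal sketch: the in-memory sample data is made up for illustration and stands in for the real csvfile.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
csvfile = io.StringIO("First Name,Last Name,Sex\nJohn,Smith,M\n")

# csv.reader yields one list of strings per row; list() collects them all.
csv_rows = list(csv.reader(csvfile, delimiter=',', quotechar='|'))
print(csv_rows)
```

The result is a list of lists, one inner list per row, which is the shape being asked for from openpyxl.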
Update
@alecxe, your solution works perfectly (except that it returns my date of birth as a datetime object instead of a regular string).
def iter_rows(ws):
    for row in ws.iter_rows():
        yield [cell.value for cell in row]
>>> pprint(list(iter_rows(ws)))
[['First Nam', 'Last Name', 'Zodicac', 'Date of birth', 'Sex'], ['John', 'Smith', 'Snake', datetime.datetime(1989, 9, 4, 0, 0), 'M']]
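If plain strings are wanted instead of datetime objects, one option is to format the date values after reading them. The sketch below is an assumption about what "regular string" should look like: the helper name stringify_dates and the "%Y-%m-%d" format are hypothetical choices, not something the worksheet dictates.

```python
from datetime import date, datetime

def stringify_dates(row, fmt="%Y-%m-%d"):
    """Return a copy of row with date/datetime values formatted as strings."""
    # fmt is an assumed output format; change it to match the desired text.
    return [v.strftime(fmt) if isinstance(v, (date, datetime)) else v for v in row]

row = ['John', 'Smith', 'Snake', datetime(1989, 9, 4, 0, 0), 'M']
converted = stringify_dates(row)
print(converted)
```

Non-date values pass through unchanged, so the helper can be applied to every row, including the header.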
Since I'm a beginner, I wanted to know how this would work with a for loop instead of a list comprehension.
So I tried this:
def iter_rows(ws):
    result = []
    for row in ws.iter_rows():
        for cell in row:
            result.append(cell.value)
    yield result
It gives me almost the same output, but not quite. As you can tell, it produces one gigantic flat list instead of the nested list your version returns:
>>> print(list(iter_rows(ws)))
[['First Nam', 'Last Name', 'Zodicac', 'Date of birth', 'Sex', 'David', 'Yao', 'Snake', datetime.datetime(1989, 9, 4, 0, 0), 'M']]
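The flat result comes from using a single result list for the whole sheet and yielding only once. A sketch of an explicit-loop version that keeps the nesting is shown below. To keep it runnable without a workbook, it uses a hypothetical stand-in Cell type and a renamed helper, iter_row_values, that takes any iterable of rows; with a real worksheet you would pass ws.iter_rows() instead.

```python
from collections import namedtuple

# Hypothetical stand-in for an openpyxl cell: only .value is needed here.
Cell = namedtuple("Cell", "value")

def iter_row_values(rows):
    for row in rows:
        result = []                    # start a fresh list for every row
        for cell in row:
            result.append(cell.value)
        yield result                   # yield this row before moving on

rows = [
    (Cell('First Name'), Cell('Sex')),
    (Cell('John'), Cell('M')),
]
nested = list(iter_row_values(rows))
print(nested)
```

The two changes are where result is created (inside the outer loop) and where yield sits (once per row rather than once at the end).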
ws.max_row will give you the number of rows in a worksheet. Since openpyxl 2.4 you can also access individual rows and columns and use their length to answer the question. It's worth noting, though, that for data validation on a single column Excel uses 1:1048576.
Developers describe openpyxl as "A Python library to read/write Excel 2010 xlsx/xlsm files". On the other hand, pandas is described as "Powerful data structures for data analysis".
iter_rows() probably serves a similar purpose:
Returns a squared range based on the range_string parameter, using generators. If no range is passed, will iterate over all cells in the worksheet
>>> from pprint import pprint
>>> from openpyxl import load_workbook
>>>
>>> wb = load_workbook('test.xlsx')
>>> ws = wb['Sheet1']
>>>
>>> pprint(list(ws.iter_rows()))
[(<Cell Sheet1.A1>,
<Cell Sheet1.B1>,
<Cell Sheet1.C1>,
<Cell Sheet1.D1>,
<Cell Sheet1.E1>),
(<Cell Sheet1.A2>,
<Cell Sheet1.B2>,
<Cell Sheet1.C2>,
<Cell Sheet1.D2>,
<Cell Sheet1.E2>),
(<Cell Sheet1.A3>,
<Cell Sheet1.B3>,
<Cell Sheet1.C3>,
<Cell Sheet1.D3>,
<Cell Sheet1.E3>)]
You can modify it a little bit to yield a list of row values, for example:
def iter_rows(ws):
    for row in ws.iter_rows():
        yield [cell.value for cell in row]
Demo:
>>> pprint(list(iter_rows(ws)))
[[1.0, 1.0, 1.0, None, None],
[2.0, 2.0, 2.0, None, None],
[3.0, 3.0, 3.0, None, None]]
I got it to work using this method:
all_rows = []
for row in worksheet:
    current_row = []
    for cell in row:
        current_row.append(cell.value)
    all_rows.append(current_row)
Essentially, I created a list to hold all of the data (all_rows).
Then, I iterated through each row in the worksheet.
Each cell.value within a row was added to a short-term list (current_row).
Once all of the cell values in the row had been added to the short-term list, the short-term list was appended to the long-term list.
After loading the workbook from your file path and choosing a worksheet, you can use a list comprehension to gather each row by calling ws.iter_rows with values_only=True, which returns a tuple for each row of the Excel file containing the value of each cell. Each tuple can then be converted to a list, ultimately giving you a two-dimensional list.
import openpyxl as opxl

def read_rows(file_path, sheet_name=None):
    # load the workbook
    wb = opxl.load_workbook(file_path)
    # choose the worksheet from the excel file:
    # a specific sheet such as "example_sheet" if a name is given,
    # otherwise the currently active sheet
    ws = wb[sheet_name] if sheet_name else wb.active
    # return a list of lists, each sub list within the
    # 2-dimensional list being a record from within the excel file.
    return [list(r) for r in ws.iter_rows(values_only=True)]
My example was some code I used for dealing with Excel files, not CSV, but perhaps the process may be similar.
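To try the values_only approach without an existing file, a workbook can be built in memory first. This is just a sketch: it assumes openpyxl 2.6 or later (where iter_rows accepts values_only), and the sample rows are made up for illustration.

```python
from openpyxl import Workbook

# Build a small in-memory workbook; the sample rows are hypothetical data.
wb = Workbook()
ws = wb.active
ws.append(['First Name', 'Last Name', 'Sex'])
ws.append(['John', 'Smith', 'M'])

# values_only=True makes iter_rows yield tuples of cell values
# instead of tuples of Cell objects.
rows = [list(r) for r in ws.iter_rows(values_only=True)]
print(rows)
```

The resulting list of lists has the same shape as list(csv.reader(...)), except that the values keep their Excel types rather than all being strings.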