
XLRD/Python: Reading Excel file into dict with for-loops

I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, with the corresponding cell value as the value.

I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel it would be more efficient to loop over the top row once to populate the dictionary keys, and then iterate through each remaining row to get the values.

My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

    id    thread_id    forum_id    post_time     votes    post_text
    4     100          3           1377000566    1        'here is some text'
    5     100          4           1289003444    0        'even more text here'

So, I'd like the fields id, thread_id and so on, to be the dictionary keys. I'd like my dictionaries to look like:

    {'id': 4, 'thread_id': 100, 'forum_id': 3, 'post_time': 1377000566, 'votes': 1, 'post_text': 'here is some text'}

Initially, I had some code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:

    import xlrd
    from xlrd import open_workbook, cellname

    book = open_workbook('forum.xlsx', 'r')
    sheet = book.sheet_by_index(3)

    dict_list = []

    for row_index in range(sheet.nrows):
        for col_index in range(sheet.ncols):
            d = {}

            # My intuition for the below for-loop is to take each cell in the top row of the
            # Excel sheet and add it as a key to the dictionary, and then pass the value of
            # current index in the above loops as the value to the dictionary. This isn't
            # working.
            for i in sheet.row(0):
                d[str(i)] = sheet.cell(row_index, col_index).value
                dict_list.append(d)

Any help would be greatly appreciated. Thanks in advance for reading.

asked May 09 '14 15:05 by kylerthecreator




1 Answer

The idea is to first read the header row into a list. Then iterate over the remaining sheet rows (starting from the row after the header), build a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:

    from xlrd import open_workbook

    book = open_workbook('forum.xlsx')
    sheet = book.sheet_by_index(3)

    # read the header values (row 0) into a list
    keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]

    dict_list = []
    for row_index in range(1, sheet.nrows):
        # one dictionary per data row, keyed by the header values
        d = {keys[col_index]: sheet.cell(row_index, col_index).value
             for col_index in range(sheet.ncols)}
        dict_list.append(d)

    print(dict_list)

For a sheet containing:

    A   B   C   D
    1   2   3   4
    5   6   7   8

it prints (the key order may vary with the Python version):

    [{'A': 1.0, 'C': 3.0, 'B': 2.0, 'D': 4.0},
     {'A': 5.0, 'C': 7.0, 'B': 6.0, 'D': 8.0}]
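Note that the numeric cells come back as floats (`1.0` rather than `1`), while the question asks for integer values such as `id: 4`. Here is a minimal sketch of one way to normalize them, assuming every whole-number float in the sheet was meant to be an integer. `normalize` is a hypothetical helper name, not part of xlrd:

```python
def normalize(value):
    """Return whole-number floats as ints; leave every other value unchanged.

    Hypothetical helper, not part of xlrd. Assumes a whole-number float
    in the sheet was meant as an integer (ids, counts, timestamps).
    """
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Rows shaped like the printed output above.
dict_list = [{'A': 1.0, 'B': 2.0, 'C': 3.0, 'D': 4.0},
             {'A': 5.0, 'B': 6.0, 'C': 7.0, 'D': 8.0}]

# Rebuild each row dictionary with normalized values.
cleaned = [{key: normalize(value) for key, value in row.items()}
           for row in dict_list]
print(cleaned)
```

Text cells and genuinely fractional numbers pass through untouched, so this could be applied to every cell while building the dictionaries if that assumption holds for your data.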

UPD (expanding the dictionary comprehension):

    d = {}
    for col_index in range(sheet.ncols):
        d[keys[col_index]] = sheet.cell(row_index, col_index).value
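Another way to write the same per-row loop is to pair the header with each row's values using `zip`. This is a minimal sketch, assuming the sheet object exposes `nrows` and `row_values(rowx)` as xlrd's `Sheet` does. `sheet_to_dicts` and the `FakeSheet` stand-in are hypothetical names, used only so the example runs without an Excel file:

```python
def sheet_to_dicts(sheet):
    """Turn a sheet into a list of dicts keyed by the header row (row 0)."""
    keys = sheet.row_values(0)
    return [dict(zip(keys, sheet.row_values(row_index)))
            for row_index in range(1, sheet.nrows)]


class FakeSheet(object):
    """Hypothetical stand-in exposing only the two members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rowx):
        # Return the values of one row as a plain list.
        return list(self._rows[rowx])


demo = FakeSheet([['A', 'B', 'C', 'D'],
                  [1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0]])
dict_list = sheet_to_dicts(demo)
print(dict_list)
```

With a real workbook the call would be `sheet_to_dicts(book.sheet_by_index(3))`. Be aware that xlrd 2.0 and later read only `.xls` files, so opening `forum.xlsx` would need an older xlrd release or a different reader.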
answered Oct 01 '22 05:10 by alecxe