pass openpyxl data to pandas

Question

I am splitting "full name" fields into "first name", middle name" and "last name" fields from data from an excel file. I couldn't figure out how to do that in pandas, so I turned to openpyxl. I got the variables split as I desired. But, since adding columns to openpyxl for the new fields is not easy, I thought I would pass the values to pandas.

I'm generating the dataframe that I need when I run the code, but once I send the df to ExcelWriter, only the last row is added to the Excel file. The data is in the right places, though.

Here's the code:

for cellObj in range(2, sheet.max_row+1):
    #print cellObj
    id = sheet['A' + str(cellObj)].value
    fullname = sheet['B' + str(cellObj)].value.strip()
    namelist = fullname.split(' ')  
    for i in namelist:
        firstname = namelist[0]
        if len(namelist) == 2:
            lastname = namelist[1]
            middlename = ''
        elif len(namelist) == 3:
            middlename = namelist[1]
            lastname = namelist[2]
        elif len(namelist) == 4:
            middlename = namelist[1]
            lastname = namelist[2] + " " + namelist[3]
        if (namelist[1] == 'Del') | (namelist[1] == 'El') | (namelist[1] == 'Van'):
            middlename = ''
            lastname = namelist[1] + " " + namelist[2]
    df = pd.DataFrame({'personID':id,'lastName':lastname,'firstName':firstname,'middleName':middlename}, index=[id])

    writer = pd.ExcelWriter('output.xlsx')
    df.to_excel(writer,'Sheet1', columns=['ID','lastName','firstName','middleName'])
    writer.save()

Any ideas?

Thanks

Sam · Accepted Answer

A couple of things. First, your code is only ever going to get you one line, because you overwrite the values every time it passes an if test. for example,

  if len(namelist) == 2:
        lastname = namelist[1]

This assigns a string to the variable lastname. You are not appending to a list, you are just assigning a string. Then when you make your dataframe, df = pd.DataFrame({'personID':id,'lastName':lastname,... your using this value, so the dataframe will only ever hold that string. Make sense? If you must do this using openpyexcel, try something like:

lastname = [] #create an empty list
if len(namelist) == 2:
    lastname.append(namelist[1]) #add the name to the list

However, I think your life will ultimately be much easier if you just figure out how to do this with pandas. It is in fact quite easy. Try something like this:

import pandas as pd
#read excel
df = pd.read_excel('myInputFilename.xlsx', encoding = 'utf8')
#write to excel
df.to_excel('MyOutputFile.xlsx')

Charlie Clark · Answer

FWIW openpyxl 2.4 makes it pretty easy to convert all or part of an Excel sheet to a Pandas Dataframe: ws.values is an iterator for all that values in the sheet. It also has a new ws.iter_cols() method that will allow you to work directly with columns.

It's currently (April 2016) available as an alpha version and can be installed using pip install -U --pre openpyxl

The code would then look a bit like this:

sheet["B1"] = "firstName"
sheet["C1"] = "middleName"
sheet["D1"] = "lastName"

for row in sheet.iter_rows(min_row=2, max_col=2):
    id_cell, name = row

    fullname = name.value.strip()
    namelist = fullname.split()
    firstname = namelist[0]
    lastname = namelist[-1]
    middlename = ""
    if len(namelist) >= 3:
        middlename = namelist[1]
    if len(namelist) == 4:
        lastname = " ".join(namelist[-2:])
    if middlename in ('Del', 'El', 'Van', 'Da'):
        lastname = " ".join([middlename, lastname])
        middlename = None

    name.value = firstname
    name.offset(column=1).value = middlename
    name.offset(column=2).value = lastname

wb.save("output.xlsx")

pass openpyxl data to pandas

Tags:

python

pandas

excel

openpyxl

mattrweaver

2 Answers

Sam

Charlie Clark

Recent Activity

Donate For Us

pass openpyxl data to pandas

Tags:

python

pandas

excel

openpyxl

mattrweaver

2 Answers

Sam

Charlie Clark

Related questions

Recent Activity

Donate For Us