Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I concatenate multiple rows of excel data into one?

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

enter image description here

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.

I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.

I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active

listab = []
rows = ws.max_row
columns = ws.max_column

for i in range (1, rows+1):
   listab.append([])

cellValue = " "    
prevCell = " "

for c in range (1, rows+1):
    for r in range(1, columns+1):
        cellValue = ws.cell(row=r, column=c).value
            if cellValue == prevCell:
                listab[r-1].append(prevCell)
            elif cellValue == "NULL":
                listab[r-1].append(prevCell)
            elif cellValue != prevCell:
                listab[r-1].append(cellValue)
                prevCell = cellValue           

for r in range(1, rows+1):
    for c in range (1, columns+1):
        j = ws2.cell(row = r, column=c)
        j.value = listab[r-1][c-1]

print(listab)

wb2.save("output.xlsx")

There should be one line with the below information:

ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |

like image 566
Mathieu Moquin Avatar asked Dec 07 '25 19:12

Mathieu Moquin


1 Answers

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True)  ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = {}

#Go through unique ids
for key in unique_ids:
    processed[key] = {}

    #Get related records
    matching_records = [y for y in records_list if y['ods_service_ids'] == key]
    #Loop through records
    for record in matching_records:
        #For each key to check, save in dict if non null
        processed[key]['ods_service_ids'] = key
        for detail_key in keys_to_check:
            if record[detail_key] != "NO DATA" :
                processed[key][detail_key] = record[detail_key]
        ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

like image 145
Sebastien D Avatar answered Dec 09 '25 09:12

Sebastien D