I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.
So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = {}
#Go through unique ids
for key in unique_ids:
processed[key] = {}
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
with no input data it was hard to see any flows. Corrected the code above with fake data
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With