 

How to sort an Excel sheet using Python

Tags:

python

xlrd

I am using Python 3.4 and xlrd. I want to sort an Excel sheet by its primary column before processing it. Is there a library that can do this?

Ree asked Dec 14 '14 12:12


People also ask

How do I sort numbers in Excel using Python?

Read the workbook with pd.ExcelFile("test.xlsx"), parse the sheet with xl.parse("Sheet1"), then sort with df.sort_values(by="Header Row"). The old sort method is now called sort_values, and its columns parameter has been replaced by by.

How do you sort multiple columns in Excel using Python?

You can sort a pandas DataFrame by one or more columns, in ascending or descending order, using the sort_values() method. To choose the order, pass the ascending parameter: True for ascending (the default) and False for descending. When sorting by several columns, ascending can also be a list of booleans, one per column.
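A minimal sketch of a two-column sort, assuming a small in-memory DataFrame with hypothetical Region and Sales columns (in practice df would come from reading the sheet). Passing a list to ascending sets the order per column:

```python
import pandas as pd

# Hypothetical sample data; in practice df would come from reading the Excel sheet
df = pd.DataFrame({
    "Region": ["West", "East", "West", "East"],
    "Sales": [200, 150, 300, 100],
})

# Sort by Region ascending, then by Sales descending within each Region
sorted_df = df.sort_values(by=["Region", "Sales"], ascending=[True, False])
print(sorted_df.to_string(index=False))
```

Rows are ordered by Region first; Sales only breaks ties within each Region.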


2 Answers

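There are a couple of ways to do this. The first option is to use xlrd, since you tagged the question with it. Its biggest downside is that xlrd only reads workbooks; writing the sorted result requires xlwt, which produces the older XLS format rather than XLSX. Also note that xlrd 2.0 and later only read .xls files, so opening test.xlsx as below needs an older xlrd release.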

These examples use an Excel document with this format:

[Image: layout of the test Excel sheet]

Using xlrd, with a few modifications from this answer:

import xlwt                      # writes the output (.xls only)
from xlrd import open_workbook   # reads the source workbook

target_column = 0     # This example only has 1 column, and it is 0 indexed

book = open_workbook('test.xlsx')   # reading .xlsx requires xlrd < 2.0
sheet = book.sheets()[0]
data = [sheet.row_values(i) for i in range(sheet.nrows)]   # range, not xrange, on Python 3
labels = data[0]    # Don't sort our headers
data = data[1:]     # Data begins on the second row
data.sort(key=lambda x: x[target_column])

bk = xlwt.Workbook()
out_sheet = bk.add_sheet(sheet.name)

# Write the header row first
for idx, label in enumerate(labels):
    out_sheet.write(0, idx, label)

# Then the sorted rows, shifted down by one to leave room for the header
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        out_sheet.write(idx_r + 1, idx_c, value)

bk.save('result.xls')    # Notice this is xls, not xlsx like the original file is

This outputs the following workbook:

[Image: xlrd/xlwt output (result.xls)]
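The sorting step itself does not depend on xlrd: row_values() just returns one list per row, so the header handling can be sketched in plain Python. The helper name sort_sheet_rows and the sample rows below are hypothetical, chosen to mimic what row_values() returns:

```python
from typing import Any, List, Sequence


def sort_sheet_rows(rows: Sequence[Sequence[Any]], target_column: int,
                    descending: bool = False) -> List[List[Any]]:
    """Hypothetical helper: keep the header (row 0) in place and sort the
    remaining rows by the value in target_column."""
    header = list(rows[0])                   # don't sort the header row
    body = [list(row) for row in rows[1:]]   # data begins on the second row
    body.sort(key=lambda row: row[target_column], reverse=descending)
    return [header] + body


# Sample rows shaped like sheet.row_values(i) output (numbers come back as floats)
rows = [["Name", "Score"], ["b", 2.0], ["a", 3.0], ["c", 1.0]]
result = sort_sheet_rows(rows, target_column=1)
print(result)  # [['Name', 'Score'], ['c', 1.0], ['b', 2.0], ['a', 3.0]]
```

Because Python's sort is stable, rows with equal values in target_column keep their original relative order. In Python 3 the sort raises TypeError if the target column mixes text and numbers, so the key may need adjusting for such sheets.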


Another option, which can write XLSX output, is to use pandas. The code is also shorter:

import pandas as pd

xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
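df = df.sort_values(by="Header Row")   # older pandas used df.sort(columns="Header Row")

# The with-block saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', columns=["Header Row"], index=False)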

This outputs:

[Image: pandas output (output.xlsx)]

In the to_excel call, index=False keeps the pandas DataFrame index out of the Excel document. The other keywords are self-explanatory: sheet_name names the output sheet, and columns selects which columns to write.
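To see why index=False matters, here is a small sketch with made-up values: sort_values keeps each row's original index label, so after sorting the index is out of order and would be written as an extra, scrambled column. The ignore_index=True option (available in pandas 1.0 and later) renumbers the rows instead:

```python
import pandas as pd

# Made-up values in a column named like the answer's "Header Row"
df = pd.DataFrame({"Header Row": ["pear", "apple", "fig"]})

sorted_df = df.sort_values(by="Header Row")
print(sorted_df.index.tolist())   # original labels travel with their rows: [1, 2, 0]

# Renumber the sorted rows 0..n-1 instead of keeping the old labels
renumbered = df.sort_values(by="Header Row", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```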

Andy answered Oct 10 '22 14:10


I just wanted to refresh the answer, as the pandas API has changed a bit over time. Here's code that should work now (pandas 1.1.2):

import pandas as pd

xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
df = df.sort_values(by="Header Row")
...

The sort method is now called sort_values, and its columns parameter has been replaced by by.
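Since this answer targets pandas 1.1.x, one related addition may be worth knowing: sort_values gained a key argument in pandas 1.1.0, which transforms the column before comparing. A sketch with made-up text values, using it for a case-insensitive sort:

```python
import pandas as pd

df = pd.DataFrame({"Header Row": ["banana", "Cherry", "apple"]})

# The default sort compares raw strings, so uppercase letters sort before lowercase
default_order = df.sort_values(by="Header Row")["Header Row"].tolist()
print(default_order)  # ['Cherry', 'apple', 'banana']

# key receives the column as a Series and must return a Series of the same length
case_insensitive = df.sort_values(by="Header Row", key=lambda s: s.str.lower())
print(case_insensitive["Header Row"].tolist())  # ['apple', 'banana', 'Cherry']
```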

akshayranganath answered Oct 10 '22 15:10