
Reading Excel file without hidden columns in Python using Pandas or other modules

Can anyone tell me how to read an Excel file without its hidden columns in Python, using Pandas or any other module?

When I read the Excel file using Pandas, for example:

file_np = pd.read_excel(f_name)

the dataframe file_np always contains all the columns. From this dataframe, I cannot tell which columns were hidden in the Excel file. Thank you!

tairen asked Mar 15 '18 06:03




1 Answer

I don't think pandas does it out of the box.


Unfortunately, you will have to read the file twice. openpyxl can tell you which columns are hidden:

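import openpyxl
import pandas as pd

loc = 'sample.xlsx'
wb = openpyxl.load_workbook(loc)
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated in openpyxl

# Collect the letters of the columns marked as hidden
hidden_cols = []
for colLetter, colDimension in ws.column_dimensions.items():
    if colDimension.hidden:
        hidden_cols.append(colLetter)

df = pd.read_excel(loc)
# Note: this compares header names with column letters, so it only works
# when the headers are literally 'A', 'B', 'C', ... as in this sample.
# A set difference also does not preserve the column order.
unhidden = list(set(df.columns) - set(hidden_cols))
df = df[unhidden]
print(df)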

Output

    C   A
0   1   7
1   9   7
2   5  10
3   7   7
4   4   8
5   4   6
6   9   9
7  10   3
8   1   2

Explanation

First, read the file using openpyxl:

loc = 'C:/Users/FGB3140/Desktop/sample.xlsx'
wb = openpyxl.load_workbook(loc)
ws = wb['Sheet1']

Check the hidden property of each column dimension (this is where the hidden columns are captured):

hidden_cols = []
for colLetter, colDimension in ws.column_dimensions.items():
    if colDimension.hidden:
        hidden_cols.append(colLetter)
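
One caveat with this loop: when a run of adjacent columns is hidden, the file can store the run as a single entry, and openpyxl then exposes only the first letter of the run as the key, with the extent held in the dimension's min and max attributes. The following is a minimal sketch that expands such runs. The helper names hidden_column_letters and make_sample_workbook, and the sample data, are hypothetical and not part of the original answer.

```python
from io import BytesIO

import openpyxl
from openpyxl.utils import column_index_from_string, get_column_letter


def hidden_column_letters(ws):
    """Return the letters of all hidden columns in worksheet `ws`, in sheet order."""
    hidden = set()
    for letter, dim in ws.column_dimensions.items():
        if not dim.hidden:
            continue
        # A dimension loaded from a file may cover a run of columns (min..max);
        # fall back to the key itself when min/max are not set.
        start = dim.min or column_index_from_string(letter)
        end = dim.max or start
        hidden.update(range(start, end + 1))
    return [get_column_letter(i) for i in sorted(hidden)]


def make_sample_workbook():
    """Build a small in-memory workbook with columns B and C hidden (made-up data)."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["A", "B", "C", "D"])
    ws.append([7, 4, 1, 2])
    ws.column_dimensions["B"].hidden = True
    ws.column_dimensions["C"].hidden = True
    # Round-trip through a buffer so the sheet is read back as it would be from disk
    buf = BytesIO()
    wb.save(buf)
    buf.seek(0)
    return openpyxl.load_workbook(buf)


if __name__ == "__main__":
    ws = make_sample_workbook().active
    print(hidden_column_letters(ws))
```

For a real file, the same function would be called on wb['Sheet1'] after openpyxl.load_workbook(loc).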

Read the same file using pandas:

df = pd.read_excel(loc)

Find the unhidden columns by subtracting the hidden ones from all the columns (this assumes the header names match the column letters, as in the sample):

unhidden = list( set(df.columns) - set(hidden_cols) )

Finally, keep only the unhidden columns:

df = df[unhidden]
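
The set subtraction compares DataFrame header names with Excel column letters, so it only behaves as intended when the headers happen to be 'A', 'B', 'C', and it may also reorder the columns. A possible alternative, sketched below, filters by position instead. It assumes that DataFrame column i corresponds to Excel column i + 1 (that is, the sheet was read in full starting at column A); the function name drop_hidden_by_position and the sample frame are hypothetical.

```python
import pandas as pd
from openpyxl.utils import column_index_from_string


def drop_hidden_by_position(df, hidden_letters):
    """Return `df` without the columns whose Excel letters are in `hidden_letters`.

    Assumes DataFrame column i corresponds to Excel column i + 1.
    """
    # Convert letters such as 'B' to zero-based positions such as 1
    hidden_pos = {column_index_from_string(letter) - 1 for letter in hidden_letters}
    # Walk the columns in their original order so the order is preserved
    keep = [i for i in range(df.shape[1]) if i not in hidden_pos]
    return df.iloc[:, keep]


if __name__ == "__main__":
    # Made-up frame standing in for pd.read_excel(loc); the header names
    # deliberately differ from the column letters.
    df = pd.DataFrame({"price": [7, 7], "secret": [4, 2], "qty": [1, 9]})
    print(drop_hidden_by_position(df, ["B"]))
```

Selecting with iloc keeps the visible columns in their original left-to-right order, which a set difference does not guarantee.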

P.S.

I know I could have checked colDimension.hidden == False, or simply used if not colDimension.hidden. The goal here is to capture the hidden columns and then filter accordingly. Hope this helps!

Vivek Kalyanarangan answered Oct 01 '22 02:10
