Skip specific set of columns when reading excel frame - pandas

I know in advance which columns I don't need from an Excel file, and I'd like to skip them when reading the file to improve performance. Something like this:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])

I found nothing related to this in the documentation. Is there a workaround?

Juan David asked Apr 05 '18 16:04



2 Answers

If your version of pandas allows it (check first whether usecols accepts a function; read_excel supports a callable there from pandas 0.24 onward), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

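This should skip all columns without header names, since pandas labels those 'Unnamed: 0', 'Unnamed: 1', and so on. To skip specific columns instead, change the test so it checks each name against a list of the column names you do not want, for example lambda x: x not in ['col_a', 'col_b'].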
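A minimal sketch of that list-based variant follows. The helper names make_skip_filter and read_excel_without are hypothetical, invented here for illustration, and the sketch assumes a pandas version whose read_excel accepts a callable for usecols and a header row that contains string column names.

```python
def make_skip_filter(cols_to_skip):
    """Return a predicate that is True for every column name NOT in cols_to_skip."""
    skip = set(cols_to_skip)  # set membership is cheap even for long skip lists

    def keep(name):
        return name not in skip

    return keep


def read_excel_without(path, cols_to_skip):
    """Hypothetical wrapper: read an Excel file, leaving out the named columns."""
    import pandas as pd  # imported here so the filter above is usable on its own

    # pandas calls the predicate once per header name and keeps the column
    # whenever the predicate returns True.
    return pd.read_excel(path, usecols=make_skip_filter(cols_to_skip))
```

Usage would look like df = read_excel_without('large_excel_file.xlsx', ['col_a', 'col_b']). The same predicate could also be combined with the 'Unnamed' check above if both kinds of columns should be dropped.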

MarMat answered Sep 20 '22 18:09


You can use the following technique. Suppose the columns we want to skip are at (0-based) positions 2, 5 and 8 of a sheet with 10 columns. First build the list of the remaining columns, the ones we do want to keep, as cols:

In [7]: cols2skip = [2,5,8]  
In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

and then pass those remaining columns (the ones we do want to keep) to usecols:

df = pd.read_excel(filename, usecols=cols)
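
The range(10) above hard-codes the total number of columns. One possible way to avoid that, sketched below under the assumption that reading only the header row with nrows=0 is acceptable for your file, is to count the columns first. The helper names indices_to_keep and read_excel_skipping are hypothetical and not part of pandas.

```python
def indices_to_keep(n_columns, cols_to_skip):
    """Return the 0-based column positions that are not in cols_to_skip."""
    skip = set(cols_to_skip)
    return [i for i in range(n_columns) if i not in skip]


def read_excel_skipping(path, cols_to_skip):
    """Hypothetical wrapper: read an Excel file, skipping columns by position."""
    import pandas as pd  # imported here so indices_to_keep is usable on its own

    # First pass: nrows=0 parses only the header, which is enough to count columns.
    n_columns = len(pd.read_excel(path, nrows=0).columns)
    # Second pass: read the data, keeping only the wanted positions.
    return pd.read_excel(path, usecols=indices_to_keep(n_columns, cols_to_skip))
```

Note that this opens the file twice, so whether it helps performance on a large workbook is something to measure rather than assume.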

MaxU - stop WAR against UA answered Sep 17 '22 18:09