
Read a specific column from an Excel file into a DataFrame

I want to read a specific column from an Excel file into a DataFrame, but I want to specify the column by its header name.

For example, I have an Excel file with two columns in Sheet 2: "number" in column A and "ForeignKey" in column B. I want to import "ForeignKey" into a DataFrame. I did this with the following script:

import pandas as pd
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=[0,1])

It shows the following in my xl_file:

       number ForeignKey
0       1        abc
1       2        def
2       3        ghi

With only a few columns, I can get "ForeignKey" by specifying usecols=[1]. However, when there are many columns and I know the pattern of their header names, it would be easier to specify the column by name. I tried the following code, but it returns an empty DataFrame:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=['ForeignKey']) 

According to the discussion in the following link, the code above works, but only for read_csv:

How to drop a specific column of csv file while reading it using pandas?

Is there a way to do this when reading an Excel file?

Thank you in advance.

asked Jan 09 '19 at 09:01 by Fadri




1 Answer

There is a solution, but read_excel does not treat usecols the same way read_csv does.

From the documentation, for read_csv:

usecols : list-like or callable, default None

For example, a valid list-like usecols parameter would be [0, 1, 2] or [‘foo’, ‘bar’, ‘baz’].
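To illustrate the read_csv behaviour quoted above, here is a minimal sketch that reproduces the question's data as an in-memory CSV string (the CSV text is a stand-in for a real file, not part of the question):

```python
import io

import pandas as pd

# Same data as the question's sheet, held in memory as CSV text.
CSV_TEXT = "number,ForeignKey\n1,abc\n2,def\n3,ghi\n"

# read_csv accepts header names in usecols, so only "ForeignKey" is parsed.
foreign_keys = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["ForeignKey"])
print(foreign_keys)
```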

For read_excel:

usecols : int or list, default None

  • If None then parse all columns,
  • If int then indicates last column to be parsed
  • If list of ints then indicates list of column numbers to be parsed
  • If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides
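Because the string form expects Excel letters, a header name has to be translated into its column letter first. A minimal sketch, assuming the table starts in column A so that header position i maps to the i-th Excel column; `col_letter` and `usecols_from_names` are hypothetical helpers, not part of pandas:

```python
def col_letter(index: int) -> str:
    """Convert a 0-based column position to Excel letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column position must be non-negative")
    letters = ""
    n = index + 1  # Excel column numbers are 1-based
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def usecols_from_names(header, wanted):
    """Build a usecols string such as 'B' or 'A,B' from header names.

    Assumes the table starts in column A. Raises ValueError if a wanted
    name is not present in the header.
    """
    header = list(header)
    return ",".join(col_letter(header.index(name)) for name in wanted)


print(usecols_from_names(["number", "ForeignKey"], ["ForeignKey"]))
```

With the question's file, the header list could come from pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', nrows=1).columns (assuming a pandas version whose read_excel accepts nrows), and the returned string is then passed as usecols.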

So, with the string form, you pass the Excel column letter rather than the header name:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B')

and if you also need 'number' (column A):

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='A,B')

EDIT: the string has to contain the Excel column letters, not the header names, as the other answer points out. You don't need 'B:B'; 'B' alone does the trick. However, this is no real improvement over usecols with numbers, because you still have to know where each column sits.
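In later pandas releases this limitation is gone: since pandas 0.24, read_excel's usecols also accepts a list of header names, or a callable that is called with each header name and keeps the column when it returns True. That covers the "known name pattern" case from the question, and the question's own usecols=['ForeignKey'] call should work there. A sketch assuming pandas 0.24 or newer plus an installed Excel engine such as openpyxl; `is_foreign_key`, `read_by_names` and `read_by_pattern` are hypothetical names chosen for illustration:

```python
import pandas as pd

# Path and sheet name taken from the question.
PATH = "D:/SnapPython/TestDF.xlsx"
SHEET = "Sheet 2"


def is_foreign_key(name) -> bool:
    """Example pattern: keep every column whose header starts with 'Foreign'."""
    return str(name).startswith("Foreign")


def read_by_names(path: str = PATH, sheet: str = SHEET) -> pd.DataFrame:
    # pandas >= 0.24: a list of header names selects exactly those columns.
    return pd.read_excel(path, sheet_name=sheet, usecols=["ForeignKey"])


def read_by_pattern(path: str = PATH, sheet: str = SHEET) -> pd.DataFrame:
    # pandas >= 0.24: the callable receives each header name; True keeps the column.
    return pd.read_excel(path, sheet_name=sheet, usecols=is_foreign_key)
```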

If loading all the data is fast enough, the simplest approach may be to parse all columns and then select the ones you need:

# double brackets return a DataFrame; ['ForeignKey'] alone would return a Series
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')[['ForeignKey']]
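When the headers follow a known pattern, as in the question, the selection after loading can also be done by pattern with DataFrame.filter(regex=...), which keeps the columns whose label matches the regular expression. A minimal sketch; `select_by_pattern` is a hypothetical helper, and the DataFrame literal stands in for the sheet that pd.read_excel would return:

```python
import pandas as pd


def select_by_pattern(df: pd.DataFrame, pattern: str) -> pd.DataFrame:
    """Keep only the columns whose header matches the regular expression."""
    # DataFrame.filter applies re.search to each column label.
    return df.filter(regex=pattern)


# Stand-in for the parsed sheet; in practice this would come from pd.read_excel(...).
sheet = pd.DataFrame({"number": [1, 2, 3], "ForeignKey": ["abc", "def", "ghi"]})
foreign = select_by_pattern(sheet, r"^Foreign")
print(foreign)
```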
answered Oct 01 '22 at 16:10 by Frayal