I want to read a certain column from an Excel file into a dataframe, but I want to specify the column by its header name.
For example, I have an Excel file with two columns in Sheet 2: "number" in column A and "ForeignKey" in column B. I want to import only "ForeignKey" into a dataframe. I read the sheet with the following script:
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=[0,1])
It shows the following in my xl_file:
   number ForeignKey
0       1        abc
1       2        def
2       3        ghi
With a small number of columns, I can get "ForeignKey" by specifying usecols=[1]. However, if I have many columns and know the column name pattern, it would be easier to specify the column name. I tried the following code, but it gives an empty dataframe.
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=['ForeignKey'])
According to the discussion in the following link, the code above works well, but only for read_csv:
How to drop a specific column of csv file while reading it using pandas?
Is there a way to do this when reading an Excel file?
Thank you in advance.
You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.
Method 2: Reading an Excel file in Python using openpyxl. The load_workbook() function opens the Books.xlsx file for reading. This file is passed as an argument to this function. The object of the dataframe.
We can use the pandas module read_excel() function to read the excel file data into a DataFrame object.
Use the pandas.read_excel() function to read an Excel sheet into a pandas DataFrame. By default it loads the first sheet of the Excel file and parses the first row as the DataFrame column names.
There is a solution, but read_csv and read_excel do not treat usecols the same way.
From the documentation, for CSV:
usecols : list-like or callable, default None
For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
For Excel:
usecols : int or list, default None
- If None then parse all columns,
- If int then indicates last column to be parsed
- If list of ints then indicates list of column numbers to be parsed
- If string then indicates comma separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of both sides
So, since a string is read as Excel column letters, you need to call it with the letter of the column that holds "ForeignKey":
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B')
and if you also need 'number':
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='A,B')
EDIT: you need to pass the Excel column letter, not the header name from the data. The other answer solves this; however, you don't need 'B:B', since 'B' alone will do the trick. That said, this is no real improvement over passing column numbers to usecols.
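For what it's worth, newer pandas releases (0.24 and later, as far as I can tell) also let read_excel take a list of header names or a callable for usecols, which covers the "column name pattern" case from the question. The sketch below is only an illustration: it writes a small throwaway workbook instead of using the real TestDF.xlsx, the temporary file name is made up, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Throwaway workbook mirroring the question's "Sheet 2" (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "TestDF_sample.xlsx")
sample = pd.DataFrame({"number": [1, 2, 3], "ForeignKey": ["abc", "def", "ghi"]})
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    sample.to_excel(writer, sheet_name="Sheet 2", index=False)

# Select by exact header name: a list of strings (pandas >= 0.24).
by_name = pd.read_excel(path, sheet_name="Sheet 2", usecols=["ForeignKey"])

# Select by header pattern: the callable is evaluated against each header
# and the column is kept when it returns True.
by_pattern = pd.read_excel(
    path, sheet_name="Sheet 2", usecols=lambda name: name.endswith("Key")
)

print(by_name)
```

On older pandas versions a list of strings is not understood this way, which would explain the empty dataframe in the question.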
If you can load all the data in no time, maybe the best way to solve this is to parse all columns and then select the desired column:
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')['ForeignKey']
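Note that indexing with a single column name gives back a Series rather than a dataframe. A minimal sketch of the alternatives, using an in-memory frame as a stand-in for the result of read_excel (the regex pattern is just an assumed example of a "column name pattern"):

```python
import pandas as pd

# Stand-in for the result of pd.read_excel(...) called without usecols.
xl_file = pd.DataFrame({"number": [1, 2, 3], "ForeignKey": ["abc", "def", "ghi"]})

# Single brackets return a Series.
as_series = xl_file["ForeignKey"]

# Double brackets return a one-column DataFrame.
as_frame = xl_file[["ForeignKey"]]

# filter() keeps every column whose name matches the regex,
# here any header ending in "Key".
by_pattern = xl_file.filter(regex="Key$")

print(type(as_series).__name__, type(as_frame).__name__, list(by_pattern.columns))
```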