I'm trying to read in a CSV file into a pandas dataframe and select a column, but keep getting a key error.
The file reads in successfully and I can view the dataframe in an iPython notebook, but when I want to select a column any other than the first one, it throws a key error.
I am using this code:
import pandas as pd
transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']
This is the file I'm working on: https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0
Thank you!
How to Fix the KeyError? We can simply fix the error by correcting the spelling of the key. If we are not sure about the spelling we can simply print the list of all column names and crosscheck.
We can avoid KeyError by using get() function to access the key value. If the key is missing, None is returned. We can also specify a default value to return when the key is missing.
Typically this error occurs when you simply misspell a column names or include an accidental space before or after the column name.
use sep='\s*,\s*'
so that you will take care of spaces in column-names:
transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
header=0, encoding='ascii', engine='python')
alternatively you can make sure that you don't have unquoted spaces in your CSV file and use your command (unchanged)
prove:
print(transactions.columns.tolist())
Output:
['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
if you need to select multiple columns from dataframe use 2 pairs of square brackets eg.
df[["product_id","customer_id","store_id"]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With