Key error when selecting columns in pandas dataframe after read_csv

Tags: python, pandas, csv

I'm trying to read a CSV file into a pandas dataframe and select a column, but I keep getting a key error.

The file reads in successfully and I can view the dataframe in an IPython notebook, but when I try to select any column other than the first one, it throws a key error.

I am using this code:

import pandas as pd

transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']

This is the file I'm working on: https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0

Thank you!

Harry M asked Mar 06 '16 19:03


People also ask

How do I fix pandas key error?

Fix the error by correcting the spelling of the key. If you are not sure about the spelling, print the list of all column names and cross-check.

How can pandas avoid key errors?

A KeyError can be avoided by using the get() method to look up the key. If the key is missing, None is returned instead of an exception being raised. A default value to return when the key is missing can also be specified.
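As a rough sketch of the get() approach on a dataframe: the frame below is made up for illustration (it is not the asker's file) and assumes one header carries a stray leading space. Note that get() only hides the mismatch; it does not repair the column name.

```python
import pandas as pd

# Hypothetical data: the second header has an accidental leading space.
df = pd.DataFrame({"product_id": [1, 2], " quarter": ["Q1", "Q2"]})

# No column is named exactly "quarter", so get() returns None
# instead of raising KeyError.
missing = df.get("quarter")

# The exact (space-prefixed) name does exist and returns a Series.
present = df.get(" quarter")

# A fallback can be supplied for the missing case.
fallback = df.get("quarter", default=pd.Series(dtype="object"))

print(missing is None)
print(present.tolist())
```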

Why is pandas not recognizing column name?

Typically this error occurs when you misspell a column name or include an accidental space before or after the column name.
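One way to check for such stray spaces, and to remove them after loading, might look like this. The dataframe is a made-up stand-in, not the asker's data.

```python
import pandas as pd

# Hypothetical data: the second header has an accidental leading space.
df = pd.DataFrame({"product_id": [1, 2], " quarter": ["Q1", "Q2"]})

# Printing the names as a list shows the quotes, which makes
# leading or trailing spaces visible.
print(df.columns.tolist())

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()

# The lookup now succeeds.
quarter = df["quarter"]
print(quarter.tolist())
```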


2 Answers

Use sep=r'\s*,\s*' so that any spaces around the commas are treated as part of the delimiter and do not end up in the column names:

transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
                           header=0, encoding='ascii', engine='python')

Alternatively, make sure that there are no unquoted spaces in your CSV file and use your original command unchanged.

To verify:

print(transactions.columns.tolist())

Output:

['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
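To see the effect without the original file, here is a small sketch using an in-memory CSV. The sample rows are invented, and the assumption is that the real file has spaces after its commas, as this answer suggests. The last call uses read_csv's skipinitialspace option as another possibility; it only handles spaces that follow the delimiter, not spaces before it.

```python
import io
import pandas as pd

# Hypothetical CSV text with a space after each comma.
raw = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# Default parsing keeps the leading space in the header names.
plain = pd.read_csv(io.StringIO(raw))
print(plain.columns.tolist())

# A regex separator absorbs the whitespace around each comma.
# Regex separators require the python engine.
cleaned = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")
print(cleaned.columns.tolist())

# Alternative: skip spaces that directly follow the delimiter.
skipped = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(skipped.columns.tolist())
```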
MaxU - stop WAR against UA answered Oct 09 '22 17:10


If you need to select multiple columns from a dataframe, use two pairs of square brackets, e.g.:

df[["product_id","customer_id","store_id"]]
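A short sketch of the difference, using a made-up dataframe: the outer brackets are the indexing operator and the inner brackets form a list of column names. Passing several names in a single pair of brackets is read as one tuple key, which raises a KeyError on a frame with ordinary (non-MultiIndex) columns.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "product_id": [1, 2],
    "customer_id": [10, 20],
    "store_id": [100, 200],
    "quarter": ["Q1", "Q2"],
})

# A list of names returns a DataFrame with just those columns.
subset = df[["product_id", "customer_id", "store_id"]]

# A single name returns a Series.
single = df["quarter"]

# One pair of brackets with several names is a tuple lookup and fails.
try:
    df["product_id", "customer_id"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(type(subset).__name__, type(single).__name__, tuple_lookup_failed)
```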
Aswin Babu answered Oct 09 '22 17:10