
What's the difference between [] and [[]] in pandas?

Tags:

python

pandas

I'm confused about the results for indexing columns in pandas.

Both

db['varname']

and

db[['varname']]

give me the values of the column 'varname'. However, there seems to be a subtle difference, since the output from db['varname'] also shows me the type of the values.

elong asked Nov 19 '15 19:11



3 Answers

The first looks up a single key in your df, i.e. one specific column. The second passes a list of columns to sub-select from your df, so it returns all columns whose names appear in the list.

The other subtle difference is that the first returns a Series by default, whereas the second returns a DataFrame, even if the list contains only a single item.

Example:

In [2]:
import pandas as pd

df = pd.DataFrame(columns=['VarName','Another','me too'])
df

Out[2]:
Empty DataFrame
Columns: [VarName, Another, me too]
Index: []

In [3]:    
print(type(df['VarName']))
print(type(df[['VarName']]))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

So when you pass a list, pandas tries to match every element of it against the column names:

In [4]:
df[['VarName','Another']]

Out[4]:
Empty DataFrame
Columns: [VarName, Another]
Index: []

But without the additional [], this raises a KeyError:

df['VarName','Another']

KeyError: ('VarName', 'Another')

That is because you're then looking for a single column named ('VarName', 'Another'), which doesn't exist.
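To make the KeyError case concrete, here is a minimal sketch using the same empty frame as above. It relies on the fact that Python packs comma-separated values inside [] into one tuple, so pandas receives a single key. The helper name raises_key_error is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['VarName', 'Another', 'me too'])

def raises_key_error(frame, key):
    """Hypothetical helper: True if frame[key] raises KeyError."""
    try:
        frame[key]
    except KeyError:
        return True
    return False

# df['VarName', 'Another'] is the same as df[('VarName', 'Another')]:
# one tuple key, which is not a column label in this frame.
tuple_key_fails = raises_key_error(df, ('VarName', 'Another'))

# Wrapping the names in a list asks for two separate columns instead.
list_key_fails = raises_key_error(df, ['VarName', 'Another'])

print(tuple_key_fails)  # True
print(list_key_fails)   # False
```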

EdChum answered Oct 24 '22 09:10



For sklearn, it is better to use db[['varname']], which has a 2D shape.

For example:

from sklearn.preprocessing import KBinsDiscretizer

# three equal-width bins, one-hot encoded as a dense array
est = KBinsDiscretizer(n_bins=3, encode='onehot-dense', strategy='uniform')
est.fit(db[['varname']])  # using db['varname'] here causes an error, because it is 1D
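The shape difference behind this can be checked with pandas alone. The sketch below uses a small made-up db frame (the values are an assumption, only the column name comes from the question) and shows that a single column selected with [] is 1D, while the same column selected with [[]] is 2D, which is the (n_samples, n_features) layout scikit-learn estimators generally expect.

```python
import pandas as pd

# Hypothetical data; only the column name 'varname' comes from the question.
db = pd.DataFrame({'varname': [0.5, 1.5, 2.5]})

one_d = db['varname']    # Series
two_d = db[['varname']]  # DataFrame with a single column

print(one_d.shape)  # (3,)
print(two_d.shape)  # (3, 1)

# An existing Series can also be turned into a one-column DataFrame.
also_two_d = one_d.to_frame()
print(also_two_d.shape)  # (3, 1)
```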
Alice Smith answered Oct 24 '22 07:10



This is close to a duplicate of another question, and I got this answer from it: https://stackoverflow.com/a/45201532/1331446, credit to @SethMMorton.

Answering here as this is the top hit on Google and it took me ages to "get" this.

Pandas has no [[ operator at all.

When you see df[['col_name']] you're really seeing:

col_names = ['col_name']
df[col_names]

In consequence, the only thing that [[ does for you is that it makes the result a DataFrame, rather than a Series.

[ on a DataFrame looks at the type of the parameter: if it's a scalar, then you're only after one column, and it hands it back as a Series; if it's a list, then you must be after a set of columns, so it hands back a DataFrame (with only those columns).
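A short sketch of that dispatch, assuming a small example frame (the data and the 'other' column are made up): the same [ is applied once with a scalar key and once with a list key, and the inline [[...]] form gives the same result as passing a named list.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'col_name': [1, 2, 3], 'other': [4, 5, 6]})

col_names = ['col_name']
as_frame = df[col_names]    # list key   -> DataFrame
as_series = df['col_name']  # scalar key -> Series

# df[['col_name']] is just [ applied to a list literal.
same = df[['col_name']].equals(as_frame)

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(same)                      # True
```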

That's it!

dsz answered Oct 24 '22 07:10
