I'm confused about the results for indexing columns in pandas.
Both
db['varname']
and
db[['varname']]
give me the column value of 'varname'. However it looks like there is some subtle difference, since the output from db['varname']
shows me the type of the value.
The first looks for a specific Key
in your df, a specific column, the second is a list of columns to sub-select from your df so it returns all columns matching the values in the list.
The other subtle thing is that the first by default will return a Series
object whilst the second returns a DataFrame
even if you pass a list containing a single item
Example:
In [2]:
df = pd.DataFrame(columns=['VarName','Another','me too'])
df
Out[2]:
Empty DataFrame
Columns: [VarName, Another, me too]
Index: []
In [3]:
print(type(df['VarName']))
print(type(df[['VarName']]))
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
so when you pass a list then it tries to match all elements:
In [4]:
df[['VarName','Another']]
Out[4]:
Empty DataFrame
Columns: [VarName, Another]
Index: []
but without the additional []
then this will raise a KeyError
:
df['VarName','Another']
KeyError: ('VarName', 'Another')
Because you're then trying to find a column named: 'VarName','Another'
which doesn't exist
For sklearn, it is better to use db[['varname']]
, which has a 2D shape.
for example:
from sklearn.preprocessing import KBinsDiscretizer kbinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='onehot-dense', strategy='uniform')
est.fit(db[['varname']]) # where use dfb['varname'] causes error
This is close to a dupe of another, and I got this answer from it at: https://stackoverflow.com/a/45201532/1331446, credit to @SethMMorton.
Answering here as this is the top hit on Google and it took me ages to "get" this.
Pandas has no [[
operator at all.
When you see df[['col_name']]
you're really seeing:
col_names = ['col_name']
df[col_names]
In consequence, the only thing that [[
does for you is that it makes the
result a DataFrame, rather than a Series.
[
on a DataFrame looks at the type of the parameter; it ifs a scalar, then you're only after one column, and it hands it back as a Series; if it's a list, then you must be after a set of columns, so it hands back a DataFrame (with only these columns).
That's it!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With