
In pandas, what's the difference between df['column'] and df.column?

Tags: python, pandas

I'm working my way through Pandas for Data Analysis and learning a ton. However, one thing keeps coming up. The book typically refers to columns of a dataframe as df['column']; however, sometimes, without explanation, the book uses df.column.

I don't understand the difference between the two. Any help would be appreciated.

Below is some code demonstrating what I am talking about:

In [5]:

import pandas as pd

data = {'column1': ['a', 'a', 'a', 'b', 'c'], 
        'column2': [1, 4, 2, 5, 3]}
df = pd.DataFrame(data, columns = ['column1', 'column2'])
df

Out[5]:
  column1  column2
0       a        1
1       a        4
2       a        2
3       b        5
4       c        3
5 rows × 2 columns

df.column:

In [8]:

df.column1
Out[8]:
0    a
1    a
2    a
3    b
4    c
Name: column1, dtype: object

df['column']:

In [9]:

df['column1']
Out[9]:
0    a
1    a
2    a
3    b
4    c
Name: column1, dtype: object
asked May 08 '14 at 15:05 by Anton


1 Answer

To set values in a new column, you need to use df['column'] = series.

Once this is done, however, you can refer to that column in the future with df.column, assuming it's a valid Python name (so df.column works, but df.6column would still have to be accessed with df['6column']).
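A small sketch of where attribute access stops working. The column names "6column" and "count" are made up for illustration. The second one is there because, as far as I know, a column whose name matches an existing DataFrame method is also only reachable with brackets:

```python
import pandas as pd

# Hypothetical frame: one ordinary name, one invalid identifier,
# and one name that collides with a DataFrame method.
df = pd.DataFrame({"column1": ["a", "b"], "6column": [1, 2], "count": [10, 20]})

# Attribute access works when the column name is a valid Python identifier.
same = df.column1.equals(df["column1"])

# "6column" is not a valid identifier (df.6column is a SyntaxError),
# so only bracket access can reach it.
six = df["6column"].tolist()

# "count" collides with the DataFrame.count method, so the attribute
# resolves to the method and the column needs brackets.
count_is_method = callable(df.count)
count_values = df["count"].tolist()
```

Bracket access works for every column name, which is why it is the safer default in scripts.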

I think the subtle difference here is that when you set something with df['column'] = ser, pandas goes ahead and adds it to the columns and does some other bookkeeping (I believe by overriding the functionality in __setitem__). If you do df.column = ser for a name that is not already a column, it's just like adding a new field to any existing object via __setattr__: the value is stored as a plain attribute, and no column is created.
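To make the setting behaviour concrete, here is a minimal sketch. The names column2 and column3 are made up for illustration. The warning is suppressed on the assumption that recent pandas versions emit a UserWarning for this kind of attribute assignment:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"column1": ["a", "b", "c"]})

# Bracket assignment goes through __setitem__ and creates a real column.
df["column2"] = [1, 2, 3]

# Attribute assignment with a new name only sets a plain Python attribute.
# The warning filter is an assumption about recent pandas versions.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.column3 = [4, 5, 6]

columns_after = list(df.columns)        # column3 is not among the columns
has_column3 = "column3" in df.columns

# Attribute assignment to an existing column does update that column.
df.column2 = [7, 8, 9]
updated = df["column2"].tolist()
```

So df.column = values is only safe for columns that already exist, and df['column'] = values works in both cases.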

answered Sep 21 '22 at 03:09 by acushner