Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the point of views in pandas if it is undefined whether an indexing operation returns a view or a copy?

I have switched from R to pandas. I routinely get SettingWithCopyWarnings, when I do something like

df_a = pd.DataFrame({'col1': [1,2,3,4]})    

# Filtering step, which may or may not return a view
df_b = df_a[df_a['col1'] > 1]

# Add a new column to df_b
df_b['new_col'] = 2 * df_b['col1']

# SettingWithCopyWarning!!

I think I understand the problem, though I'll gladly learn what I got wrong. In the given example, it is undefined whether df_b is a view on df_a or not. Thus, the effect of assigning to df_b is unclear: does it affect df_a? The problem can be solved by explicitly making a copy when filtering:

df_a = pd.DataFrame({'col1': [1,2,3,4]})    

# Filtering step, definitely a copy now
df_b = df_a[df_a['col1'] > 1].copy()

# Add a new column to df_b
df_b['new_col'] = 2 * df_b['col1']

# No Warning now

I think there is something that I am missing: if we can never really be sure whether we create a view or not, what are views good for? From the pandas documentation (http://pandas-docs.github.io/pandas-docs-travis/indexing.html?highlight=view#indexing-view-versus-copy)

Outside of simple cases, it’s very hard to predict whether it [getitem] will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees)

Similar warnings can be found for different indexing methods.

I find it very cumbersome and errorprone to sprinkle .copy() calls throughout my code. Am I using the wrong style for manipulating my DataFrames? Or is the performance gain so high that it justifies the apparent awkwardness?

like image 409
sjk Avatar asked Jan 19 '16 18:01

sjk


People also ask

What is view and copy in DataFrame?

Some operations in pandas (and numpy as well) will return views of the original data, while other copies. To put it very simply, a view is a subset of the original object ( DataFrame or Series ) linked to the original source, while a copy is an entirely new object .

What is data indexing in pandas?

Index is like an address, that's how any data point across the dataframe or series can be accessed. Rows and columns both have indexes, rows indices are called as index and for columns its general column names.

How do you ignore a value is trying to be set on a copy of a slice from a DataFrame?

A value is trying to be set on a copy of a slice from a DataFrame. One approach that can be used to suppress SettingWithCopyWarning is to perform the chained operations into just a single loc operation. This will ensure that the assignment happens on the original DataFrame instead of a copy.

How do you stop SettingWithCopyWarning in pandas?

Generally, to avoid a SettingWithCopyWarning in Pandas, you should do the following: Avoid chained assignments that combine two or more indexing operations like df["z"][mask] = 0 and df. loc[mask]["z"] = 0 . Apply single assignments with just one indexing operation like df.


1 Answers

Great question!

The short answer is: this is a flaw in pandas that's being remedied.

You can find a longer discussion of the nature of the problem here, but the main take-away is that we're now moving to a "copy-on-write" behavior in which any time you slice, you get a new copy, and you never have to think about views. The fix will soon come through this refactoring project. I actually tried to fix it directly (see here), but it just wasn't feasible in the current architecture.

In truth, we'll keep views in the background -- they make pandas SUPER memory efficient and fast when they can be provided -- but we'll end up hiding them from users so, from the user perspective, if you slice, index, or cut a DataFrame, what you get back will effectively be a new copy.

(This is accomplished by creating views when the user is only reading data, but whenever an assignment operation is used, the view will be converted to a copy before the assignment takes place.)

Best guess is the fix will be in within a year -- in the mean time, I'm afraid some .copy() may be necessary, sorry!

like image 65
nick_eu Avatar answered Oct 10 '22 15:10

nick_eu