 

Checking whether data frame is copy or view in Pandas

Is there an easy way to check whether two data frames are independent copies or views of the same underlying data, without modifying either of them? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.

For example, I thought id(df.values) would be stable across views, but it doesn't seem to be:

```python
import pandas as pd

# Make two data frames that are views of the same data.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=['row1', 'row2'],
                  columns=['a', 'b', 'c', 'd'])
df2 = df.iloc[0:2, :]

# Demonstrate they are views:
df.iloc[0, 0] = 99
df2.iloc[0, 0]
# Out[70]: 99

# Now try to compare the id of the values attribute.
# Different despite being views!
id(df.values)
# Out[71]: 4753564496
id(df2.values)
# Out[72]: 4753603728

# And we can of course compare df and df2:
df is df2
# Out[73]: False
```

Other answers I've looked up try to give rules, but they don't seem consistent, and they also don't answer the question of how to test:

  • What rules does Pandas use to generate a view vs a copy?

  • Pandas: Subindexing dataframes: Copies vs views

  • Understanding pandas dataframe indexing

  • Re-assignment in Pandas: Copy or view?

And of course: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

UPDATE: The comments below seem to answer the question: comparing the df.values.base attribute, rather than the df.values attribute, works, as does checking the df._is_copy attribute (though the latter is probably very bad form, since it's internal).

nick_eu asked Nov 12 '14 04:11


People also ask

Is DataFrame a view?

To put it very simply, a view is a subset of the original object (DataFrame or Series) that is linked to the original source, while a copy is an entirely new object.
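A minimal sketch of that distinction, assuming NumPy's np.shares_memory as the test for "linked to the original source" (the frame and its column names are made up for illustration): a column selected from a single-dtype frame shares its buffer with the frame, while .copy() allocates a new one.

```python
import numpy as np
import pandas as pd

# Hypothetical single-dtype frame (built from a list of lists, so one 2-D block).
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=["a", "b"])

selected = df["a"]        # column selection: backed by df's own buffer
copied = df["a"].copy()   # deep copy: gets a new, independent buffer

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(selected.to_numpy(), df.to_numpy()))  # True
print(np.shares_memory(copied.to_numpy(), df.to_numpy()))    # False
```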

How do I know if my DataFrame is identical?

DataFrame.equals(): the equals() method tests whether two objects contain the same elements. It compares two Series or DataFrames to check that they have the same shape and the same elements, and corresponding columns must also have the same dtype. NaNs in the same location are considered equal.
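A small sketch (frame contents are illustrative) of what equals() does and does not check. Because it compares contents only, a deep copy compares equal to its source, so equals() cannot tell a copy from a view; a dtype change, on the other hand, makes it return False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan], "y": [3, 4]})
dup = df.copy()  # independent deep copy with identical contents

print(df.equals(dup))                          # True: NaNs in the same place count as equal
print((df == dup).all().all())                 # False: element-wise ==, and NaN != NaN
print(df.equals(df.astype({"y": "float64"})))  # False: column y's dtype differs
```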

How do I know my data frame type?

To check the data types in a pandas DataFrame, use the "dtypes" attribute. It returns a Series holding the data type of each column: the column names of the DataFrame form the index of that Series, and the corresponding data types are its values.
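A short illustration with a made-up three-column frame; note that a single column (a Series) uses the singular .dtype instead.

```python
import pandas as pd

# Illustrative frame with an integer, a float and a boolean column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": [True, False]})

print(df.dtypes)       # Series: a -> int64, b -> float64, c -> bool
print(df.dtypes["b"])  # float64 (look up one column's dtype by name)
print(df["a"].dtype)   # int64 (a single column uses .dtype, singular)
```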

What does DataFrame copy () do?

DataFrame.copy(): the copy() method returns a copy of the DataFrame. By default it makes a "deep copy", meaning that changes made to the original DataFrame will NOT be reflected in the copy (and vice versa).
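A minimal sketch of the default deep copy (the frame is illustrative): after copying, writing to the original leaves the copy untouched, and the two no longer share memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
deep = df.copy()  # deep=True is the default

df.loc[0, "a"] = 100  # modify the original after copying

print(deep.loc[0, "a"])  # 1: the deep copy kept the old value
print(np.shares_memory(deep["a"].to_numpy(), df["a"].to_numpy()))  # False
```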


1 Answer

Answers from HYRY and Marius in comments!

One can check either by:

  • testing the identity (with is) of the values.base attribute rather than of the values attribute, as in:

    df.values.base is df2.values.base instead of df.values is df2.values.

  • or using the (admittedly internal) _is_view attribute (df2._is_view is True).
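A minimal sketch of the first suggestion, using the frames from the question. df.values hands back a new ndarray wrapper on each access (typically a transposed view), which is why id(df.values) differs between df and df2, while .base points at the array that actually owns the memory. np.shares_memory is a NumPy function swapped in here as a slightly more robust alternative; the helper name shares_data is hypothetical, and the internal _is_view is left out because it is private. Both checks assume single-dtype frames: with mixed dtypes, .values builds a brand-new array, so views would be reported as unrelated. Also note that with Copy-on-Write (the default from pandas 3.0), shared memory no longer implies that a write to one frame shows up in the other.

```python
import numpy as np
import pandas as pd


def shares_data(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the frames' .values arrays overlap in memory.

    Only meaningful for single-dtype frames; with mixed dtypes .values
    allocates a new array, so this returns False even for views.
    """
    return np.shares_memory(left.values, right.values)


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=["row1", "row2"], columns=["a", "b", "c", "d"])
df2 = df.iloc[0:2, :]  # slice of df (shares df's buffer)
df3 = df.copy()        # deep copy (new buffer)

# The check from the answer: compare the arrays that own the memory.
print(df.values.base is df2.values.base)  # True
print(df.values.base is df3.values.base)  # False

print(shares_data(df, df2))  # True
print(shares_data(df, df3))  # False
```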

Thanks everyone!

nick_eu answered Oct 19 '22 14:10