Is there an easy way to check whether two data frames are different copies or views of the same underlying data that doesn't involve manipulations? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.
For example, I thought "id(df.values)" would be stable across views, but it doesn't seem to be:
# Make two data frames that are views of same data.
df = pd.DataFrame([[1,2,3,4],[5,6,7,8]], index = ['row1','row2'],
                  columns = ['a','b','c','d'])
df2 = df.iloc[0:2,:]

# Demonstrate they are views:
df.iloc[0,0] = 99
df2.iloc[0,0]
Out[70]: 99

# Now try and compare the id on values attribute
# Different despite being views!
id(df.values)
Out[71]: 4753564496

id(df2.values)
Out[72]: 4753603728

# And we can of course compare df and df2
df is df2
Out[73]: False
Other answers I've looked up try to give rules, but they don't seem consistent, and they don't answer this question of how to test:
What rules does Pandas use to generate a view vs a copy?
Pandas: Subindexing dataframes: Copies vs views
Understanding pandas dataframe indexing
Re-assignment in Pandas: Copy or view?
And of course: - http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
UPDATE: Comments below seem to answer the question -- looking at the df.values.base attribute rather than the df.values attribute does it, as does a reference to the df._is_copy attribute (though the latter is probably very bad form since it's an internal).
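A minimal NumPy-only sketch of why comparing .base can work where id() does not; the array names are made up for illustration. Slicing the same array twice produces two distinct ndarray objects, so identity and id() differ, yet both report the same owning array through .base. How df.values maps onto this depends on the pandas version and on the frame holding a single dtype, so treat it as an analogy rather than a guarantee.

```python
import numpy as np

# An array that owns its data: its .base is None.
owner = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Two separate slicing operations give two separate view objects.
first_view = owner[0:2, :]
second_view = owner[0:2, :]

print(first_view is second_view)            # are they the same Python object?
print(first_view.base is second_view.base)  # do they share one owning array?
print(owner.copy().base is None)            # does a copy own its own data?

# Writing through one view is visible through the other.
first_view[0, 0] = 99
print(second_view[0, 0])
```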
To put it very simply, a view is a subset of the original object (DataFrame or Series) linked to the original source, while a copy is an entirely new object.
The copy() method returns a copy of the DataFrame. By default the copy is a "deep copy", meaning that any changes later made to the original DataFrame will NOT be reflected in the copy.
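As a small illustration of the deep-copy behaviour just described, here is a sketch with a made-up two-column frame: the original is modified after copying, and the copy is then read back.

```python
import pandas as pd

# Hypothetical example frame, used only for this illustration.
original = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# copy() defaults to deep=True, so the data is duplicated.
deep = original.copy()

# Change the original after the copy was taken.
original.iloc[0, 0] = 99

print(original.iloc[0, 0])  # value in the modified original
print(deep.iloc[0, 0])      # value in the deep copy
```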
Answers from HYRY and Marius in comments!
One can check either by:
- testing equivalence of the values.base attribute rather than the values attribute, as in: df.values.base is df2.values.base instead of df.values is df2.values.
- or using the (admittedly internal) _is_view attribute (df2._is_view is True).
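A possible alternative that stays on public APIs is to ask NumPy directly whether the two frames' arrays overlap in memory. The sketch below assumes a frame with a single dtype, since a mixed-dtype frame is converted into a freshly allocated array and would report no sharing even for a view. The helper name shares_data is hypothetical, not part of pandas. Note also that _is_view only says a frame is a view of something, not of which frame.

```python
import numpy as np
import pandas as pd


def shares_data(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the frames' value arrays overlap in memory.

    Assumes both frames hold a single dtype, so to_numpy() does not
    have to build a new consolidated array.
    """
    return bool(np.shares_memory(left.to_numpy(), right.to_numpy()))


df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=["row1", "row2"],
    columns=["a", "b", "c", "d"],
)
sliced = df.iloc[0:2, :]   # the slice from the question
copied = df.copy()         # an explicit deep copy

print(shares_data(df, sliced))
print(shares_data(df, copied))
```

Under Copy-on-Write in newer pandas versions a slice may share memory only until one side is written to, so the check describes the state at the moment it is run.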
Thanks everyone!