I have a large dataframe (10 million rows, 40 columns, 7 GB in memory). I would like to create a view so that I have a shorthand name for a selection that is complicated to express, without adding another 2-4 GB to memory usage. In other words, I would rather type:
df2
Than:
df.loc[complicated_condition, some_columns]
The documentation states that, while using .loc ensures that setting values modifies the original dataframe, there is still no guarantee as to whether the object returned by .loc is a view or a copy.
I know I could assign the condition and column list to variables (e.g. df.loc[cond, cols]), but I'm generally curious to know whether it is possible to create a view of a dataframe.
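For reference, the variable-based workaround mentioned above might look like the following minimal sketch. The toy dataframe, the condition, and the helper name df2 are made up for illustration; note that each call still builds a temporary copy of the selected rows, so this only gives a shorter name and does not create a view.

```python
import numpy as np
import pandas as pd

# Small stand-in for the large dataframe (illustrative data only).
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.arange(10) * 2.0,
    "c": list("abcdefghij"),
})

# Store the complicated condition and the column list once.
cond = (df["a"] % 2 == 0) & (df["b"] > 4)
cols = ["a", "b"]

def df2():
    """Hypothetical shorthand: re-evaluate the selection on demand.

    The result is a temporary copy that is freed once it goes out of scope,
    so nothing extra stays resident between calls.
    """
    return df.loc[cond, cols]

print(df2())
```

Because cond is computed up front, it has to be recomputed if the values in df change.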
Pandas Series.view(): the view() function creates a new view of a Series. It returns a new Series backed by the same underlying values in memory, optionally reinterpreted with a new data type. The new data type must have the same size in bytes so as not to cause index misalignment. (Series.view() is deprecated as of pandas 2.2.)
Views versus copies: put very simply, a view is a subset of the original object (DataFrame or Series) that stays linked to the original source, while a copy is an entirely new object.
You generally can't return a view.
Your answer lies in the pandas docs: returning-a-view-versus-a-copy.
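The docs say: whenever an array of labels or a boolean vector is involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned. (Note that .ix has since been removed from pandas; use .loc or .iloc for the same kinds of indexing.)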
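One way to check this yourself is to ask NumPy whether the result shares memory with the original. This is only a rough sketch on a small single-dtype frame (the data and column names are made up), and whether a slice shares memory can depend on the pandas version and on the dtypes involved, so treat it as a diagnostic rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Toy single-dtype frame; mixed dtypes are stored in separate blocks and may behave differently.
df = pd.DataFrame(
    np.arange(40, dtype="float64").reshape(10, 4),
    columns=list("ABCD"),
)
base = df.to_numpy()

# Plain slicing of rows.
row_slice = df.iloc[3:6]

# Boolean mask plus a list of columns, like the selection in the question.
masked = df.loc[df["A"] > 10, ["A", "B"]]

slice_shares = np.shares_memory(base, row_slice.to_numpy())
mask_shares = np.shares_memory(base, masked.to_numpy())

print("slice shares memory with df:", slice_shares)
print("boolean-mask selection shares memory with df:", mask_shares)
```

The boolean-mask selection does not share memory with the original, which is why a complicated condition cannot be kept around as a view.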