 

DataFrame modified inside a function

I ran into a case where a DataFrame is modified inside a function, which I have never observed before. Is there a way to deal with this without modifying the initial DataFrame?

    import numpy as np
    import pandas as pd

    def test(df):
        df['tt'] = np.nan
        return df

    dff = pd.DataFrame(data=[])
    print(dff)
    # Empty DataFrame
    # Columns: []
    # Index: []

    df = test(dff)
    print(dff)
    # Empty DataFrame
    # Columns: [tt]
    # Index: []
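This behaviour follows from how Python passes arguments: the function receives a reference to the same DataFrame object, not a copy, so assigning a column to df inside the function changes the caller's dff. A minimal sketch that checks this with an identity test, assuming pandas and NumPy are installed; add_column is a hypothetical name standing in for the question's test function:

```python
import numpy as np
import pandas as pd

def add_column(df):
    # df is just another name bound to the caller's DataFrame object
    df['tt'] = np.nan
    return df

dff = pd.DataFrame(data=[])
result = add_column(dff)

# The returned object is the very same DataFrame that was passed in,
# so the caller's dff now has the new column as well
print(result is dff)         # True
print(list(dff.columns))     # ['tt']
```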
Alexis G asked Jul 24 '15 15:07



1 Answer

    import numpy as np

    def test(df):
        # Work on an independent copy so the caller's DataFrame is not modified
        df = df.copy(deep=True)
        df['tt'] = np.nan
        return df

Passing a DataFrame to a function does not copy it: the function receives a reference to the same object, so adding a column inside the function modifies the caller's DataFrame, and returning it just hands back that same modified object. If you want to keep your old DataFrame and also get a modified one, then by definition you need two DataFrames: the one you pass in, which should stay unchanged, and a new one that carries the modifications. So if you don't want to change the original, your best bet is to make a copy of it. In my example I rebind the local variable "df" inside the function to the copied DataFrame; the caller's variable still points to the original. The copy method with "deep=True" (which is also the default) creates a new object with a copy of the calling object's data and indices. You can read more here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
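A minimal sketch of the copy-based fix, checking that the caller's DataFrame is left untouched. It also shows DataFrame.assign, a pandas method the answer does not mention, which returns a new DataFrame with the extra column and leaves the original alone. The function names add_column_copy and add_column_assign are hypothetical, and pandas and NumPy are assumed to be installed:

```python
import numpy as np
import pandas as pd

def add_column_copy(df):
    # Rebind the local name to an independent copy; deep=True is also the default
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df

def add_column_assign(df):
    # assign() returns a new DataFrame and does not modify df
    return df.assign(tt=np.nan)

original = pd.DataFrame({'a': [1, 2, 3]})

new1 = add_column_copy(original)
new2 = add_column_assign(original)

print(list(original.columns))  # ['a']
print(list(new1.columns))      # ['a', 'tt']
print(list(new2.columns))      # ['a', 'tt']
```

Both versions leave the original intact; assign is convenient for adding columns, while an explicit copy is the more general choice when the function needs several in-place style edits.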

Skorpeo answered Oct 02 '22 23:10