
Pandas DataFrame.assign arguments


How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

Background:

The assign function in Pandas returns a copy of the DataFrame with the newly assigned column added, e.g.

>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28

The 0.19.2 documentation for this function implies that more than one column can be added to the dataframe.

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0

    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed
        on the DataFrame and assigned to the new columns. If the values are not callable,
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. The make things predicatable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
Alexander asked Feb 07 '17 22:02





1 Answer

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2) 
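As the docstring quoted in the question notes, the values passed to assign may also be callables, which are evaluated on the DataFrame. A minimal sketch of that form, reusing the question's df (the name result is only illustrative), which also shows that the original DataFrame is left untouched:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Each callable receives the DataFrame and returns the values for the new column.
result = df.assign(C=lambda d: d['A'] ** 2, D=lambda d: d['B'] * 2)

print(result)            # copy with the extra columns C and D
print(list(df.columns))  # the original df still has only A and B
```

The callable form is mostly useful in method chains, where the intermediate DataFrame has no name to refer to.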

I got your example dictionary to work by unpacking the dictionary as keyword arguments using **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2}) 

It seems like assign should be able to take a dictionary directly, but based on the source code you posted, that does not appear to be supported currently.

The resulting output:

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
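The documentation quoted in the question says that, in that version, a column created in an assign call cannot be referenced by another column in the same call. One way around this that does not depend on the pandas version is to chain two calls, so the second call sees the columns added by the first. A minimal sketch, where the column E and the dictionary names first and second are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Hypothetical column specs; keys must be strings to be unpacked with **.
first = {'C': lambda d: d['A'] ** 2}
second = {'E': lambda d: d['C'] + d['B']}  # E depends on C from the first call

# The second assign receives the copy returned by the first, which already has C.
chained = df.assign(**first).assign(**second)

print(chained)
```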
root answered Sep 25 '22 03:09
