Modify pandas dataframe values with numpy array

I'm trying to replace the values of a pandas DataFrame with a NumPy array of the same size. Something like this does not work:

import pandas as pd
# create 2d numpy array, called arr
df = pd.DataFrame(arr, columns=some_list_of_names)
df.values = myfunction(arr)

Are there any alternatives?

Bobo asked Feb 06 '15 22:02



2 Answers

The .values attribute often returns a copy, especially for mixed dtypes, so assigning to it is not guaranteed to work. In newer versions of pandas the assignment raises an error.
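
To illustrate the copy behaviour described above, here is a minimal sketch. The column names and numbers are made up for the demonstration. With one integer column and one float column, .values has to upcast everything into a single float array, so the result cannot be a view of the original data.

```python
import numpy as np
import pandas as pd

# Mixed dtypes: an integer column and a float column (illustrative data).
df = pd.DataFrame({"a": [1, 3], "b": [2.0, 4.0]})

vals = df.values      # both columns are upcast into one float64 array
vals[0, 0] = 99.0     # this writes to the new array only

print(vals.dtype)     # common dtype of the array
print(df)             # the DataFrame itself is unchanged
```

Because the array is a separate object here, writing into it never reaches the DataFrame, which is why column assignment is the more reliable route.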

Assign to the specific columns instead (note that the column order matters):

df = pd.DataFrame(arr, columns=some_list_of_names)
df[some_list_of_names] = myfunction(arr)

Example (in pandas 0.15.2):

In [11]: df = pd.DataFrame([[1, 2.], [3, 4.]], columns=['a', 'b'])

In [12]: df.values = [[5, 6], [7, 8]]
AttributeError: can't set attribute

In [13]: df[['a', 'b']] = [[5, 6], [7, 8]]

In [14]: df
Out[14]:
   a  b
0  5  6
1  7  8

In [15]: df[['b', 'a']] = [[5, 6], [7, 8]]

In [16]: df
Out[16]:
   a  b
0  6  5
1  8  7
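
Applied to the question's setup, the column assignment could look like the sketch below. Here myfunction is a hypothetical stand-in (it just doubles the array), and the array contents and column names are assumptions for illustration. The second variant, which rebuilds the DataFrame from the new array while keeping the index and columns, is an alternative not mentioned above.

```python
import numpy as np
import pandas as pd

def myfunction(a):
    # Hypothetical stand-in for the question's function:
    # returns a new array with the same shape as the input.
    return a * 2.0

arr = np.arange(6, dtype=float).reshape(3, 2)
some_list_of_names = ["a", "b"]

df = pd.DataFrame(arr, columns=some_list_of_names)

# Compute the new values once, before touching the DataFrame.
new_values = myfunction(arr)

# Option 1: assign to the columns, listed in the array's column order.
df[some_list_of_names] = new_values

# Option 2: build a fresh DataFrame that reuses the index and columns.
rebuilt = pd.DataFrame(new_values, index=df.index, columns=df.columns)

print(df)
print(rebuilt)
```

Both variants should give the same frame; the second avoids any question of whether the assignment happens in place.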

Andy Hayden answered Sep 28 '22 07:09


I think this is the method you are looking for:

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.applymap.html

The documentation describes it as: "Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame."

Example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns = list('abcd'))
>>> df
          a         b         c         d
0  0.394819  0.662614  0.752139  0.396745
1  0.802134  0.934494  0.652150  0.698127
2  0.518531  0.582429  0.189880  0.168490
>>> f = lambda x: x*100
>>> df.applymap(f)
           a          b          c          d
0  39.481905  66.261374  75.213857  39.674529
1  80.213437  93.449447  65.215018  69.812667
2  51.853097  58.242895  18.988020  16.849014
>>>
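
Two caveats on the example above, sketched under some assumptions. In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, so the snippet below picks whichever method the installed version provides. For simple arithmetic such as multiplying by 100, plain vectorized arithmetic on the DataFrame gives the same result without calling a Python function per element. The seed and shape are arbitrary.

```python
import numpy as np
import pandas as pd

# Seeded random data so the run is reproducible (values are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 4)), columns=list("abcd"))

def f(x):
    # Elementwise function, as in the example above.
    return x * 100

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
by_function = elementwise(f)

# Vectorized arithmetic operates on whole columns at once.
vectorized = df * 100

print(np.allclose(by_function, vectorized))
```

Note that both approaches return a new DataFrame; to keep the result, assign it back, for example with df = df * 100.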

Dr. Jan-Philip Gehrcke answered Sep 28 '22 07:09