I'm trying to modify the values field of a pandas DataFrame with a NumPy array of the same size. Something like this does not work:
import pandas as pd
# create 2d numpy array, called arr
df = pd.DataFrame(arr, columns=some_list_of_names)
df.values = myfunction(arr)
Are there any alternatives?
Suppose you want to replace multiple values with multiple new values in a single DataFrame column. In that case, you can use this template: df['column name'] = df['column name'].replace(['1st old value', '2nd old value', ...], ['1st new value', '2nd new value', ...])
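As a minimal sketch of that template, assuming a made-up column named "color" and made-up values (neither comes from the original question), the list form of replace maps each old value to the new value at the same position:

```python
import pandas as pd

# Hypothetical data, used only to illustrate the template above.
df = pd.DataFrame({"color": ["red", "green", "blue", "red"]})

# Each entry of the first list is replaced by the entry at the same
# position in the second list; values not listed are left untouched.
df["color"] = df["color"].replace(["red", "green"], ["crimson", "lime"])

print(df["color"].tolist())
```

Both lists need to have the same length, since the pairing is positional.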
DataFrame.to_numpy() converts the DataFrame to a NumPy array. By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the resulting dtype will be float32. This may require copying data and coercing values, which may be expensive.
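The dtype rule can be checked with a small sketch. The column names "x" and "y" and the values are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Two columns with different float widths (hypothetical data).
df = pd.DataFrame({
    "x": np.array([1.0, 2.0], dtype=np.float16),
    "y": np.array([3.0, 4.0], dtype=np.float32),
})

# to_numpy() returns a single 2-D array, so both columns are
# coerced to one common dtype.
arr = df.to_numpy()

print(df.dtypes.tolist())
print(arr.dtype)
```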
Pandas expands on NumPy by providing easy to use methods for data analysis to operate on the DataFrame and Series classes, which are built on NumPy's powerful ndarray class.
The .values attribute is often a copy, especially for mixed dtypes, so assignment to it is not guaranteed to work (in newer versions of pandas this will raise).
You should assign to the specific columns (note the order is important).
df = pd.DataFrame(arr, columns=some_list_of_names)
df[some_list_of_names] = myfunction(arr)
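A self-contained version of the same idea might look like the sketch below. Here myfunction is a hypothetical stand-in that doubles every element, and the array and column names are made up, since the question does not show them:

```python
import numpy as np
import pandas as pd


def myfunction(a):
    """Hypothetical stand-in for the asker's function: returns a same-shape array."""
    return a * 2


arr = np.arange(6, dtype=float).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
some_list_of_names = ["a", "b"]

df = pd.DataFrame(arr, columns=some_list_of_names)

# Assign the new 2-D array to the listed columns; column i of the array
# goes to the i-th name in the list, so the order of the names matters.
df[some_list_of_names] = myfunction(arr)

print(df)
```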
Example (in pandas 0.15.2):
In [11]: df = pd.DataFrame([[1, 2.], [3, 4.]], columns=['a', 'b'])
In [12]: df.values = [[5, 6], [7, 8]]
AttributeError: can't set attribute
In [13]: df[['a', 'b']] = [[5, 6], [7, 8]]
In [14]: df
Out[14]:
   a  b
0  5  6
1  7  8
In [15]: df[['b', 'a']] = [[5, 6], [7, 8]]
In [16]: df
Out[16]:
   a  b
0  6  5
1  8  7
I think this is the method you are looking for:
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.applymap.html
Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
Example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns = list('abcd'))
>>> df
          a         b         c         d
0  0.394819  0.662614  0.752139  0.396745
1  0.802134  0.934494  0.652150  0.698127
2  0.518531  0.582429  0.189880  0.168490
>>> f = lambda x: x*100
>>> df.applymap(f)
           a          b          c          d
0  39.481905  66.261374  75.213857  39.674529
1  80.213437  93.449447  65.215018  69.812667
2  51.853097  58.242895  18.988020  16.849014
>>>
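One caveat: applymap was deprecated in pandas 2.1 in favor of DataFrame.map. The sketch below picks whichever method the installed version provides, and also shows that simple arithmetic like this can be applied to the whole frame directly. The data is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), columns=list("abcd"))

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
scaled = elementwise(lambda x: x * 100)

# For plain arithmetic, operating on the whole frame gives the same values
# and avoids calling a Python function once per element.
vectorized = df * 100

print(scaled.equals(vectorized))
```

Both approaches return a new DataFrame, so assign the result back (for example df = df * 100) to keep it.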