
Unexpected behavior in assigning 2d numpy array to pandas DataFrame

I have the following code:

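import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])  # 4 rows, one column
y = np.random.randn(4, 2)                          # 4 rows, two columns
x['A'] = y                                         # 2-D array assigned to a single column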

I expected it to throw an exception because of the shape mismatch, but pandas silently accepted the assignment: y's first column was assigned to x['A'].

Is this an intentional design? If so, what is the rationale behind it?

I tried both pandas 0.21 and 0.23.


Thanks to those who tried to help. However, nobody has given a satisfactory answer, and the bounty is about to expire.

Let me emphasize what I expect from an answer:

  1. Is this design intentional? Is it a bug? Is it a design flaw?
  2. What is the rationale for designing it this way?

Since the bounty is about to expire, I accepted the most-voted answer. But it does not answer the questions above.

asked Sep 03 '18 02:09 by doraemon




1 Answer

y is a plain 2-D NumPy array with no index or column labels. The assignment x['A'] = y works here because pandas takes the first column of the array and assigns it to 'A'.

Similarly,

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 2)
x[['A', 'B']] = y

also works, because y has exactly one column for each selected column of x, so nothing has to be discarded. If you pass fewer columns than you select, say:

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 1)
x[['A', 'B']] = y

That also works: the single column of y is broadcast, so both columns receive the same values. This is similar to x['A'] = 0, which replaces all the data in column A with zeros.
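If you would rather get an exception than depend on how a given pandas version handles a mismatched shape, one option is to validate the array yourself before assigning. The following is only a minimal sketch: assign_column_strict is a hypothetical helper written for illustration, not part of pandas, and it assumes you want exactly one value per row.

```python
import numpy as np
import pandas as pd


def assign_column_strict(df, name, values):
    """Assign `values` to df[name] only if it is 1-D with one entry per row.

    Hypothetical helper for illustration; this is not a pandas API.
    """
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError(
            f"expected a 1-D array for column {name!r}, got shape {arr.shape}"
        )
    if arr.shape[0] != len(df):
        raise ValueError(
            f"expected {len(df)} values for column {name!r}, got {arr.shape[0]}"
        )
    df[name] = arr
    return df


x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)

# Select the column explicitly instead of relying on implicit behavior.
assign_column_strict(x, 'A', y[:, 0])

# Passing the whole 2-D array is rejected before pandas sees it.
try:
    assign_column_strict(x, 'A', y)
except ValueError as exc:
    print(exc)
```

Writing y[:, 0] also makes it obvious to a reader which column you meant, whatever pandas would have done with the full array.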

answered Sep 25 '22 06:09 by yogkm