
Why is the shape of my pandas DataFrame selection wrong?

I have a pandas DataFrame called df whose df.shape is (53, 80); the index and column labels are both integers.

If I select the first row like this, I get:

df.loc[0].shape
(80,)

instead of:

(1,80)

However, df.loc[0:0].shape and df[0:1].shape both show the shape I expect.

SebMa asked Jul 09 '18 16:07



2 Answers

df.loc[0] returns a one-dimensional pd.Series object representing the data in a single row, extracted via indexing.

df.loc[0:0] returns a two-dimensional pd.DataFrame object representing the data in a dataframe with one row, extracted via slicing.

You can see this more clearly if you print the results of these operations:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

res1 = df.loc[0]
res2 = df.loc[0:0]

print(type(res1), res1, sep='\n')

<class 'pandas.core.series.Series'>
0    0
1    1
2    2
Name: 0, dtype: int32

print(type(res2), res2, sep='\n')

<class 'pandas.core.frame.DataFrame'>
   0  1  2
0  0  1  2

The convention follows NumPy indexing / slicing. This is natural since Pandas is built on NumPy arrays.

arr = np.arange(9).reshape(3, 3)

print(arr[0].shape)    # (3,), i.e. 1-dimensional
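print(arr[0:0].shape)  # (0, 3), i.e. still 2-dimensional, but empty: NumPy slices exclude the end
print(arr[0:1].shape)  # (1, 3), i.e. the one-row counterpart of df.loc[0:0]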
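
If a one-row DataFrame is what you actually want, a few common idioms keep the result two-dimensional. The following is a minimal sketch, assuming the same small 3x3 frame as above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

# A list of labels (even a single one) selects rows and keeps the DataFrame.
row_by_label_list = df.loc[[0]]

# The positional equivalent with iloc.
row_by_position_list = df.iloc[[0]]

# Converting the Series back: to_frame() gives one column, .T turns it into one row.
row_from_series = df.loc[0].to_frame().T

print(row_by_label_list.shape)     # (1, 3)
print(row_by_position_list.shape)  # (1, 3)
print(row_from_series.shape)       # (1, 3)
```

The list-based forms are usually the simplest choice, since they never pass through a Series.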
jpp answered Nov 14 '22 23:11


When you call df.iloc[0], you select the first row by position and get a Series, whereas df.iloc[0:0] slices rows and returns a DataFrame. (With iloc the slice end is excluded, so df.iloc[0:0] is an empty DataFrame; use df.iloc[0:1] to get a single row.) According to the pandas Series documentation, a Series is a:

One-dimensional ndarray with axis labels

whereas a DataFrame is two-dimensional (see the pandas DataFrame documentation).

Try running the following lines to see the difference:

print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>

print(type(df.iloc[0:0]))
# <class 'pandas.core.frame.DataFrame'>
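
Label-based and positional slices treat the end point differently, which matters when comparing df.loc[0:0] with df.iloc[0:0]. Here is a small sketch, assuming a hypothetical 3x3 frame with the default integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

scalar_row = df.iloc[0]        # single position -> Series
label_slice = df.loc[0:0]      # label slice includes the end label -> one row
empty_slice = df.iloc[0:0]     # positional slice excludes the end -> no rows
one_row_slice = df.iloc[0:1]   # positional slice covering only the first row

print(scalar_row.shape)     # (3,)
print(label_slice.shape)    # (1, 3)
print(empty_slice.shape)    # (0, 3)
print(one_row_slice.shape)  # (1, 3)
```

All three slices are DataFrames; only the scalar selection drops a dimension.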
student answered Nov 15 '22 00:11