I have a pandas DataFrame called df where df.shape is (53, 80) and both the index and the columns are integers.
If I select the first row like this, I get:
df.loc[0].shape
(80,)
instead of:
(1, 80)
But df.loc[0:0].shape and df[0:1].shape both show the expected shape.
df.loc[0] returns a one-dimensional pd.Series object representing the data in a single row, extracted via indexing.
df.loc[0:0] returns a two-dimensional pd.DataFrame object representing the data in a dataframe with one row, extracted via slicing.
You can see this more clearly if you print the results of these operations:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))
res1 = df.loc[0]    # scalar label: one row as a Series
res2 = df.loc[0:0]  # label slice: a one-row DataFrame

print(type(res1), res1, sep='\n')
# <class 'pandas.core.series.Series'>
# 0    0
# 1    1
# 2    2
# Name: 0, dtype: int32 (or int64, depending on platform and NumPy version)

print(type(res2), res2, sep='\n')
# <class 'pandas.core.frame.DataFrame'>
#    0  1  2
# 0  0  1  2
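The convention follows NumPy indexing / slicing: an integer index drops a dimension, while a slice keeps it. This is natural since pandas is built on NumPy arrays. One difference is that NumPy slices exclude the end position, so arr[0:0] is empty, whereas the label-based df.loc[0:0] includes its end label.
arr = np.arange(9).reshape(3, 3)
print(arr[0].shape)    # (3,), i.e. 1-dimensional
print(arr[0:0].shape)  # (0, 3), i.e. 2-dimensional but empty
print(arr[0:1].shape)  # (1, 3), i.e. 2-dimensional with one row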
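If the goal is to get the (1, 80) shape from the question, one option is to pass a list of labels instead of a scalar. The sketch below assumes a stand-in frame built with np.arange in place of the real data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the (53, 80) frame described in the question.
df = pd.DataFrame(np.arange(53 * 80).reshape(53, 80))

row_as_series = df.loc[0]    # scalar label: the row comes back as a Series
row_as_frame = df.loc[[0]]   # list of labels: the result stays a DataFrame

print(row_as_series.shape)   # (80,)
print(row_as_frame.shape)    # (1, 80)

# A Series can also be turned back into a one-row DataFrame.
rebuilt = row_as_series.to_frame().T
print(rebuilt.shape)         # (1, 80)
```

The list form is usually the simpler choice, since it avoids the round trip through a Series.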
When you call df.iloc[0], it selects the first row and the result is a Series, whereas df.iloc[0:0] slices rows and the result is a DataFrame. Note that this particular slice is empty, because iloc slices exclude the end position; df.iloc[0:1] returns the first row as a one-row DataFrame. According to the pandas Series documentation, a Series is a:
One-dimensional ndarray with axis labels
whereas a DataFrame is two-dimensional (pandas DataFrame documentation).
Try running the following lines to see the difference:
print(type(df.iloc[0]))
# <class 'pandas.core.series.Series'>
print(type(df.iloc[0:0]))
# <class 'pandas.core.frame.DataFrame'>
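The types match in both cases, but the shapes can differ depending on whether the slice is label-based or position-based. A minimal sketch, using a small 3x3 frame as an assumed example:

```python
import numpy as np
import pandas as pd

# Small example frame with default integer labels 0, 1, 2.
df = pd.DataFrame(np.arange(9).reshape(3, 3))

loc_slice = df.loc[0:0]     # label-based slice: the end label is included
iloc_empty = df.iloc[0:0]   # position-based slice: the end position is excluded
iloc_one = df.iloc[0:1]     # position-based slice covering the first row only

print(loc_slice.shape)      # (1, 3)
print(iloc_empty.shape)     # (0, 3)
print(iloc_one.shape)       # (1, 3)
```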