
Equivalent of R/dplyr's glimpse() function in Python for pandas DataFrames?

I find the glimpse() function in R/dplyr very useful. Coming from R and now working in Python, I haven't found anything as useful for pandas DataFrames.

In Python, I've tried things like .describe() and .info() and .head() but none of these give me the useful snapshot which R's glimpse() gives us.

Nice features which I'm quite accustomed to having in glimpse() include:

  • All variables/column names as rows in the output
  • All variable/column data types
  • The first few observations of each column
  • Total number of observations
  • Total number of variables/columns

Here is some simple code to work with:

R

library(dplyr)

test <- data.frame(column_one = c("A", "B", "C", "D"),
           column_two = c(1:4))

glimpse(test)

# The output is as follows

Rows: 4
Columns: 2
$ column_one <chr> "A", "B", "C", "D"
$ column_two <int> 1, 2, 3, 4

Python

import pandas as pd

test = pd.DataFrame({'column_one':['A', 'B', 'C', 'D'],
                     'column_two':[1, 2, 3, 4]})

Is there a single Python function that mirrors these capabilities closely (one function, covering all of the features above rather than only some of them)? If not, how would you write a function that does the job precisely?

kodikai asked Apr 25 '26 03:04


2 Answers

Here is one way to do it:

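def glimpse(df):
    """Print row and column counts, then one summary line per column."""
    print(f"Rows: {df.shape[0]}")
    print(f"Columns: {df.shape[1]}")
    for col in df.columns:
        # Column name, dtype, and the first five values (head() defaults to 5).
        print(f"$ {col} <{df[col].dtype}> {df[col].head().values}")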

Then:

import pandas as pd

df = pd.DataFrame(
    {"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]}
)

glimpse(df)

# Output
Rows: 4
Columns: 2
$ column_one <object> ['A' 'B' 'C' 'D']
$ column_two <int64> [1 2 3 4]
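
If you want the layout to sit a little closer to R's (aligned names, comma-separated values, long lines cut to the terminal width), the same idea can be extended along the following lines. This is only a sketch: the name glimpse_text and the parameters width and max_values are made up for illustration, and the dtype labels are pandas dtypes rather than R's <chr>/<int>.

```python
import shutil
from typing import Optional

import pandas as pd


def glimpse_text(df: pd.DataFrame, width: Optional[int] = None, max_values: int = 20) -> str:
    """Return a glimpse()-style summary as a string (hypothetical helper)."""
    if width is None:
        # Fall back to the current terminal width (80 if it cannot be detected).
        width = shutil.get_terminal_size().columns
    lines = [f"Rows: {df.shape[0]}", f"Columns: {df.shape[1]}"]
    # Pad column names so the dtype tags line up, as in R's output.
    name_width = max((len(str(name)) for name in df.columns), default=0)
    for name, series in df.items():
        values = ", ".join(repr(v) for v in series.head(max_values).tolist())
        line = f"$ {str(name):<{name_width}} <{series.dtype}> {values}"
        # Cut long lines so every column stays on a single row.
        if len(line) > width:
            line = line[: max(width - 3, 0)] + "..."
        lines.append(line)
    return "\n".join(lines)


df = pd.DataFrame({"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]})
print(glimpse_text(df, width=80))
```

Returning a string instead of printing directly makes the summary easy to log or test; call print() on the result to get the same effect as the function above.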
Laurent answered Apr 27 '26 23:04


I prefer this version since it relies on native pandas methods and displays well both in Jupyter and in the terminal.

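import pandas as pd


def glimpse(df: pd.DataFrame) -> pd.DataFrame:
    """
    Similar to R's glimpse()

    Parameters
    ----------
    df : pd.DataFrame

    Returns
    -------
    pd.DataFrame
        Up to five randomly sampled rows, transposed so that each
        original column becomes a row, with a leading "dtypes" column.
    """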
    print(f"Rows: {df.shape[0]}")
    print(f"Columns: {df.shape[1]}")

    sample_size = min(df.shape[0], 5)

    return (
        df.sample(sample_size)
        .T.assign(dtypes=df.dtypes)
        .loc[
            :, lambda x: sorted(x.columns, key=lambda col: 0 if col == "dtypes" else 1)
        ]
    )

Then:

df = pd.DataFrame({"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]})

df.pipe(glimpse)

# Output (df.sample() is random, so the order of the sampled rows varies)
Rows: 4
Columns: 2

            dtypes  2  0  1  3
column_one  object  C  A  B  D
column_two   int64  3  1  2  4
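
If the random row order is a nuisance (for example when comparing two runs), one possible variation is to take the first rows with head() and place the dtypes column with insert() instead of sorting. This is a sketch under a couple of assumptions: the name glimpse_head and the parameter n are made up for illustration, column names are unique, and no row is labelled "dtypes".

```python
import pandas as pd


def glimpse_head(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Transposed preview of the first n rows (hypothetical helper)."""
    print(f"Rows: {df.shape[0]}")
    print(f"Columns: {df.shape[1]}")
    # head() keeps the original row order, so repeated calls give the same output.
    preview = df.head(n).T
    # Put the dtypes first; insert() aligns df.dtypes on the column names.
    preview.insert(0, "dtypes", df.dtypes)
    return preview


df = pd.DataFrame({"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]})
print(glimpse_head(df))
```

The trade-off is that head() only ever shows the top of the frame, whereas sample() can surface values from anywhere in it.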

Daniel Cárdenas answered Apr 27 '26 21:04


