PySpark: How to judge column type of dataframe

Suppose we have a dataframe called df. I know I can use df.dtypes, but I would prefer something similar to

type(123) == int # note here the int is not a string

I wonder if there is something like:

type(df.select(<column_name>).collect()[0][1]) == IntegerType

Basically I want to know the way to directly get the object of the class like IntegerType, StringType from the dataframe and then judge it.


kww asked Jan 25 '18 19:01


2 Answers

TL;DR Use external data types (plain Python types) to test values, internal data types (DataType subclasses) to test schema.

First and foremost: you should never use

type(123) == int

The correct way to check types in Python, which also handles inheritance, is

isinstance(123, int)
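To illustrate the inheritance point, here is a small sketch using only built-in Python. The Celsius class is a made-up name for this example and is not part of PySpark.

```python
class Celsius(int):
    """Hypothetical int subclass, used only for this illustration."""


value = Celsius(21)

# Exact type comparison ignores inheritance, so a subclass instance fails it.
exact_match = type(value) == int

# isinstance follows the class hierarchy, so the same value passes.
subclass_match = isinstance(value, int)

# bool is itself a subclass of int, which is another case where the two differ.
bool_is_int = isinstance(True, int)
bool_exact = type(True) == int
```

Here exact_match and bool_exact are False, while subclass_match and bool_is_int are True.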

With that out of the way, let's talk about

Basically I want to know the way to directly get the object of the class like IntegerType, StringType from the dataframe and then judge it.

This is not how it works. DataTypes describe the schema (the internal representation), not the values. The external type is a plain Python object, so if the internal type is IntegerType, the external type is int, and so on, according to the rules defined in the Spark SQL programming guide.

The only place where instances of IntegerType (or any other DataType) exist is your schema:

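from pyspark.sql.types import LongType, StringType

# Assumes an active SparkSession bound to the name `spark`.
# Python ints are inferred as LongType, not IntegerType.
df = spark.createDataFrame([(1, "foo")])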

isinstance(df.schema["_1"].dataType, LongType)
# True
isinstance(df.schema["_2"].dataType, StringType)
# True

_1, _2 = df.first()

isinstance(_1, int)
# True
isinstance(_2, str)
# True
Alper t. Turker answered Sep 24 '22 08:09

What about trying:

df.printSchema()

This will return something like:

 |-- id: integer (nullable = true)
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: integer (nullable = true)
 |-- col4: date (nullable = true)
 |-- col5: long (nullable = true)
Aurelie Giraud answered Sep 24 '22 08:09