
Check Type: How to check if something is an RDD or a DataFrame?

I'm using Python, and this is a Spark RDD / DataFrame.

I tried isinstance(thing, RDD) but RDD wasn't recognized.

The reason I need to do this:

I'm writing a function where both RDD and DataFrame could be passed in, so I'll need to do input.rdd to get the underlying RDD if a DataFrame is passed in.

Jobs asked Apr 19 '16 23:04



1 Answer

isinstance will work just fine:

from pyspark.sql import DataFrame
from pyspark.rdd import RDD

def foo(x):
    if isinstance(x, RDD):
        return "RDD"
    if isinstance(x, DataFrame):
        return "DataFrame"

foo(sc.parallelize([]))
## 'RDD'
foo(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

but single dispatch (functools.singledispatch, available since Python 3.4) is a much more elegant approach:

from functools import singledispatch

@singledispatch
def bar(x):
    pass 

@bar.register(RDD)
def _(arg):
    return "RDD"

@bar.register(DataFrame)
def _(arg):
    return "DataFrame"

bar(sc.parallelize([]))
## 'RDD'

bar(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'
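
Note that the base `bar` above silently returns None for any unregistered type. As a minimal sketch of a variant that fails loudly and returns the underlying RDD (which is what the question asks for), the following uses hypothetical stand-in classes `FakeRDD` and `FakeDataFrame` so it runs without a Spark session; `to_rdd` is an illustrative name, not a PySpark function. With real PySpark you would presumably register `RDD` and `DataFrame` instead.

```python
from functools import singledispatch


class FakeRDD:
    """Hypothetical stand-in for pyspark.rdd.RDD."""


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame."""

    def __init__(self):
        # A real DataFrame exposes its underlying RDD as `.rdd`.
        self.rdd = FakeRDD()


@singledispatch
def to_rdd(x):
    # Fallback for unregistered types: raise instead of returning None.
    raise TypeError("expected an RDD or a DataFrame, got " + type(x).__name__)


@to_rdd.register(FakeRDD)
def _(x):
    # Already an RDD: return it unchanged.
    return x


@to_rdd.register(FakeDataFrame)
def _(x):
    # DataFrame: unwrap to the underlying RDD.
    return x.rdd
```

Raising in the fallback is a design choice: a caller passing, say, a plain list gets an immediate error rather than a None that fails later.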

If you don't mind additional dependencies, multipledispatch is also an interesting option:

from multipledispatch import dispatch

@dispatch(RDD)
def baz(x):
    return "RDD"

@dispatch(DataFrame)
def baz(x):
    return "DataFrame"

baz(sc.parallelize([]))
## 'RDD'

baz(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

Finally, the most Pythonic approach is to simply check the interface:

def foobar(x):
    if hasattr(x, "rdd"):
        ## It is a DataFrame
        return "DataFrame"
    else:
        ## It (probably) is an RDD
        return "RDD"
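
Tying this back to the question, the same interface check can normalize either input to an RDD. This is only a sketch under stated assumptions: `FakeRDD` and `FakeDataFrame` are hypothetical stand-ins so the example runs without Spark, and `as_rdd` is an illustrative name. With real PySpark objects the `hasattr` check should behave the same way, since a DataFrame exposes `.rdd` and an RDD does not.

```python
class FakeRDD:
    """Hypothetical stand-in for pyspark.rdd.RDD (has no `rdd` attribute)."""

    def __init__(self, items):
        self.items = list(items)


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (exposes `rdd`)."""

    def __init__(self, rows):
        self._rows = list(rows)

    @property
    def rdd(self):
        # Mirror DataFrame.rdd: hand back the rows as an RDD-like object.
        return FakeRDD(self._rows)


def as_rdd(x):
    """Return x.rdd when x looks like a DataFrame, otherwise x itself."""
    return x.rdd if hasattr(x, "rdd") else x
```

The trade-off is that any object with an `rdd` attribute is treated as a DataFrame, so this check is looser than `isinstance`.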
zero323 answered Sep 19 '22 20:09
