 

Get the datatype of a column using PySpark

We are reading data from a MongoDB collection. A collection column holds values of two different types (e.g. (bson.Int64, int) or (int, float)).

I am trying to get the datatype of a column using PySpark.

My problem is that some columns have mixed datatypes.

Assume quantity and weight are the columns:

quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21

Actually, we didn't define a data type for any column of the Mongo collection.

When I query the count from the PySpark dataframe

dataframe.count() 

I get an exception like this:

"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})" 
asked Jul 11 '17 by Sreenuvasulu


People also ask

How can I check the DataType of a column in a Spark DataFrame?

In Spark you can get all DataFrame column names and types (DataType) by using df.dtypes and df.schema, where df is a DataFrame object.
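A minimal sketch of both attributes, using a small DataFrame with an inferred schema (the sample values are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small example DataFrame; Spark infers the column types.
df = spark.createDataFrame([(12300, 656.0), (1238, 56.22)], ["quantity", "weight"])

print(df.dtypes)   # [('quantity', 'bigint'), ('weight', 'double')]
print(df.schema)   # StructType of StructField entries, one per column
df.printSchema()   # the same information as a readable tree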

How do you find the data type of a column in Python?

Use DataFrame.dtypes to get the data types of the columns in a DataFrame. In Python's pandas module, the DataFrame class provides this attribute for exactly that purpose; it returns a Series containing the data type of each column.
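A short pandas sketch of the same idea (the column names and values are illustrative):

import pandas as pd

pdf = pd.DataFrame({"quantity": [12300, 1238], "weight": [656.0, 56.22]})

print(pdf.dtypes)           # Series mapping column name -> dtype
print(pdf["weight"].dtype)  # dtype of a single column, e.g. float64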

How do you check the type of an object in PySpark?

Using the type() function: type() returns the Python type of the given object, where the object is the RDD or DataFrame itself.
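For example, a minimal sketch; note that type() reports the class of the object, not the datatypes of its columns:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2.0)], ["quantity", "weight"])

# type() tells you what kind of object you are holding.
print(type(df))      # <class 'pyspark.sql.dataframe.DataFrame'>
print(type(df.rdd))  # <class 'pyspark.rdd.RDD'>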


1 Answer

Your question is broad, thus my answer will also be broad.

To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]

This means your column age is of type int and name is of type string.
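If you want the type of a single column, the list returned by dtypes can be turned into a dict, or the schema can be queried directly. A small sketch using the same column names as above (the sample row is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(25, "Alice")], "age int, name string")

# Look up one column's type from the dtypes list...
print(dict(df.dtypes)["age"])     # 'int'

# ...or pull the DataType object directly from the schema.
print(df.schema["age"].dataType)  # IntegerType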

answered Sep 22 '22 by eliasah