We are reading data from a MongoDB collection. A collection column holds values of two different types (e.g. (bson.Int64, int) and (int, float)).
I am trying to get the data types using PySpark. My problem is that some columns have mixed data types. Assume quantity and weight are the columns:
quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21
Actually, we didn't define a data type for any column of the Mongo collection.
When I query the count from the PySpark DataFrame
dataframe.count()
I get an exception like this:
"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})"
Your question is broad, thus my answer will also be broad.
To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]
This means your column age is of type int and name is of type string.
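For a concrete illustration (a minimal sketch with made-up data mirroring the output above), you can build a small DataFrame and inspect its types with dtypes or printSchema():

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dtypes-demo").getOrCreate()

# Hypothetical sample data; Python ints are inferred as bigint (long).
df = spark.createDataFrame([(25, "Alice"), (30, "Bob")], ["age", "name"])

print(df.dtypes)   # [('age', 'bigint'), ('name', 'string')]
df.printSchema()   # tree view of the same schema

# dtypes is handy for picking out columns by type, e.g. before casting:
string_cols = [c for c, t in df.dtypes if t == "string"]
print(string_cols)  # ['name']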