 

Get the datatype of a column using PySpark

We are reading data from a MongoDB collection. A collection column holds values of two different types (e.g. (bson.Int64, int) or (int, float)).

I am trying to get the datatype of a column using PySpark.

My problem is that some columns have mixed datatypes.

Assume quantity and weight are the columns:

quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21

Actually, we didn't define a data type for any column of the Mongo collection.

When I query the count from the PySpark dataframe

dataframe.count() 

I get an exception like this:

"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})" 
asked Jul 11 '17 by Sreenuvasulu


People also ask

How can I check the DataType of a column in a Spark DataFrame?

In Spark you can get all DataFrame column names and types (DataType) by using df.dtypes and df.schema, where df is a DataFrame object.
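A minimal sketch of both attributes, using a small DataFrame with an inferred schema (the sample values are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small example DataFrame; Spark infers the column types.
df = spark.createDataFrame([(12300, 656.0), (1238, 56.22)], ["quantity", "weight"])

print(df.dtypes)   # [('quantity', 'bigint'), ('weight', 'double')]
print(df.schema)   # StructType of StructField entries, one per column
df.printSchema()   # the same information as a readable tree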

How do you find the data type of a column in Python?

Use DataFrame.dtypes to get the data types of the columns in a DataFrame. In Python's pandas module, the DataFrame class provides this attribute for exactly that purpose; it returns a Series containing the data type of each column.
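A short pandas sketch of the same idea (the column names and values are illustrative):

import pandas as pd

pdf = pd.DataFrame({"quantity": [12300, 1238], "weight": [656.0, 56.22]})

print(pdf.dtypes)           # Series mapping column name -> dtype
print(pdf["weight"].dtype)  # dtype of a single column, e.g. float64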

How do you check the type of an object in PySpark?

Using the type() function: type() returns the Python type of the given object, where the object is the RDD or DataFrame itself.
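For example, a minimal sketch; note that type() reports the class of the object, not the datatypes of its columns:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2.0)], ["quantity", "weight"])

# type() tells you what kind of object you are holding.
print(type(df))      # <class 'pyspark.sql.dataframe.DataFrame'>
print(type(df.rdd))  # <class 'pyspark.rdd.RDD'>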


1 Answer

Your question is broad, thus my answer will also be broad.

To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]

This means your column age is of type int and name is of type string.
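If you want the type of a single column, the list returned by dtypes can be turned into a dict, or the schema can be queried directly. A small sketch using the same column names as above (the sample row is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(25, "Alice")], "age int, name string")

# Look up one column's type from the dtypes list...
print(dict(df.dtypes)["age"])     # 'int'

# ...or pull the DataType object directly from the schema.
print(df.schema["age"].dataType)  # IntegerType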

answered Sep 22 '22 by eliasah