
How to get max length of string column from dataframe using scala?

This may be a really simple question. I am using Spark 1.6 with Scala:

var DF = hivecontext.sql("select name from myTable")
val name_max_len = DF.agg(max(length($"name"))) // did not work

println(name_max_len)

How can I get max length?

Jhon asked Dec 21 '16

People also ask

How do I find the length of a string in a Spark DataFrame?

char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
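As a hedged sketch of calling this from SQL (shown with a Spark 2.x `SparkSession` for brevity; on Spark 1.6 the equivalent entry point is a `HiveContext`/`SQLContext`, and the table name `people` here is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup: a local SparkSession and a small example table.
val spark = SparkSession.builder().master("local[*]").appName("charlen").getOrCreate()
import spark.implicits._

Seq("foo", "foobar").toDF("name").createOrReplaceTempView("people")

// length(name) counts characters, including any trailing spaces.
val lens = spark.sql("SELECT name, length(name) AS len FROM people")
// len is 3 for "foo" and 6 for "foobar"
```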

How do I get the length of a column in a Spark DataFrame?

Spark SQL provides a length() function that takes the DataFrame column type as a parameter and returns the number of characters (including trailing spaces) in a string. This function can be used to filter() the DataFrame rows by the length of a column. If the input column is Binary, it returns the number of bytes.
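A minimal sketch of using `length()` inside `filter()`, as described above (the `SparkSession` setup and the example data are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.length

val spark = SparkSession.builder().master("local[*]").appName("lenfilter").getOrCreate()
import spark.implicits._

val df = Seq("a", "foo", "foobar").toDF("name")

// Keep only the rows whose name is longer than 3 characters.
val longNames = df.filter(length($"name") > 3)
// leaves a single row: "foobar"
```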

What is LongType in Spark?

LongType : Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807 . FloatType : Represents 4-byte single-precision floating point numbers. DoubleType : Represents 8-byte double-precision floating point numbers.
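As a sketch, these fixed-width numeric types appear when declaring a DataFrame schema by hand (the field names here are hypothetical):

```scala
import org.apache.spark.sql.types._

// A hypothetical schema mixing the numeric types described above.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),    // 8-byte signed integer
  StructField("ratio", FloatType, nullable = true), // 4-byte single precision
  StructField("score", DoubleType, nullable = true) // 8-byte double precision
))
```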


1 Answer

You need to collect the result:

import org.apache.spark.sql.functions.{length, max}

val df = Seq("foo", "bar", "foobar").toDF("name")
df.agg(max(length($"name"))).as[Int].first
// res0: Int = 6

(Note that `toDF` and `.as[Int]` rely on the SQLContext implicits, which the spark-shell imports automatically.)
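If the Dataset encoder implicits needed by `.as[Int]` are not in scope, a sketch of the equivalent Row-based extraction (shown with a Spark 2.x `SparkSession` for brevity; on Spark 1.6 substitute your `sqlContext`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{length, max}

val spark = SparkSession.builder().master("local[*]").appName("maxlen").getOrCreate()
import spark.implicits._

val df = Seq("foo", "bar", "foobar").toDF("name")

// Same aggregation, but reading the value out of the Row instead of using an encoder.
val maxLen: Int = df.agg(max(length($"name"))).first.getInt(0)
// maxLen: Int = 6
```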
user7327360 answered Nov 14 '22