What is wrong with the Spark SQL substring function?

Tags:

apache-spark-sql

spark-dataframe

This should require no explanation, but could someone describe the logic behind the pos parameter of substring? I cannot make sense of this (using Spark 2.1):

scala> val df = Seq("abcdef").toDS()
df: org.apache.spark.sql.Dataset[String] = [value: string]

scala> df.show
+------+
| value|
+------+
|abcdef|
+------+

scala> df.selectExpr("substring(value, 0, 2)", "substring(value, 1, 2)", "substring(value, 2,2)", "substring(value, 3,2)").show
+----------------------+----------------------+----------------------+----------------------+
|substring(value, 0, 2)|substring(value, 1, 2)|substring(value, 2, 2)|substring(value, 3, 2)|
+----------------------+----------------------+----------------------+----------------------+
|                    ab|                    ab|                    bc|                    cd|
+----------------------+----------------------+----------------------+----------------------+


asked Sep 29 '17 23:09

Jeff Saremi

1 Answer

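The first number (pos) is the position to start from, and it is 1-based, not 0-based. The second number (len) is how many characters to take from that position. A pos of 0 is treated the same as 1, which is why substring(value, 0, 2) and substring(value, 1, 2) both return ab in your output.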
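To make the rule concrete, here is a small plain-Scala sketch that models how pos is mapped to a start offset. It is only an illustration, not Spark's actual source: `sqlSubstring` is a made-up helper name, and it assumes plain ASCII input (it counts Java chars, whereas Spark counts characters of the UTF-8 string). It also models negative positions, which as far as I know Spark counts back from the end of the string.

```scala
// Illustrative model of SQL-style substring positions.
// `sqlSubstring` is a hypothetical helper, not part of the Spark API.
def sqlSubstring(s: String, pos: Int, len: Int): String = {
  // pos > 0  : 1-based, so shift down by one to get a 0-based offset
  // pos == 0 : treated as the first character
  // pos < 0  : counted back from the end of the string
  val start = if (pos > 0) pos - 1 else if (pos < 0) s.length + pos else 0
  val from = math.max(start, 0)
  // Use Long arithmetic so start + len cannot overflow, then clamp to the string end
  val until = math.min(start.toLong + len, s.length.toLong).toInt
  if (until <= from) "" else s.substring(from, until)
}

println(sqlSubstring("abcdef", 0, 2)) // ab
println(sqlSubstring("abcdef", 1, 2)) // ab
println(sqlSubstring("abcdef", 2, 2)) // bc
println(sqlSubstring("abcdef", 3, 2)) // cd
```

The four calls reproduce the four columns from the question, so 0 and 1 land on the same start offset and everything else is just ordinary 1-based indexing.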

answered Jan 01 '23 22:01

Vasile Surdu

