
Is there any better way to convert Array<int> to Array<String> in pyspark

A very huge DataFrame with schema:

root
 |-- id: string (nullable = true)
 |-- ext: array (nullable = true)
 |    |-- element: integer (containsNull = true)

So far I have tried to explode the data, then collect_list:

select
  id,
  collect_list(cast(item as string))
from default.dual
lateral view explode(ext) t as item
group by
  id

But this approach is too expensive.

Zhang Tong asked Jan 05 '18


1 Answer

You can simply cast the ext column to a string array:

df = source.withColumn("ext", source.ext.cast("array<string>"))
df.printSchema()
df.show()
Silvio answered Oct 10 '22