I am trying to split a column in a PySpark dataframe. This is the data I have:
from pyspark.sql.functions import split
df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3, 'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Splitted', split(df['Value'], '|')[0])
I got
+---+------+--------+
|Key| Value|Splitted|
+---+------+--------+
|  1|Foo|10|       F|
|  2|Bar|11|       B|
|  3|Car|12|       C|
+---+------+--------+
But I want
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|  1|   10|     Foo|
|  2|   11|     Bar|
|  3|   12|     Car|
+---+-----+--------+
Can anyone please point me to what I am doing wrong?
What if I have a situation like this, with more than two parts?
df = sc.parallelize([[1, 'Foo|10|we'], [2, 'Bar|11|we'], [3,'Car|12|we']]).toDF(['Key', 'Value'])
+---+---------+
|Key| Value|
+---+---------+
| 1|Foo|10|we|
| 2|Bar|11|we|
| 3|Car|12|we|
+---+---------+
PySpark SQL provides the split() function to convert a delimiter-separated string into an array (StringType to ArrayType) column on a DataFrame. This is done by splitting the string column on a delimiter such as a space, comma, or pipe, and converting it into ArrayType.
To split array column data into rows, PySpark provides the explode() function. Using explode, we get a new row for each element in the array, as shown in the sketch below.
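For example, here is a minimal sketch of split() followed by explode(), reusing the sample data and the sc context from the question (the column names Parts and Part are just illustrative):
from pyspark.sql import functions as F
df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3, 'Car|12']]).toDF(['Key', 'Value'])
# split() turns the StringType column into an ArrayType column
# (the pipe is escaped because the pattern is a regular expression)
df = df.withColumn('Parts', F.split(df['Value'], r'\|'))
# explode() yields one row per array element
df.select('Key', F.explode('Parts').alias('Part')).show()
# +---+----+
# |Key|Part|
# +---+----+
# |  1| Foo|
# |  1|  10|
# |  2| Bar|
# |  2|  11|
# |  3| Car|
# |  3|  12|
# +---+----+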
You forgot the escape character. split() treats its second argument as a regular expression, and | is a special character there, so you should escape it:
df = df.withColumn('Splitted', split(df['Value'], r'\|')[0])
If you want output as
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|1 |10 |Foo |
|2 |11 |Bar |
|3 |12 |Car |
+---+-----+--------+
You should do
from pyspark.sql import functions as F
df = (df.withColumn('Splitted', F.split(df['Value'], r'\|'))
        .withColumn('Value', F.col('Splitted')[1])
        .withColumn('Splitted', F.col('Splitted')[0]))
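For the three-part values in the follow-up question, a sketch of the same approach (the extra column names Name and Extra are just placeholders, not from the original post):
df = sc.parallelize([[1, 'Foo|10|we'], [2, 'Bar|11|we'], [3, 'Car|12|we']]).toDF(['Key', 'Value'])
# Split once, pick each element by index, then drop the intermediate array column
df = (df.withColumn('Splitted', F.split(df['Value'], r'\|'))
        .withColumn('Name', F.col('Splitted')[0])
        .withColumn('Value', F.col('Splitted')[1])
        .withColumn('Extra', F.col('Splitted')[2])
        .drop('Splitted'))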