
AssertionError: col should be Column

How to create a new column in PySpark and fill this column with the date of today?

This is what I tried:

import datetime
now = datetime.datetime.now()
df = df.withColumn("date", str(now)[:10])

I get this error:

AssertionError: col should be Column

Markus asked Dec 20 '17 10:12




1 Answer

How to create a new column in PySpark and fill this column with the date of today?

There is already a function for that:

from pyspark.sql.functions import current_date

df = df.withColumn("date", current_date().cast("string"))

AssertionError: col should be Column

Use a literal. withColumn expects a Column, but str(now)[:10] is a plain Python string, so wrap it in lit():

from pyspark.sql.functions import lit

df = df.withColumn("date", lit(str(now)[:10]))
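
The slice str(now)[:10] works because the string form of a datetime starts with the ISO date. A slightly more explicit way to get the same 'YYYY-MM-DD' string is date.isoformat(). Below is a minimal sketch of that idea, not a definitive implementation: the helper names today_string and add_today_column are hypothetical (they do not come from the answer above), and the PySpark import is deferred into the function so the date logic can be checked without a Spark installation.

```python
import datetime


def today_string(now=None):
    """Return the date part of `now` (default: current local time) as 'YYYY-MM-DD'."""
    if now is None:
        now = datetime.datetime.now()
    return now.date().isoformat()


def add_today_column(df, name="date"):
    """Return a new DataFrame with a constant string column holding today's date.

    Hypothetical helper; assumes `df` is a PySpark DataFrame.
    """
    # Imported here so today_string() can be used without PySpark installed.
    from pyspark.sql.functions import lit

    # withColumn needs a Column; lit() wraps the plain Python string in one.
    return df.withColumn(name, lit(today_string()))


# The slice from the question and isoformat() agree for a given datetime.
sample = datetime.datetime(2017, 12, 20, 10, 12)
print(str(sample)[:10])      # 2017-12-20
print(today_string(sample))  # 2017-12-20
```

With a DataFrame in hand, the call would be df = add_today_column(df). Note that lit() fixes the string computed on the driver at the moment the column is defined, whereas current_date() is evaluated by Spark when the query runs.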
Alper t. Turker answered Sep 21 '22 19:09
