How to convert date to the first day of month in a PySpark Dataframe column?

I have the following DataFrame:

+----------+
|      date|
+----------+
|2017-11-25|
|2017-12-21|
|2017-09-12|
+----------+

Here is the code to create the above DataFrame:

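import pyspark.sql.functions as f

# sc and sqlContext are assumed to be available already, as in the PySpark shell
rdd = sc.parallelize([("2017/11/25",), ("2017/12/21",), ("2017/09/12",)])
# Parse the "yyyy/MM/dd" strings into a date column
df = sqlContext.createDataFrame(rdd, ["date"]).withColumn("date", f.to_date(f.col("date"), "yyyy/MM/dd"))
df.show()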

I want a new column containing the first day of the month for each row, i.e. the same date with the day replaced by "01":

+----------+----------+
|      date|first_date|
+----------+----------+
|2017-11-25|2017-11-01|
|2017-12-21|2017-12-01|
|2017-09-12|2017-09-01|
+----------+----------+

pyspark.sql.functions has a last_day function, but there is no first_day function.

I tried using date_sub for this, but it did not work: I get a "Column is not iterable" error because the second argument to date_sub has to be an integer and cannot be a column.

f.date_sub(f.col('date'), f.dayofmonth(f.col('date')) - 1 )
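
The arithmetic behind this attempt is sound: subtracting (day of month - 1) days from a date always lands on the first of that month. Only the function signature gets in the way. A minimal plain-Python sketch of that arithmetic, using only the standard library (the two helper names are hypothetical and not part of PySpark):

```python
from datetime import date, timedelta


def first_of_month_by_subtraction(d: date) -> date:
    # Mirrors the date_sub idea: step back (day of month - 1) days.
    return d - timedelta(days=d.day - 1)


def first_of_month_by_replace(d: date) -> date:
    # Mirrors "replace the day with 01".
    return d.replace(day=1)


samples = [date(2017, 11, 25), date(2017, 12, 21), date(2017, 9, 12)]
for d in samples:
    # Both helpers should agree for every sample date.
    print(d, first_of_month_by_subtraction(d), first_of_month_by_replace(d))
```

Both helpers give the same result, so either formulation can serve as a reference when checking the values Spark produces.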

asked Jan 19 '18 20:01

Rakesh Adhikesavan

1 Answer

You can use trunc, which truncates a date to the unit given by its format argument:

import pyspark.sql.functions as f

df.withColumn("first_date", f.trunc("date", "month")).show()

+----------+----------+
|      date|first_date|
+----------+----------+
|2017-11-25|2017-11-01|
|2017-12-21|2017-12-01|
|2017-09-12|2017-09-01|
+----------+----------+

answered Oct 21 '22 11:10

Alper t. Turker

Tags: python, apache-spark, apache-spark-sql, pyspark
