I tried to initialize a new column with random values in pandas. I did it this way:
import numpy as np

df['business_vertical'] = np.random.choice(['Retail', 'SME', 'Cor'], df.shape[0])
How do I do it in pyspark?
In PySpark, to add a new column with a constant value to a DataFrame, use the lit() function (from pyspark.sql.functions import lit). lit() takes the constant value you want to add and returns a Column type; if you want to add a NULL/None, use lit(None).
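For example, a minimal sketch (the DataFrame and column names here are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Any existing DataFrame works the same way.
df = spark.createDataFrame([("a",), ("b",)], ["id"])

# Add a constant column, and a NULL column.
# Casting lit(None) gives the column a usable type instead of NullType.
df = df.withColumn("country", lit("US")) \
       .withColumn("notes", lit(None).cast("string"))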
You can add multiple columns to a Spark DataFrame in several ways; if you want to add a known set of columns, the easiest is to chain withColumn() calls or to do it in a single select().
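A quick sketch of both styles (the column names are made up for illustration):

# Chaining withColumn() calls:
df2 = df.withColumn("source", lit("batch")).withColumn("version", lit(1))

# Equivalent with a single select():
df2 = df.select("*", lit("batch").alias("source"), lit(1).alias("version"))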
Just build an array of the candidate values and pick one at random for each row:
from pyspark.sql import functions as F

# Build an array of candidates, then index into it with a random int in [0, 3).
df = df.withColumn(
    "business_vertical",
    F.array(
        F.lit("Retail"),
        F.lit("SME"),
        F.lit("Cor"),
    ).getItem(
        (F.rand() * 3).cast("int")  # rand() is in [0, 1), so the index is 0, 1, or 2
    ),
)
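If you need reproducible output, rand() accepts an optional seed, so a seeded variant of the same expression looks like this:

df = df.withColumn(
    "business_vertical",
    F.array(F.lit("Retail"), F.lit("SME"), F.lit("Cor"))
     .getItem((F.rand(seed=42) * 3).cast("int")),
)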