
How to use global variable in pyspark function

First, I have two variables at the beginning of my code:

numericColumnNames = []
categoricalColumnsNames = []

Then, in the main method, I assign values to those variables:

def main():
  #clickRDD = sc.textFile("s3a://wer-display-ads/day_0_1000.csv")
  clickRDD = sc.textFile("data/day_0_1000.csv")
  numericColumnNames, categoricalColumnsNames = getColumnStructure()

But when I try to use those variables in the following function, they are not updated and are still empty:

def dataToVectorForLinear(clickDF):
  print(categoricalColumnsNames)  # why is this list empty?
  clickDF = oneHotEncoding(clickDF, categoricalColumnsNames)

Unfortunately, I can't find the problem. Thanks for your help!
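The behaviour can be reproduced without Spark at all. A minimal sketch (the `getColumnStructure` below is a hypothetical stand-in for the real helper) shows that assigning to a name inside a function creates a new local variable, leaving the module-level list untouched:

```python
numericColumnNames = []
categoricalColumnsNames = []

def getColumnStructure():
    # Hypothetical stand-in for the real helper in the question.
    return ["price"], ["country"]

def main():
    # These assignments bind NEW local names inside main();
    # the module-level lists are never modified.
    numericColumnNames, categoricalColumnsNames = getColumnStructure()

main()
print(categoricalColumnsNames)  # → [] (still empty)
```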

asked Mar 24 '26 13:03 by yunus kula

1 Answer

Declare them with the `global` keyword inside the function, like this:

def main():

    global numericColumnNames
    global categoricalColumnsNames     

    clickRDD = sc.textFile("data/day_0_1000.csv")
    numericColumnNames, categoricalColumnsNames = getColumnStructure()

Similarly (strictly speaking, the `global` declaration is only required when you assign to the name; merely reading a module-level variable works without it):

def dataToVectorForLinear(clickDF):

    global categoricalColumnsNames
    print(categoricalColumnsNames)
    clickDF = oneHotEncoding(clickDF, categoricalColumnsNames)

Reference:

  • Global and local variables in Python
answered Mar 26 '26 06:03 by Gambit1614


