
Casting a new derived column in a DataFrame from boolean to integer

Suppose I have a DataFrame x with this schema:

from pyspark.sql.types import StructType, StructField, DoubleType

xSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True)])

I then have the DataFrame:

DataFrame[a: double, b: double, c: double]

I would like to add a derived integer column. So far I am only able to create a boolean column:

x = x.withColumn('y', (x.a-x.b)/x.c > 1)

My new schema is:

DataFrame[a: double, b: double, c: double, y: boolean]

However, I would like column y to contain 0 for False and 1 for True.

The cast function can only operate on a column, not on a DataFrame, while the withColumn function can only operate on a DataFrame. How do I add a new column and cast it to integer at the same time?

Michal asked Oct 26 '15 20:10



1 Answer

The expression you use evaluates to a Column, so you can cast it directly:

x = x.withColumn('y', ((x.a - x.b) / x.c > 1).cast('integer'))  # or .cast(IntegerType())
zero323 answered Oct 17 '22 18:10