Suppose I have a DataFrame x with this schema:
from pyspark.sql.types import StructType, StructField, DoubleType

xSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True)])
I then have the DataFrame:
DataFrame[a: double, b: double, c: double]
I would like to have an integer derived column. I am able to create a boolean column:
x = x.withColumn('y', (x.a-x.b)/x.c > 1)
My new schema is:
DataFrame[a: double, b: double, c: double, y: boolean]
However, I would like column y to contain 0 for False and 1 for True.
The cast function can only operate on a column and not a DataFrame, and the withColumn function can only operate on a DataFrame. How do I add a new column and cast it to integer at the same time?
For a plain Python boolean, use the int() class, e.g. my_int = int(my_bool). int() returns 1 for True and 0 for False. Multiplying the boolean by 1 has the same effect. Note that int() works on ordinary Python values, not on Spark column expressions.
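As a quick illustration of the plain-Python conversions mentioned here (not Spark columns), a minimal sketch; the names flags, as_ints and times_one are made up for the example:

```python
# Plain Python booleans, not Spark columns.
flags = [True, False, True]

# int() maps True to 1 and False to 0.
as_ints = [int(flag) for flag in flags]

# Multiplying by 1 gives the same integers, because bool is a subclass of int.
times_one = [flag * 1 for flag in flags]

print(as_ints, times_one)
```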
The expression you use evaluates to a Column, so you can cast it directly:
x.withColumn('y', ((x.a-x.b) / x.c > 1).cast('integer')) # Or IntegerType()