I have a table with 4 columns, and in a 5th column I want to store a count of how many of the previous 4 are non-null. For example, where X is any value:
Column1 | Column2 | Column3 | Column4 | Count
X       | X       | NULL    | X       | 3
NULL    | NULL    | X       | X       | 2
NULL    | NULL    | NULL    | NULL    | 0
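For a concrete reproduction, here is a minimal setup sketch. The table and column names are taken from the accepted query below; the varchar(10) type is an assumption:

create table Table1 (
    Column1 varchar(10) null,
    Column2 varchar(10) null,
    Column3 varchar(10) null,
    Column4 varchar(10) null
);

-- the three example rows from the table above
insert into Table1 (Column1, Column2, Column3, Column4)
values ('X', 'X', NULL, 'X'),
       (NULL, NULL, 'X', 'X'),
       (NULL, NULL, NULL, NULL);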
Solution (PySpark aside): to find the non-null values of a PySpark DataFrame column, use the isNotNull() function directly, e.g. df.name.isNotNull(); negating it, ~df.name.isNotNull(), selects the null values instead. Similarly, ~isnan(df.name) selects the non-NaN values.
COUNT(column) does not include NULL values. As a result, the count returned for a given column may be less than the total number of rows returned by COUNT(*).
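Run against the sample Table1 above, the difference is easy to see:

select count(*) as total_rows,        -- counts every row: 3
       count(Column1) as non_null_c1  -- skips NULLs:      1
from Table1;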
select T.Column1, T.Column2, T.Column3, T.Column4,
       (select count(*)
        from (values (T.Column1), (T.Column2), (T.Column3), (T.Column4)) as v(col)
        where v.col is not null) as Column5
from Table1 as T;
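The subquery unpivots the four columns into rows with a VALUES table constructor, and COUNT(*) with the IS NOT NULL filter then counts the non-null ones for each row. An equivalent sketch using plain CASE arithmetic, for engines that do not support VALUES as a derived table, would be:

select T.Column1, T.Column2, T.Column3, T.Column4,
       -- each CASE contributes 1 for a non-null column, 0 otherwise
       case when T.Column1 is not null then 1 else 0 end
     + case when T.Column2 is not null then 1 else 0 end
     + case when T.Column3 is not null then 1 else 0 end
     + case when T.Column4 is not null then 1 else 0 end as Column5
from Table1 as T;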