My RDD might have columns with constant values; in other words, some of the columns may have zero variance. My objective is to remove all such columns from the RDD (and ultimately compute the covariance matrix for the remaining columns). How can I do that?
Thanks and regards,
An RDD is immutable, so you can't remove columns from an existing one. Instead, map it to a new RDD in which each row keeps only the columns you want, and/or filter out the elements you don't need (more details in the documentation).
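A minimal PySpark sketch of that approach follows. It assumes each RDD element is a list or tuple of numbers and that all rows have the same length. Rather than computing variances, it tracks each column's min and max in one aggregate pass, because a column has zero variance exactly when all its values are equal (min == max). A map then builds the new RDD with only the remaining columns. The helper names (seq_op, comb_op, non_constant_columns, drop_constant_columns) and the tol parameter are hypothetical. The covariance step uses MLlib's RowMatrix.computeCovariance().

```python
def comb_op(a, b):
    """Merge two partial (per-column mins, per-column maxes) results; either may be None."""
    if a is None:
        return b
    if b is None:
        return a
    return ([min(x, y) for x, y in zip(a[0], b[0])],
            [max(x, y) for x, y in zip(a[1], b[1])])


def seq_op(acc, row):
    """Fold one row into the running per-column (mins, maxes); acc starts as None."""
    return comb_op(acc, (list(row), list(row)))


def non_constant_columns(bounds, tol=0.0):
    """Indices of columns whose range (max - min) exceeds tol (hypothetical tolerance knob)."""
    if bounds is None:  # empty RDD
        return []
    lo, hi = bounds
    return [i for i, (a, b) in enumerate(zip(lo, hi)) if b - a > tol]


def drop_constant_columns(rdd, tol=0.0):
    """Return (new RDD without constant columns, indices of the kept columns).

    The original RDD is not modified; map creates a new one.
    """
    bounds = rdd.aggregate(None, seq_op, comb_op)
    keep = non_constant_columns(bounds, tol)
    return rdd.map(lambda row: [row[i] for i in keep]), keep


if __name__ == "__main__":
    # Requires a local Spark installation; the imports are here so the helpers
    # above can be used without pyspark.
    from pyspark import SparkContext
    from pyspark.mllib.linalg.distributed import RowMatrix

    sc = SparkContext("local[2]", "drop-constant-columns")
    data = sc.parallelize([[1.0, 5.0, 2.0],
                           [2.0, 5.0, 4.0],
                           [3.0, 5.0, 7.0]])
    filtered, kept = drop_constant_columns(data)
    print("kept columns:", kept)  # column 1 is constant, so [0, 2]
    # Covariance matrix of the remaining columns (a pyspark.mllib DenseMatrix)
    print(RowMatrix(filtered).computeCovariance())
    sc.stop()
```

Tracking min/max avoids the rounding error a sum-of-squares variance can introduce, so exactly constant columns are detected exactly. If you would rather use a built-in, pyspark.mllib.stat.Statistics.colStats(rdd).variance() returns the per-column variances, which you can compare against a small threshold instead.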