 

Read only the first n columns of a Spark Dataset

I have a dataset with more than 5000 columns, and an OutOfMemoryException is thrown when I try to read it, even when limiting to 10 rows. There is another post on the cause of the exception, so I want to read only the first n columns to avoid the error. I could not find an API call that does that; only the rows can be restricted, with head or limit. Is there a way to restrict the read to only the first few columns? Thanks.

asked Mar 07 '26 23:03 by Senthil

1 Answer

Given that your Dataset is ds, you can extract the first n columns into an Array:

val n = 2
val firstNCols = ds.columns.take(n)

and then select only these columns from the Dataset:

ds.select(firstNCols.head, firstNCols.tail:_*)
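Putting the two steps together, a minimal sketch might look like the following. Note that `ds` and `spark` are assumed to already exist in your session; only the small helper that picks the first n column names is shown runnable on its own, since `select(col, cols: _*)` expects a head element plus the rest as varargs.

```scala
// Pure helper: pick the first n column names (a thin wrapper around take,
// shown separately so the pattern is easy to test without a SparkSession).
def firstN(allCols: Array[String], n: Int): Array[String] = allCols.take(n)

// Hypothetical usage against an existing Dataset `ds`:
//   val cols    = firstN(ds.columns, 10)
//   val trimmed = ds.select(cols.head, cols.tail: _*)
//   trimmed.show(10)
```

Because `select` is a lazy transformation, the untouched columns are pruned from the plan before execution, which is what avoids materializing all 5000+ columns.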
answered Mar 09 '26 13:03 by cheseaux


