I have 300+ columns in my RDD, and I need to dynamically select a range of columns and put them into the LabeledPoint data type. As a newbie to Spark, I am wondering whether there is an index-based way to select a range of columns in an RDD, something like temp_data = data[, 101:211] in R. Is there something like val temp_data = data.filter(_.column_index in range(101:211)...?
Any thought is welcomed and appreciated.
You can get all the columns of a Spark DataFrame by using df.columns, which returns the column names as an Array[String].
You can select a single column or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function displays the DataFrame contents.
To convert a Spark DataFrame column to a list, first select() the column you want, then use the map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String].
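The three steps above (list the columns, select, then map and collect) can be sketched as follows. This assumes Spark 2.x or later with a local SparkSession; the DataFrame contents, the column names `name` and `id`, and the helper `columnAsStrings` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnToList {
  // Hypothetical helper: collects one string column to the driver.
  // Only sensible when the result is small enough to fit in driver memory.
  def columnAsStrings(spark: SparkSession, df: DataFrame, column: String): Array[String] = {
    import spark.implicits._ // provides the Encoder[String] that map() needs
    df.select(column).map(_.getString(0)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-to-list")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data
    val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "id")

    println(df.columns.mkString(", "))  // column names as Array[String]
    df.select("name", "id").show()      // select() returns a new DataFrame
    println(columnAsStrings(spark, df, "name").mkString(", "))

    spark.stop()
  }
}
```

On Spark 1.x, which this thread dates from, the entry point would be a SQLContext or HiveContext rather than a SparkSession.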
RDD: With an RDD, we can process structured as well as unstructured data. However, an RDD cannot infer a schema on its own; the user needs to specify the schema of the ingested data. DataFrame: In a DataFrame, data is organized into named columns.
If it is a DataFrame, then something like this should work:
val df = rdd.toDF
// select(cols: Column*) needs Column values, so map each name to a Column first
df.select(df.columns.slice(101, 211).map(df(_)): _*)
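One thing to watch when translating the R expression from the question: data[, 101:211] is 1-based and includes both ends, while Scala's slice(from, until) is 0-based and excludes until. A minimal sketch of the index conversion, using a hypothetical helper `oneBasedRange` and made-up column names c1 to c300:

```scala
object ColumnRange {
  // Hypothetical helper: columns first..last, 1-based and inclusive,
  // mirroring R's data[, first:last].
  def oneBasedRange(columns: Array[String], first: Int, last: Int): Array[String] =
    columns.slice(first - 1, last)

  def main(args: Array[String]): Unit = {
    // Made-up column names c1, c2, ..., c300
    val columns = (1 to 300).map(i => s"c$i").toArray

    val picked = oneBasedRange(columns, 101, 211)
    println(s"${picked.head} .. ${picked.last} (${picked.length} columns)")

    // By contrast, columns.slice(101, 211) starts at c102 and yields 110 columns.
  }
}
```

The resulting names can then be passed to select, for example df.select(picked.head, picked.tail: _*).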
Assuming you have an RDD of Array or any other Scala collection (e.g., List), you can do something like this:
val data: RDD[Array[Int]] = sc.parallelize(Array(Array(1,2,3), Array(4,5,6)))
val sliced: RDD[Array[Int]] = data.map(_.slice(0,2))
sliced.collect()
> Array[Array[Int]] = Array(Array(1, 2), Array(4, 5))
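Since the goal in the question is a LabeledPoint, the sliced values can be wrapped inside the same map. A minimal sketch, assuming the RDD-based spark-mllib API is on the classpath, that every row is an Array[Double], and that the label sits at a known index; the helper `toLabeledPoint` and the index layout are hypothetical:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object RowsToLabeledPoints {
  // Hypothetical helper: label taken from labelIdx,
  // features taken from the 0-based range [from, until).
  def toLabeledPoint(row: Array[Double], labelIdx: Int, from: Int, until: Int): LabeledPoint =
    LabeledPoint(row(labelIdx), Vectors.dense(row.slice(from, until)))

  def main(args: Array[String]): Unit = {
    // Made-up row: label in column 0, features in columns 1 and 2
    val row = Array(1.0, 0.5, 0.25, 0.125)
    println(toLabeledPoint(row, labelIdx = 0, from = 1, until = 3))
  }
}
```

On an RDD[Array[Double]] named data, this could be applied as data.map(row => RowsToLabeledPoints.toLabeledPoint(row, 0, 100, 211)), which takes the label from the first column and the features from the 101st through 211th columns.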
Kind of an old thread, but I recently had to do something similar and searched around. I needed to select all but the last column, where I had 200+ columns.
Spark 1.4.1
Scala 2.10.4
val df = hiveContext.sql("SELECT * FROM foobar")
val cols = df.columns.slice(0, df.columns.length - 1)
val new_df = df.select(cols.head, cols.tail:_*)
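The "all but the last" part can also be written with the standard collection methods init or dropRight. A small sketch on a plain array of made-up column names, so no Spark is needed to see the behaviour; the helper name `allButLast` is hypothetical:

```scala
object AllButLast {
  // Hypothetical helper: every column name except the final n.
  def allButLast(columns: Array[String], n: Int = 1): Array[String] =
    columns.dropRight(n)

  def main(args: Array[String]): Unit = {
    val columns = Array("id", "f1", "f2", "label")
    println(allButLast(columns).mkString(", ")) // prints: id, f1, f2

    // columns.init gives the same result for n = 1, but throws on an empty
    // array, whereas dropRight(1) simply returns an empty array.
  }
}
```

With a DataFrame, the result feeds into the same select(cols.head, cols.tail: _*) call; note that cols.head fails if cols is empty. When only one column has to go, df.drop(df.columns.last) is another option.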