How do I convert Array[Row] to DataFrame?

How do I convert this one row to a DataFrame?

val oneRowDF = myDF.first // gives Array[Row]

Thanks

asked Nov 25 '16 08:11 by Garipaso


3 Answers

In my answer, df1 is a DataFrame [text: string, y: int], created just for testing: val df1 = sc.parallelize(List(("a", 1))).toDF("text", "y").

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(
    StructField("text", StringType, false) ::
    StructField("y", IntegerType, false) :: Nil)
val arr = df1.head(3) // head(3) returns Array[Row]
val dfFromArray = sqlContext.createDataFrame(sparkContext.parallelize(arr), schema)

You can also map parallelized array and cast every row:

import sqlContext.implicits._ // needed for .toDF on an RDD (imported automatically in spark-shell)

val dfFromArray = sparkContext.parallelize(arr).map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

In the case of a single row (row here being a Row, e.g. obtained via myDF.first), you can run:

val dfFromArray = sparkContext.parallelize(Seq(row)).map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

In Spark 2.0, use SparkSession instead of SQLContext.
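
For example, a minimal sketch of the same conversion with the Spark 2.x entry point (assuming schema and arr from the snippet above, and an application name that is purely illustrative):

import org.apache.spark.sql.SparkSession

// Spark 2.x: SparkSession replaces SQLContext as the entry point.
val spark = SparkSession.builder().appName("rows-to-df").getOrCreate()

// Same idea as above: parallelize the Array[Row] and apply the schema.
val dfFromArray2 = spark.createDataFrame(spark.sparkContext.parallelize(arr), schema)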

answered Oct 27 '22 04:10 by T. Gawęda


You do not want to do that:

If you want a subpart of the whole DataFrame, just use the limit API.

Example:

scala> val d=sc.parallelize(Seq((1,3),(2,4))).toDF
d: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> d.show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
|  2|  4|
+---+---+


scala> d.limit(1)
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: int, _2: int]

scala> d.limit(1).show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
+---+---+

Still, if you want to explicitly convert an Array[Row] to a DataFrame, you can do something like:

scala> val value=d.take(1)
value: Array[org.apache.spark.sql.Row] = Array([1,3])

scala> val asTuple=value.map(a=>(a.getInt(0),a.getInt(1)))
asTuple: Array[(Int, Int)] = Array((1,3))

scala> sc.parallelize(asTuple).toDF
res6: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

And now you can show it as usual!

answered Oct 27 '22 04:10 by Shivansh


If you have a List<Row>, it can be used directly to create a DataFrame or Dataset<Row> via spark.createDataFrame(List<Row> rows, StructType schema), where spark is a SparkSession in Spark 2.x.
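
For example, a minimal sketch in Scala (assuming a SparkSession named spark, and reusing arr and schema from the first answer; the JavaConverters import is needed because this overload takes a java.util.List):

import scala.collection.JavaConverters._

// Convert the Array[Row] to a java.util.List[Row] and apply the schema.
val rowList: java.util.List[org.apache.spark.sql.Row] = arr.toSeq.asJava
val dfFromList = spark.createDataFrame(rowList, schema)
dfFromList.show()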

answered Oct 27 '22 05:10 by Arun Y