How do I convert this one row to a dataframe?
val oneRowDF = myDF.first // gives a Row, not a DataFrame
Thanks
In my answer, df1 is a DataFrame with schema [text: string, y: int], just for testing:
val df1 = sc.parallelize(List(("a", 1))).toDF("text", "y")
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(
  StructField("text", StringType, nullable = false) ::
  StructField("y", IntegerType, nullable = false) :: Nil)

val arr = df1.head(3) // Array[Row]
val dfFromArray = sqlContext.createDataFrame(sparkContext.parallelize(arr), schema)
You can also map the parallelized array and cast every row:
val dfFromArray = sparkContext.parallelize(arr)
  .map(row => (row.getString(0), row.getInt(1)))
  .toDF("text", "y")
In case of a single row, you can run:
val row = df1.head // a single Row
val dfFromRow = sparkContext.parallelize(Seq(row))
  .map(r => (r.getString(0), r.getInt(1)))
  .toDF("text", "y")
In Spark 2.0+, use SparkSession instead of SQLContext.
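Putting the above together for Spark 2.x, a minimal sketch (the session builder settings, sample data, and variable names here are illustrative, not from the original answer):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("rowsToDF").getOrCreate()
import spark.implicits._

val df1 = Seq(("a", 1), ("b", 2)).toDF("text", "y")

// Collect some rows back to the driver: Array[Row]
val arr = df1.head(2)

// Rebuild a DataFrame from the Array[Row] with an explicit schema
val schema = StructType(
  StructField("text", StringType, nullable = false) ::
  StructField("y", IntegerType, nullable = false) :: Nil)

val dfFromArray = spark.createDataFrame(spark.sparkContext.parallelize(arr), schema)
```

Note that `spark.sparkContext` replaces the bare `sc` of the Spark 1.x shell, and `spark.createDataFrame` replaces `sqlContext.createDataFrame`.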
You do not want to do that: if you just want a subpart of the whole DataFrame, use the limit API.
Example:
scala> val d=sc.parallelize(Seq((1,3),(2,4))).toDF
d: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
scala> d.show
+---+---+
| _1| _2|
+---+---+
| 1| 3|
| 2| 4|
+---+---+
scala> d.limit(1)
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: int, _2: int]
scala> d.limit(1).show
+---+---+
| _1| _2|
+---+---+
| 1| 3|
+---+---+
Still, if you want to explicitly convert an Array[Row] to a DataFrame, you can do something like:
scala> val value=d.take(1)
value: Array[org.apache.spark.sql.Row] = Array([1,3])
scala> val asTuple=value.map(a=>(a.getInt(0),a.getInt(1)))
asTuple: Array[(Int, Int)] = Array((1,3))
scala> sc.parallelize(asTuple).toDF
res6: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
Hence you can now show it as a DataFrame.
If you have a List<Row>, it can be used directly to create a DataFrame or Dataset<Row> using spark.createDataFrame(List<Row> rows, StructType schema), where spark is the SparkSession in Spark 2.x.
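A short sketch of this approach from Scala (the sample rows, schema, and session setup are illustrative; `asJava` converts the Scala list to the java.util.List the overload expects):

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("listToDF").getOrCreate()

// A java.util.List[Row], as required by this createDataFrame overload
val rows: java.util.List[Row] = List(Row("a", 1), Row("b", 2)).asJava

val schema = StructType(Seq(
  StructField("text", StringType, nullable = false),
  StructField("y", IntegerType, nullable = false)))

val df = spark.createDataFrame(rows, schema)
```

This skips the detour through an RDD entirely, which is convenient when the rows are already on the driver.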