
How to create a Row from a List or Array in Spark using java

In Java, I use RowFactory.create() to create a Row:

Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));

where "record" is a record fetched from a database. However, I cannot know the length of "record" in advance, so I want to use a List or an Array to create the Row. In Scala I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?

user2736706 asked Sep 26 '16 06:09


People also ask

How do I create a row in spark?

To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala. A Row object can be constructed by providing field values.
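A minimal Java sketch (the field names and values here are made up for illustration):

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// RowFactory.create takes Object... varargs, so fields are listed inline
Row row = RowFactory.create("Alice", 42, true);
```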

What is row type in spark?

A row in Spark is an ordered collection of fields that can be accessed starting at index 0. The row is a generic object of type Row. Columns making up the row can be of the same or different types.
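For example, a small sketch of positional access (the values are hypothetical):

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

Row row = RowFactory.create("Alice", 42);
// Fields are read by 0-based position, generically or with a typed getter
Object first = row.get(0);    // the String "Alice"
int second = row.getInt(1);   // the int 42
```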

How do you create a row in PySpark?

The Row class in PySpark is used to create a Row for a PySpark DataFrame. We can create a Row by using the Row() function, which is available in the pyspark.sql module.

How to create array or map columns to rows in spark?

Spark's explode(e: Column) function is used to flatten array or map columns into rows. When an array column is passed to this function, it creates a new default column "col" that contains one array element per row. When a map column is passed, it creates two new columns, one for the key and one for the value, and each map entry is split into its own row.
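A hedged Java sketch of exploding an array column (assuming an already-built SparkSession named spark):

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// One row whose "numbers" column holds the array [1, 2, 3]
Dataset<Row> df = spark.sql("SELECT array(1, 2, 3) AS numbers");

// explode produces one output row per array element, in a column named "col"
Dataset<Row> exploded = df.select(explode(col("numbers")));
```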

How to create a Dataframe from a list in spark?

First populate the list with Row objects, then create the StructFields and add them to a list. Pass the list into the createStructType function and pass the resulting schema into the createDataFrame function, starting from a session obtained via SparkSession.builder().
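Put together, a sketch of that recipe (the app name and local master are assumptions added to make the example self-contained):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

SparkSession spark = SparkSession.builder()
        .appName("list-to-dataframe")  // hypothetical app name
        .master("local[*]")            // assumption: local run
        .getOrCreate();

// 1. populate the list with Row objects
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", 1));
rows.add(RowFactory.create("b", 2));

// 2. create the StructFields and add them to a list
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("name", DataTypes.StringType, false));
fields.add(DataTypes.createStructField("value", DataTypes.IntegerType, false));

// 3. createStructType, then createDataFrame
StructType schema = DataTypes.createStructType(fields);
Dataset<Row> df = spark.createDataFrame(rows, schema);
```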

How to slice an array into sub-array in spark?

This function slices an array into a sub-array. We can specify the start index as the second argument and the number of elements as the third argument. Note: array indices in Spark SQL functions start at 1. Negative indexing is also supported, to access elements from the end. Let's try to create a sub-array of 3 elements starting from index 2.
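A hedged Java sketch (assuming an existing SparkSession spark); with start = 2 and length = 3 the result keeps 20, 30 and 40:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.slice;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df = spark.sql("SELECT array(10, 20, 30, 40, 50) AS numbers");

// slice(column, start, length): start is 1-based, so this yields [20, 30, 40]
Dataset<Row> sub = df.select(slice(col("numbers"), 2, 3));
```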


2 Answers

We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application:

// initialize the SQLContext first (elided here)
SQLContext sqlContext = ... 
StructType schemata = DataTypes.createStructType(
        new StructField[]{
                DataTypes.createStructField("NAME", DataTypes.StringType, false),
                DataTypes.createStructField("STRING_VALUE", DataTypes.StringType, false),
                DataTypes.createStructField("NUM_VALUE", DataTypes.IntegerType, false),
        });
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);   // Guava; a plain java.util.List works too
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show();
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+
Andrushenko Alexander answered Sep 23 '22 18:09


I am not sure if I understand your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.

List<MyData> mlist = new ArrayList<MyData>();
mlist.add(d1);
mlist.add(d2);

// RowFactory.create(Object... values) accepts the Object[] directly
Row row = RowFactory.create(mlist.toArray());
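Applied to the original question, a sketch under the assumption that record is a JDBC ResultSet and columnCount comes from its ResultSetMetaData (both names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Collect the values of a record whose width is only known at runtime
List<Object> values = new ArrayList<>();
for (int i = 1; i <= columnCount; i++) {     // JDBC columns are 1-based
    values.add(record.getObject(i));         // getObject handles any column type
}

// RowFactory.create(Object... values) accepts the Object[] directly,
// which is the Java equivalent of Scala's Row.fromSeq()
Row row = RowFactory.create(values.toArray());
```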
abaghel answered Sep 21 '22 18:09