In Java, I use RowFactory.create() to create a Row:
Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));
where "record" is a record from a database, but I cannot know the length of "record" in advance, so I want to use a List or an Array to create the "row". In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?
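Here is a hedged sketch of one answer. It assumes that "record" is a java.sql.ResultSet. The 1-based getLong(1)/getInt(2) calls suggest JDBC, but the question does not say so. The sketch reads the column count at run time from the result set's metadata and hands an Object[] to RowFactory.create, whose parameter is Object... values. The class name ResultSetRows is hypothetical.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical helper: converts the current row of a JDBC ResultSet into a Spark Row
public class ResultSetRows {
    public static Row toRow(ResultSet record) throws SQLException {
        // The column count is read at run time, so it need not be known in advance
        int n = record.getMetaData().getColumnCount();
        Object[] values = new Object[n];
        for (int i = 1; i <= n; i++) {
            // JDBC columns start at 1; Row fields start at 0
            values[i - 1] = record.getObject(i);
        }
        // RowFactory.create(Object... values) accepts the array directly
        return RowFactory.create(values);
    }
}
```

You would call it once per record inside a while (record.next()) loop. The Java types that getObject returns are left to the JDBC driver. They must agree with whatever schema the Rows are later paired with.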
To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala. A Row object can be constructed by providing field values.
A row in Spark is an ordered collection of fields that can be accessed starting at index 0. A row is a generic object of type Row. The columns making up the row can be of the same or different types.
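Below is a minimal Java sketch of the two points above. RowFactory.create takes the field values in order. The fields are then read back by 0-based position, either with typed getters (getString, getInt, getDouble) or with the untyped get. The class name and sample values are made up for illustration.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class RowBasics {
    // Hypothetical sample row: three fields of different types
    public static Row sample() {
        return RowFactory.create("alice", 30, 4.5);
    }

    public static void main(String[] args) {
        Row row = sample();
        System.out.println(row.length());     // 3
        System.out.println(row.getString(0)); // alice (index 0 is the first field)
        System.out.println(row.getInt(1));    // 30
        System.out.println(row.getDouble(2)); // 4.5
        Object raw = row.get(1);              // untyped access returns the boxed value
        System.out.println(raw instanceof Integer); // true
    }
}
```

The typed getters only cast the stored value. Calling getInt on a field that holds a String fails with a ClassCastException instead of converting it.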
The Row class in PySpark is used to create rows for a PySpark DataFrame. A Row is created by calling Row(), which is available in the pyspark.sql module.
The Spark function explode(e: Column) turns array or map columns into rows. When an array is passed to this function, it creates a new column, named “col” by default, that holds the array elements, one per row. When a map is passed, it creates two new columns, “key” and “value”, and each map entry becomes its own row.
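The following Java sketch of explode assumes a local SparkSession. It uses small inline DataFrames built with Spark SQL's array() and map() functions. The class name, method names, and the column names nums and props are illustrative. The sketch shows the default output columns: col for an array, and key and value for a map.

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExplodeDemo {
    // Array column: one output row per element, in a column named "col"
    public static Dataset<Row> explodeArray(SparkSession spark) {
        Dataset<Row> df = spark.sql("SELECT array(1, 2, 3) AS nums");
        return df.select(explode(col("nums")));
    }

    // Map column: one output row per entry, in columns "key" and "value"
    public static Dataset<Row> explodeMap(SparkSession spark) {
        Dataset<Row> df = spark.sql("SELECT map('a', 1, 'b', 2) AS props");
        return df.select(explode(col("props")));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("explode-demo")
                .getOrCreate();
        explodeArray(spark).show();
        explodeMap(spark).show();
        spark.stop();
    }
}
```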
First populate a list with Row objects, then create the StructFields and add them to a second list. Pass that list to the createStructType function to get a schema, and pass the rows and the schema to the createDataFrame function.
SparkSession spark = SparkSession.builder().
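The steps just described are sketched end to end below. The sketch assumes a local SparkSession. The column names id and label and their values are invented for the example. DataTypes.createStructType has an overload that takes a java.util.List<StructField>, so the field list can be passed as-is.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ListToDataFrame {
    public static Dataset<Row> build(SparkSession spark) {
        // 1. Populate a list with Row objects
        List<Row> rows = new ArrayList<>();
        rows.add(RowFactory.create(1L, "alpha"));
        rows.add(RowFactory.create(2L, "beta"));

        // 2. Create one StructField per column, in the same order as the Row fields
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("id", DataTypes.LongType, false));
        fields.add(DataTypes.createStructField("label", DataTypes.StringType, true));

        // 3. Turn the field list into a schema and build the DataFrame
        StructType schema = DataTypes.createStructType(fields);
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("list-to-dataframe")
                .getOrCreate();
        build(spark).show();
        spark.stop();
    }
}
```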
The slice function cuts an array down to a sub-array: the second argument is the start index and the third is the number of elements. Note that slice counts array positions from 1, not 0, and it also accepts a negative start index to count from the end of the array. Let’s try to create a sub-array of 3 elements starting from index 2.
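The Java sketch below uses functions.slice(Column, int start, int length), which is available from Spark 2.4. The DataFrame, the column name nums, and the class and method names are made up. The sketch takes the 3-element sub-array starting at index 2 from the text above. It also shows a negative start that counts from the end.

```java
import static org.apache.spark.sql.functions.*;

import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SliceDemo {
    // Returns the sliced array from the single row of the result
    private static List<Object> sliceOf(SparkSession spark, int start, int length) {
        Dataset<Row> df = spark.sql("SELECT array(1, 2, 3, 4, 5, 6) AS nums");
        Row first = df.select(slice(col("nums"), start, length).as("sub")).first();
        return first.getList(0);
    }

    public static List<Object> fromIndex2(SparkSession spark) {
        return sliceOf(spark, 2, 3);  // 1-based start: elements 2, 3, 4
    }

    public static List<Object> lastTwo(SparkSession spark) {
        return sliceOf(spark, -2, 2); // negative start counts from the end: elements 5, 6
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("slice-demo")
                .getOrCreate();
        System.out.println(fromIndex2(spark));
        System.out.println(lastTwo(spark));
        spark.stop();
    }
}
```

A start index of 0 is rejected by slice. To begin at the first element, pass 1.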
We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application:
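import static org.apache.spark.sql.types.DataTypes.*;
import java.util.List;
import com.google.common.collect.ImmutableList; // Guava
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;

// initialize the SQLContext first (setup elided)
SQLContext sqlContext = ...;
// schema: three non-nullable columns
StructType schemata = DataTypes.createStructType(
    new StructField[]{
        createStructField("NAME", StringType, false),
        createStructField("STRING_VALUE", StringType, false),
        createStructField("NUM_VALUE", IntegerType, false),
    });
// field values must follow the schema's column order and types
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show(); // prints the table below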
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+
I am not sure I understand your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.
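import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
List<MyData> mlist = new ArrayList<MyData>(); // d1, d2: MyData objects created elsewhere
mlist.add(d1);
mlist.add(d2);
Row row = RowFactory.create(mlist.toArray()); // each list element becomes one field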
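The sketch below shows why toArray() matters in this answer; the class name is hypothetical. RowFactory.create is declared as create(Object... values). An Object[] is therefore spread into separate fields. A List passed directly becomes a single field that holds the whole list.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class ListToRow {
    // Each element of the list becomes one field of the Row
    public static Row fromList(List<?> values) {
        // toArray() yields an Object[], which is passed straight through as the Object... varargs
        return RowFactory.create(values.toArray());
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.asList(100L, 7, "seven");
        Row good = fromList(values);
        // Pitfall: passing the List itself wraps it as a single field
        Row wrapped = RowFactory.create(values);
        System.out.println(good.length());    // 3
        System.out.println(wrapped.length()); // 1
    }
}
```

A Row built this way still needs a schema whose column count and types agree with the list before it can go into createDataFrame.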