In the Row Java API there is a row.schema() method, but there is no row.set(StructType schema).
I also tried RowFactory.create(objects), but I don't know how to proceed from there.
UPDATE:
The problem is how to generate a new DataFrame when I modify the row structure in the workers. Here is an example:
DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);
List<Row> resultRows = sentenceData.toJavaRDD()
    .map(new MyFunction<Row, Row>(parameters) {
        // my map function
        public Row call(Row row) {
            // I want to change the Row definition by adding new columns
            Row newRow = functionAddnewNewColumns(row);
            StructType newSchema = functionGetNewSchema(row.schema());
            // Here I want to attach the new structure to the row
            return newRow;
        }
    }).collect();
JavaRDD<Row> newJrdd = jsc.parallelize(resultRows);
// Here is the problem: newSchema only exists inside call(), so I don't know
// how to get it here to create the modified DataFrame
DataFrame newDataframe = jsql.createDataFrame(newJrdd, newSchema);
You can create a row with a schema by using GenericRowWithSchema:
Row newRow = new GenericRowWithSchema(values, newSchema);
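A minimal sketch of that approach, assuming Spark's spark-sql artifact is on the classpath. GenericRowWithSchema lives in the internal org.apache.spark.sql.catalyst.expressions package, so treat it as an implementation detail that may move between versions; the class name, column names, and values below are made up for illustration.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowWithSchemaExample {

    // Builds a two-column Row that carries its own schema.
    // GenericRowWithSchema is an internal Spark class (catalyst package),
    // so its location is not guaranteed across Spark versions.
    static Row buildRow() {
        StructType newSchema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("sentence", DataTypes.StringType, true),
            DataTypes.createStructField("length", DataTypes.IntegerType, false)
        });
        Object[] values = new Object[] {"Hi I heard about Spark", 22};
        return new GenericRowWithSchema(values, newSchema);
    }

    public static void main(String[] args) {
        Row row = buildRow();
        // Unlike a Row from RowFactory.create(...), this row reports its schema.
        System.out.println(row.schema().fieldNames()[1] + " = " + row.getInt(1));
    }
}
```

A row built this way still does not change the columns of an existing DataFrame; the schema passed to createDataFrame is what defines them.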
You do not set a schema on a row; that makes no sense, because in Spark the schema belongs to the DataFrame rather than to the individual rows. You can, however, create a DataFrame (or, before Spark 1.3, a JavaSchemaRDD) with a given schema using the sqlContext:
DataFrame dataFrame = sqlContext.createDataFrame(rowRDD, schema);
The DataFrame will have the schema you provided.
For further information, please consult the documentation at http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
EDIT: Based on the updated question:
You can generate the new rows in your map function, which gives you a new RDD of type JavaRDD<Row>:
DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);
JavaRDD<Row> newRowRDD = sentenceData
.toJavaRDD()
.map(row -> functionAddnewNewColumns(row)); // Assuming functionAddnewNewColumns returns a Row
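Since functionAddnewNewColumns is not shown, here is one hypothetical way to write it, assuming the first column holds a string: copy the existing values and append the new one with RowFactory.create. The appended length column is only an example; the point is that the workers return plain Rows and never need to know the schema.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class AddColumnsExample {

    // Hypothetical stand-in for functionAddnewNewColumns: copies every value
    // of the input row and appends one new column (here, the length of the
    // string in column 0). The result is a plain Row without a schema; the
    // schema is attached later, when the DataFrame is created.
    static Row functionAddnewNewColumns(Row row) {
        Object[] values = new Object[row.size() + 1];
        for (int i = 0; i < row.size(); i++) {
            values[i] = row.get(i);
        }
        values[row.size()] = row.getString(0).length();
        return RowFactory.create(values);
    }

    public static void main(String[] args) {
        Row in = RowFactory.create("Hi I heard about Spark", 0.0);
        Row out = functionAddnewNewColumns(in);
        System.out.println(out.size() + " columns: " + out);
    }
}
```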
Then define the new schema:
StructField[] fields = new StructField[] {
new StructField("column1",...),
new StructField("column2",...),
...
};
StructType newSchema = new StructType(fields);
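The new schema does not have to come from the workers at all. A sketch, assuming the new column is appended at the end: derive newSchema once on the driver from the existing sentenceData.schema() by copying its fields and adding one. The helper names appendField and sentenceSchema are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class NewSchemaExample {

    // Returns a copy of oldSchema with one non-nullable field appended.
    // It only needs the old schema (e.g. sentenceData.schema()), so it can
    // run on the driver instead of inside the map function.
    static StructType appendField(StructType oldSchema, String name, DataType type) {
        List<StructField> fields = new ArrayList<>(Arrays.asList(oldSchema.fields()));
        fields.add(DataTypes.createStructField(name, type, false));
        return DataTypes.createStructType(fields);
    }

    // Hypothetical original schema with a single string column.
    static StructType sentenceSchema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("sentence", DataTypes.StringType, true)
        });
    }

    public static void main(String[] args) {
        StructType newSchema = appendField(sentenceSchema(), "length", DataTypes.IntegerType);
        System.out.println(Arrays.toString(newSchema.fieldNames()));
    }
}
```

The field order must match the order of the values in the rows produced by the map function. Recent Spark versions also provide StructType.add(...) for the same purpose.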
Finally, create a new DataFrame from newRowRDD, using newSchema as its schema:
DataFrame newDataframe = jsql.createDataFrame(newRowRDD, newSchema);
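Putting the steps together, here is a runnable end-to-end sketch. It assumes Spark 1.3 to 1.6 (where DataFrame is a Java class), Java 8 lambdas, and a local master; on Spark 2.x and later the equivalent type is Dataset<Row>, usually created through SparkSession. The class name, sample sentences, and length column are invented for illustration.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class AddColumnPipeline {

    // Runs the whole flow on a local Spark 1.x context and returns the
    // column names of the resulting DataFrame.
    static String[] run() {
        SparkConf conf = new SparkConf().setAppName("add-column").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            SQLContext jsql = new SQLContext(jsc);

            StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("sentence", DataTypes.StringType, false)
            });
            JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
                RowFactory.create("Hi I heard about Spark"),
                RowFactory.create("Logistic regression models are neat")));
            DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);

            // 1. The workers only produce plain Rows with one extra value each.
            JavaRDD<Row> newRowRDD = sentenceData.toJavaRDD()
                .map(row -> RowFactory.create(row.getString(0), row.getString(0).length()));

            // 2. The driver builds the matching schema once.
            StructType newSchema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("sentence", DataTypes.StringType, false),
                DataTypes.createStructField("length", DataTypes.IntegerType, false)
            });

            // 3. The schema is attached when the new DataFrame is created.
            DataFrame newDataframe = jsql.createDataFrame(newRowRDD, newSchema);
            newDataframe.show();
            return newDataframe.columns();
        } finally {
            jsc.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

The design choice here is to keep schema handling entirely on the driver: the map function never touches StructType, which avoids the scoping problem from the question.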