Is there a nicer way to prefix or rename all (or multiple) columns of a given SparkSQL DataFrame than calling dataFrame.withColumnRenamed() multiple times?
An example would be if I want to detect changes (using a full outer join). Then I'm left with two DataFrames that have the same structure.
I suggest using the select() method for this. In fact, withColumnRenamed() uses select() internally itself. Here is an example of how to rename multiple columns:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

val someDataframe: DataFrame = ...

// Build one aliased Column expression per name, then select them all in a single pass.
val initialColumnNames = Seq("a", "b", "c")
val renamedColumns = initialColumnNames.map(name => col(name).as(s"renamed_$name"))
someDataframe.select(renamedColumns: _*)
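Applied to the change-detection scenario from the question, the same trick can prefix every column on each side of the full outer join, so the two structurally identical DataFrames stay distinguishable afterwards. This is only a sketch: the names oldDf, newDf, and the join key "id" are assumptions, not from the question.

```scala
// Prefix every column of a DataFrame (oldDf/newDf and the "id" key are assumed).
def withPrefix(df: DataFrame, prefix: String): DataFrame =
  df.select(df.columns.map(c => col(c).as(s"$prefix$c")): _*)

val joined = withPrefix(oldDf, "old_")
  .join(withPrefix(newDf, "new_"), col("old_id") === col("new_id"), "full_outer")
```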
This method may help: it renames every column of a Dataset from underscore_case to camelCase.
public static Dataset<Row> renameDataFrame(Dataset<Row> dataset) {
    // Rename each column in turn; every call returns a new Dataset.
    for (String column : dataset.columns()) {
        dataset = dataset.withColumnRenamed(column, SystemUtils.underscoreToCamelCase(column));
    }
    return dataset;
}
public static String underscoreToCamelCase(String underscoreName) {
    StringBuilder result = new StringBuilder();
    if (underscoreName != null && underscoreName.length() > 0) {
        boolean upperCaseNext = false;
        for (int i = 0; i < underscoreName.length(); i++) {
            char ch = underscoreName.charAt(i);
            if (ch == '_') {
                // Drop the underscore and capitalize the following character.
                upperCaseNext = true;
            } else if (upperCaseNext) {
                result.append(Character.toUpperCase(ch));
                upperCaseNext = false;
            } else {
                result.append(ch);
            }
        }
    }
    return result.toString();
}
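As a quick sanity check, the conversion can be exercised outside Spark. This sketch inlines the helper in a standalone class (the class name RenameDemo is made up; the answer's code keeps it in a SystemUtils class):

```java
// Standalone demo of the underscore-to-camelCase conversion (class name is hypothetical).
public class RenameDemo {
    public static String underscoreToCamelCase(String underscoreName) {
        StringBuilder result = new StringBuilder();
        if (underscoreName != null && !underscoreName.isEmpty()) {
            boolean upperCaseNext = false;
            for (int i = 0; i < underscoreName.length(); i++) {
                char ch = underscoreName.charAt(i);
                if (ch == '_') {
                    upperCaseNext = true;          // drop underscore, capitalize next char
                } else if (upperCaseNext) {
                    result.append(Character.toUpperCase(ch));
                    upperCaseNext = false;
                } else {
                    result.append(ch);
                }
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(underscoreToCamelCase("user_first_name")); // userFirstName
    }
}
```

Note that a trailing underscore is simply dropped, and a null or empty input yields an empty string.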