I have a use case where I scrape some data, and for some records some keys have multiple values. The final output I want is CSV, which I have a library for, and it expects a 2-dimensional array.
So my input structure looks like List<TreeMap<String, List<String>>> (I use TreeMap to ensure stable key order), and my output needs to be String[][].
I wrote a generic transformation which calculates the number of columns for each key based on the maximum number of values among all records, and leaves empty cells for records that have fewer than the maximum number of values, but it turned out more complex than expected.
My question is: can it be written in a more concise/effective (but still generic) way? Especially using Java 8 streams/lambdas etc.?
Sample data and my algorithm follow below (not tested beyond sample data yet):
package org.example.importer; // "import" is a reserved word and cannot appear in a package name

import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {
        List<TreeMap<String, List<String>>> rows = new ArrayList<>();
        TreeMap<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price", Arrays.asList("20"));
        rows.add(row1);
        TreeMap<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price", Arrays.asList("35"));
        rows.add(row2);
        TreeMap<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price", Arrays.asList("15"));
        rows.add(row3);
        System.out.println("Input:");
        System.out.println(rows);
        System.out.println("Output:");
        System.out.println(Arrays.deepToString(multiValueListsToArray(rows)));
    }

    public static String[][] multiValueListsToArray(List<TreeMap<String, List<String>>> rows)
    {
        Map<String, IntSummaryStatistics> colWidths = rows.
            stream().
            flatMap(m -> m.entrySet().stream()).
            collect(Collectors.groupingBy(e -> e.getKey(), Collectors.summarizingInt(e -> e.getValue().size())));
        Long tableWidth = colWidths.values().stream().mapToLong(IntSummaryStatistics::getMax).sum();
        String[][] array = new String[rows.size()][tableWidth.intValue()];
        Iterator<TreeMap<String, List<String>>> rowIt = rows.iterator(); // iterate rows
        int rowIdx = 0;
        while (rowIt.hasNext())
        {
            TreeMap<String, List<String>> row = rowIt.next();
            Iterator<String> colIt = colWidths.keySet().iterator(); // iterate columns
            int cellIdx = 0;
            while (colIt.hasNext())
            {
                String col = colIt.next();
                long colWidth = colWidths.get(col).getMax();
                for (int i = 0; i < colWidth; i++) // iterate cells within column
                    if (row.containsKey(col) && row.get(col).size() > i)
                        array[rowIdx][cellIdx + i] = row.get(col).get(i);
                cellIdx += colWidth;
            }
            rowIdx++;
        }
        return array;
    }
}
Program output:
Input:
[{Category=[Wireless, Sensor], Price=[20], Title=[Product 1]}, {Category=[Sensor], Price=[35], Title=[Product 2]}, {Price=[15], Title=[Product 3]}]
Output:
[[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]
As a first step, I wouldn’t focus on new Java 8 features, but rather Java 5+ features. Don’t deal with Iterators when you can use for-each. Generally, don’t iterate over a keySet() to perform a map lookup for every key, as you can iterate over the entrySet() not requiring any lookup. Also, don’t ask for an IntSummaryStatistics when you’re only interested in the maximum value. And don’t iterate over the bigger of two data structures, just to recheck that you’re not beyond the smaller one in each iteration.
Map<String, Integer> colWidths = rows.
    stream().
    flatMap(m -> m.entrySet().stream()).
    collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(), Integer::max));
int tableWidth = colWidths.values().stream().mapToInt(Integer::intValue).sum();
String[][] array = new String[rows.size()][tableWidth];
int rowIdx = 0;
for(TreeMap<String, List<String>> row: rows) {
    int cellIdx = 0;
    for(Map.Entry<String,Integer> e: colWidths.entrySet()) {
        String col = e.getKey();
        List<String> cells = row.get(col);
        int index = cellIdx;
        if(cells != null) for(String s: cells) array[rowIdx][index++]=s;
        cellIdx += e.getValue(); // the width is already in the entry, no second lookup needed
    }
    rowIdx++;
}
return array;
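To make the width-based loop above easy to try, here is a minimal self-contained sketch run against the question's sample data. The class name WidthBasedSketch and the helpers record and sampleRows are made up for illustration. Passing TreeMap::new to toMap is an addition of this sketch: the three-argument toMap makes no promise about the map type it returns, so a sorted map is assumed here to keep the column order stable.

```java
import java.util.*;
import java.util.stream.Collectors;

class WidthBasedSketch {

    static String[][] toArray(List<Map<String, List<String>>> rows) {
        // Widest value list per key; the TreeMap supplier (an assumption of this
        // sketch) makes the column order follow the sorted keys.
        Map<String, Integer> colWidths = rows.stream()
            .flatMap(m -> m.entrySet().stream())
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                                      Integer::max, TreeMap::new));
        int tableWidth = colWidths.values().stream().mapToInt(Integer::intValue).sum();

        String[][] array = new String[rows.size()][tableWidth];
        int rowIdx = 0;
        for (Map<String, List<String>> row : rows) {
            int cellIdx = 0;
            for (Map.Entry<String, Integer> e : colWidths.entrySet()) {
                List<String> cells = row.get(e.getKey());
                int index = cellIdx;
                if (cells != null) {
                    for (String s : cells) array[rowIdx][index++] = s;
                }
                cellIdx += e.getValue(); // jump to the start of the next column group
            }
            rowIdx++;
        }
        return array;
    }

    // Hypothetical helper: one record with a title, a price and zero or more categories.
    static Map<String, List<String>> record(String title, String price, String... categories) {
        Map<String, List<String>> m = new TreeMap<>();
        m.put("Title", Arrays.asList(title));
        m.put("Price", Arrays.asList(price));
        if (categories.length > 0) m.put("Category", Arrays.asList(categories));
        return m;
    }

    // The three sample records from the question.
    static List<Map<String, List<String>>> sampleRows() {
        return Arrays.asList(
            record("Product 1", "20", "Wireless", "Sensor"),
            record("Product 2", "35", "Sensor"),
            record("Product 3", "15"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(toArray(sampleRows())));
        // prints [[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]
    }
}
```

Records that lack a key simply leave that key's cells as null, which matches the output shown in the question.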
We can simplify the loop further by using a map to column positions rather than widths:
Map<String, Integer> colPositions = rows.
    stream().
    flatMap(m -> m.entrySet().stream()).
    collect(Collectors.toMap(e -> e.getKey(),
                             e -> e.getValue().size(), Integer::max, TreeMap::new));
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);
String[][] array = new String[rows.size()][tableWidth];
int rowIdx = 0;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
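The line tableWidth += e.setValue(tableWidth) does two things at once: Map.Entry.setValue stores the running offset in the entry and returns the value it replaced (the column width), which is then added to the total. Below is a small sketch of just that step, written out in two statements. The class and method names are hypothetical, and the widths are hard-coded from the sample data.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnOffsetsSketch {

    // Replaces each column width with the column's start offset
    // and returns the total table width.
    static int widthsToOffsets(Map<String, Integer> widths) {
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            int width = e.setValue(tableWidth); // setValue returns the previous value
            tableWidth += width;
        }
        return tableWidth;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new TreeMap<>();
        cols.put("Category", 2);
        cols.put("Price", 1);
        cols.put("Title", 1);
        int total = widthsToOffsets(cols);
        System.out.println(cols + " total=" + total);
        // prints {Category=0, Price=2, Title=3} total=4
    }
}
```

This only works with maps whose entry-set entries support setValue, which is the case for TreeMap and HashMap.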
A header array can be prepended with the following change:
Map<String, Integer> colPositions = rows.stream()
    .flatMap(m -> m.entrySet().stream())
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                              Integer::max, TreeMap::new));
String[] header = colPositions.entrySet().stream()
    .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
    .toArray(String[]::new);
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);
String[][] array = new String[rows.size()+1][tableWidth];
array[0] = header;
int rowIdx = 1;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
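The header line relies on Collections.nCopies to repeat each key once per cell it occupies, so a key that spans two columns appears twice. Here is a minimal sketch of that step in isolation, assuming the widths computed from the sample data (the class and method names are illustrative only):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class HeaderSketch {

    // Expands {key -> width} into one header cell per column.
    static String[] header(Map<String, Integer> colWidths) {
        return colWidths.entrySet().stream()
            .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        Map<String, Integer> widths = new TreeMap<>();
        widths.put("Category", 2);
        widths.put("Price", 1);
        widths.put("Title", 1);
        System.out.println(Arrays.toString(header(widths)));
        // prints [Category, Category, Price, Title]
    }
}
```

Note that the header has to be built while the map still holds widths, that is, before the loop that overwrites them with offsets, which is the order the code above uses.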
This is a fairly concise way to do it using some Java 8 features.
This solution assumes that only the Category data is dynamic, whereas you will always have exactly one price and one product name.
Considering you have the initial data
// your initial complex data list
List<Map<String, List<String>>> initialList = new ArrayList<>();
you can do
// values holder before final conversion
final List<List<String>> tempValues = new ArrayList<>();
initialList.forEach( map -> {
    // discard the keys, we do not need them... so only pack the data and put in a temporary array
    tempValues.add(new ArrayList<String>() {{
        map.forEach((key, value) -> addAll(value)); // foreach (string, list) : Map<String, List<String>>
    }});
});
// get the biggest data list; in our case, the one that contains most categories...
// this is going to be the final data size
final int maxSize = tempValues.stream().max(Comparator.comparingInt(List::size)).get().size();
// now we finally know the data size
final String[][] finalValues = new String[initialList.size()][maxSize];
// now it's time to uniform the bundle data size and shift the elements if necessary
// can't use streams/lambda as I need to keep an iteration counter
for (int i = 0; i < tempValues.size(); i++) {
    final List<String> tempEntry = tempValues.get(i);
    if (tempEntry.size() == maxSize) {
        finalValues[i] = tempEntry.toArray(finalValues[i]);
        continue;
    }
    final String[] s = new String[maxSize];
    // same shifting game as before
    final int delta = maxSize - tempEntry.size();
    for (int j = 0; j < maxSize; j++) {
        if (j < delta) continue;
        s[j] = tempEntry.get(j - delta);
    }
    finalValues[i] = s;
}
and that's it...
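One consequence of the shifting step is worth seeing on concrete data: shorter records are padded on the left, so the column a value lands in depends on how many values the record has in total. The sketch below is a condensed restatement of the same idea, not the exact code above. The names RightAlignedSketch, rightAligned, record and sampleRows are made up, and the input is the question's three sample records.

```java
import java.util.*;
import java.util.stream.Collectors;

class RightAlignedSketch {

    static String[][] rightAligned(List<Map<String, List<String>>> rows) {
        // Drop the keys and flatten each record's value lists into one list.
        List<List<String>> flat = rows.stream()
            .map(m -> m.values().stream().flatMap(List::stream).collect(Collectors.toList()))
            .collect(Collectors.toList());
        int maxSize = flat.stream().mapToInt(List::size).max().orElse(0);

        String[][] out = new String[rows.size()][maxSize];
        for (int i = 0; i < flat.size(); i++) {
            List<String> cells = flat.get(i);
            int delta = maxSize - cells.size(); // number of leading empty cells
            for (int j = 0; j < cells.size(); j++) {
                out[i][delta + j] = cells.get(j);
            }
        }
        return out;
    }

    // Hypothetical helper: one record with a title, a price and zero or more categories.
    static Map<String, List<String>> record(String title, String price, String... categories) {
        Map<String, List<String>> m = new TreeMap<>();
        m.put("Title", Arrays.asList(title));
        m.put("Price", Arrays.asList(price));
        if (categories.length > 0) m.put("Category", Arrays.asList(categories));
        return m;
    }

    // The three sample records from the question.
    static List<Map<String, List<String>>> sampleRows() {
        return Arrays.asList(
            record("Product 1", "20", "Wireless", "Sensor"),
            record("Product 2", "35", "Sensor"),
            record("Product 3", "15"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(rightAligned(sampleRows())));
        // prints [[Wireless, Sensor, 20, Product 1], [null, Sensor, 35, Product 2], [null, null, 15, Product 3]]
    }
}
```

Here "Sensor" for Product 2 ends up in the second cell rather than the first, unlike the per-key layout in the question's expected output. The price and title columns still line up only because Category is the single variable-length key and sorts first.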
You can fill and test the data with this method below (I have added some more categories...)
static void initData(List<Map<String, List<String>>> l) {
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("20"); }});
        put("Title", new ArrayList<String>() {{ add("Product 1"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("35"); }});
        put("Title", new ArrayList<String>() {{ add("Product 2"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category14"); }});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category541"); add("SomeCategory");}});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
}
I'd still say the accepted answer looks less computationally expensive, but you wanted to see some Java 8...