 

Conflating Java streams

I have a very large Stream of versioned documents, ordered by document id and version.

E.g. Av1, Av2, Bv1, Cv1, Cv2

I have to convert this into another Stream whose records are aggregated by document id.

A[v1, v2], B[v1], C[v1, v2]

Can this be done without using Collectors.groupingBy()? I don't want to use groupingBy() because it will load all items in the stream into memory before grouping them. In theory, one need not load the whole stream into memory, because it is ordered: all versions of a document arrive consecutively.

sgp15 asked Apr 12 '19 08:04



1 Answer

Here's a solution I came up with:

    Stream<Document> stream = Stream.of(
            new Document("A", "v1"),
            new Document("A", "v2"),
            new Document("B", "v1"),
            new Document("C", "v1"),
            new Document("C", "v2")
    );

    Iterator<Document> iterator = stream.iterator();
    Stream<GroupedDocument> result = Stream.generate(new Supplier<GroupedDocument>() {

        // First document of the next group, read ahead while finishing the previous one.
        Document pending = null;

        @Override
        public GroupedDocument get() {
            Document doc = pending;
            pending = null;
            if (doc == null) {
                if (!iterator.hasNext()) {
                    return null; // source exhausted: null marks the end
                }
                doc = iterator.next();
            }

            GroupedDocument gd = new GroupedDocument(doc.getId());
            gd.getVersions().add(doc.getVersion());

            // Consume documents until the id changes or the source runs out.
            while (iterator.hasNext()) {
                Document next = iterator.next();
                if (!next.getId().equals(gd.getId())) {
                    pending = next; // belongs to the next group
                    break;
                }
                gd.getVersions().add(next.getVersion());
            }
            return gd;
        }
    });

Here are the Document and GroupedDocument classes:

    class Document {
        private String id;
        private String version;

        public Document(String id, String version) {
            this.id = id;
            this.version = version;
        }

        public String getId() {
            return id;
        }

        public String getVersion() {
            return version;
        }
    }

    class GroupedDocument {
        private String id;
        private List<String> versions;

        public GroupedDocument(String id) {
            this.id = id;
            versions = new ArrayList<>();
        }

        public String getId() {
            return id;
        }

        public List<String> getVersions() {
            return versions;
        }

        @Override
        public String toString() {
            return "GroupedDocument{" +
                    "id='" + id + '\'' +
                    ", versions=" + versions +
                    '}';
        }
    }

Note that the resulting stream is infinite: after the last group it yields nulls indefinitely. You can keep only the non-null elements by using takeWhile in Java 9, or see this post.
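To make the takeWhile step concrete, here is a minimal, self-contained sketch of the same look-ahead idea. The names ConflateDemo, Doc, Group and conflate are hypothetical stand-ins invented for this example, not part of the code above. Since Stream.generate produces an unordered stream, the sketch assumes the result is consumed sequentially, and the input must already be ordered by id.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Stream;

class ConflateDemo {

    // Hypothetical stand-ins for the Document and GroupedDocument classes.
    static final class Doc {
        final String id;
        final String version;
        Doc(String id, String version) { this.id = id; this.version = version; }
    }

    static final class Group {
        final String id;
        final List<String> versions = new ArrayList<>();
        Group(String id) { this.id = id; }
        @Override
        public String toString() { return id + versions; } // e.g. A[v1, v2]
    }

    // Hypothetical helper: groups consecutive documents sharing an id, holding
    // only one group plus one look-ahead document in memory at a time.
    static Stream<Group> conflate(Stream<Doc> docs) {
        Iterator<Doc> it = docs.iterator();
        Supplier<Group> supplier = new Supplier<Group>() {
            Doc pending = null; // first document of the next group, already read

            @Override
            public Group get() {
                Doc first = pending;
                pending = null;
                if (first == null) {
                    if (!it.hasNext()) {
                        return null; // source exhausted: null marks the end
                    }
                    first = it.next();
                }
                Group group = new Group(first.id);
                group.versions.add(first.version);
                while (it.hasNext()) {
                    Doc next = it.next();
                    if (!next.id.equals(first.id)) {
                        pending = next; // keep it for the next call
                        break;
                    }
                    group.versions.add(next.version);
                }
                return group;
            }
        };
        // takeWhile (Java 9+) ends the otherwise infinite stream at the first null.
        return Stream.generate(supplier).takeWhile(Objects::nonNull);
    }

    public static void main(String[] args) {
        Stream<Doc> docs = Stream.of(
                new Doc("A", "v1"), new Doc("A", "v2"),
                new Doc("B", "v1"),
                new Doc("C", "v1"), new Doc("C", "v2"));
        conflate(docs).forEach(System.out::println);
    }
}
```

Because the cut-off lives inside conflate, callers get a finite stream and never see the null end marker. The source stream is read lazily through its iterator, so only the group being built is held in memory.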

Sweeper answered Nov 17 '22 07:11
