I have a set of 4-column CSV data in which the first column has the same value for 5 consecutive rows, then a different value for the next 5 rows, and so on.
Sample data:
a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
a,24,54,xxx
a,25,55,xxx
b,21,61,yyy
b,22,62,yyy
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........
But sometimes the records arrive in arbitrary order, with the groups interleaved:
a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
b,21,61,yyy
b,22,62,yyy
a,24,54,xxx
a,25,55,xxx
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........
Is there any way of grouping such data based on its first column using NiFi processors?
Any answers would be helpful.
Thanks
You should be able to do this with the RouteText processor using its Grouping Regular Expression property, whose documentation says:
"Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile."
I think you can use that in conjunction with a Matching Strategy of Matches Regular Expression and just use .* for that expression so that every line matches.
Then, for the grouping expression, use the example above to group by the first column: (.*?),.*
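
To check what the grouping expression (.*?),.* does with the interleaved sample from the question, here is a minimal Python sketch. It only imitates the grouping idea outside NiFi and is not how RouteText is implemented; the function name group_lines is made up for illustration, and the sketch assumes the whole line must match the expression.

```python
import re

# Same expression as the RouteText documentation example:
# the lazy capturing group grabs everything before the first comma.
GROUPING_REGEX = re.compile(r"(.*?),.*")

def group_lines(lines):
    """Bucket lines by the concatenated capturing groups (hypothetical helper)."""
    groups = {}
    for line in lines:
        match = GROUPING_REGEX.fullmatch(line)
        # Lines that do not match share a single bucket keyed by None.
        key = "".join(g or "" for g in match.groups()) if match else None
        groups.setdefault(key, []).append(line)
    return groups

sample = [
    "a,21,51,xxx", "a,22,52,xxx", "a,23,53,xxx",
    "b,21,61,yyy", "b,22,62,yyy",
    "a,24,54,xxx", "a,25,55,xxx",
    "b,23,63,yyy", "b,24,64,yyy", "b,25,65,yyy",
]

grouped = group_lines(sample)
for key, rows in grouped.items():
    print(key, rows)
```

Each bucket here corresponds roughly to one output FlowFile per group: all the "a" rows end up together and all the "b" rows end up together, in their original relative order.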