
Grouping csv data using NiFi processors

I have a set of four-column CSV data in which the first column has the same value for 5 consecutive rows. The value then stays the same for the next 5 rows, and so on.

Sample data:

a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
a,24,54,xxx
a,25,55,xxx
b,21,61,yyy
b,22,62,yyy
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........

But sometimes the records arrive interleaved, for example:

a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
b,21,61,yyy
b,22,62,yyy
a,24,54,xxx
a,25,55,xxx
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........

Is there any way of grouping such data based on its first column using NiFi processors?

Any answers would be helpful.

Thanks

R.Sangeetha asked Apr 20 '26 04:04


1 Answer

You should be able to do this with the RouteText processor using its Grouping Regular Expression property, whose documentation says:

"Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile."

I think you can use that in conjunction with a Matching Strategy of Matches Regular Expression and just use .* for that expression so that every line matches.

Then, for the Grouping Regular Expression, use the example above to group by the first column: (.*?),.*
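
To see what that grouping expression does to the interleaved sample from the question, here is a minimal Python sketch. It only imitates the grouping step outside NiFi and is not how RouteText is implemented; the function name group_lines is made up for illustration, and it assumes the expression has to match the whole line.

```python
import re

# The grouping expression from the RouteText documentation example:
# capture everything up to the first comma as the line's group.
GROUPING_REGEX = re.compile(r"(.*?),.*")


def group_lines(lines):
    """Bucket lines by the concatenated capture groups (hypothetical helper).

    Lines that do not match the expression share the key None, mirroring
    the "or neither line matches" wording in the property description.
    """
    groups = {}
    for line in lines:
        match = GROUPING_REGEX.fullmatch(line)  # assumption: whole-line match
        key = "".join(g or "" for g in match.groups()) if match else None
        groups.setdefault(key, []).append(line)
    return groups


# Interleaved sample data from the question.
sample = [
    "a,21,51,xxx", "a,22,52,xxx", "a,23,53,xxx",
    "b,21,61,yyy", "b,22,62,yyy",
    "a,24,54,xxx", "a,25,55,xxx",
    "b,23,63,yyy", "b,24,64,yyy", "b,25,65,yyy",
]

grouped = group_lines(sample)
for key, rows in grouped.items():
    print(key, rows)
```

Each key here stands in for one outgoing FlowFile: all the "a" lines end up together and all the "b" lines end up together, each in their original relative order.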

Bryan Bende answered Apr 23 '26 02:04