
Talend - Combining two rows into one

Tags: talend

Sample Input

Here is an example of my input. As you can see, each address is split across two rows, and I would like to combine the two parts into one value.

Input data with merged cells

Expected Output

This is what the output should be: the values combined into one cell.

Expected output data

Talend Output

If I read the data into Talend it looks like this:

Data as read into Talend

jc carmelo asked Sep 30 '22 03:09


1 Answer

You should be able to accomplish this by using the tMemorizeRows component in Talend.

A really rough example job might look like:

Job layout

I'm using a tFixedFlowInput to hardcode some data here rather than reading in an Excel sheet, but it should match what you've provided as an example in the question:

Input data hard coded into a tFixedFlowInput component

The tMemorizeRows component keeps a specified number of rows in memory at all times rather than processing things row by row in a flow as normal (although some components, such as a sort, require the entire data set to be in memory). The memorised rows can then be accessed as arrays, one per memorised column, with the current row at index 0 and the previous row at index 1. You just want to set this to memorise all of the columns, and you only need 2 rows in memory at all times:

tMemorizeRows component set to memorise everything but to only keep 2 rows in memory at a time
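
To make the "2 rows in memory" idea concrete, here is a minimal plain-Java sketch of a sliding window over one column. It is only an illustration and not Talend's generated code: the class names `MemorizeRowsSketch` and `ColumnWindow` are hypothetical, and the sample names are taken from the output shown in this answer.

```java
import java.util.Arrays;

public class MemorizeRowsSketch {

    /** Hypothetical stand-in for one memorised column; not Talend's generated code. */
    static final class ColumnWindow {
        private final String[] values;

        /** rowCount is the number of rows to keep in memory (assumed to be at least 1). */
        ColumnWindow(int rowCount) {
            values = new String[rowCount];
        }

        /** Shifts older values one slot towards the end and stores the newest at index 0. */
        void push(String value) {
            System.arraycopy(values, 0, values, 1, values.length - 1);
            values[0] = value;
        }

        String get(int index) {
            return values[index];
        }
    }

    public static void main(String[] args) {
        ColumnWindow name = new ColumnWindow(2); // keep 2 rows, as in the job
        for (String value : Arrays.asList("John Carter", "", "Linda Green")) {
            name.push(value);
            System.out.println("current=[" + name.get(0) + "] previous=[" + name.get(1) + "]");
        }
    }
}
```

After each `push`, index 0 holds the row just read and index 1 holds the row before it, which is why the tJavaRow code further down reads the previous row from index 1.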

In this case, whenever a row has an empty name you need to combine it with the data from the previous row. A tJavaRow can access the data held by the tMemorizeRows component, using the following example code (quickly hacked together):

// Start from the current row's values.
String name;
String address = input_row.address;
String mailingAddress = input_row.mailing_address;

// Treat null like an empty name, in case the input delivers blank cells as null.
if (input_row.name == null || "".equals(input_row.name)) {
    // Continuation row: index 1 of each tMemorizeRows array holds the previous row.
    name = name_tMemorizeRows_1[1];
    address = address_tMemorizeRows_1[1] + " " + input_row.address;
    mailingAddress = mailing_address_tMemorizeRows_1[1] + " " + input_row.mailing_address;
} else {
    // Row with a name: flag it so that a later tFilterRow can drop it.
    name = "DELETE THIS ROW";
}

output_row.name = name;
output_row.address = address;
output_row.mailing_address = mailingAddress;

Notice how I've set the name to "DELETE THIS ROW" for the rows that have a non-empty name. I can then use a tFilterRow to remove these rows from the flow, so we are left with only the output we want:

tFilterRow component to remove the "DELETE THIS ROW" rows

Leaving us with the following output:

.-----------+---------------------------+---------------------.
|                           Output                            |
|=----------+---------------------------+--------------------=|
|name       |address                    |mailing_address      |
|=----------+---------------------------+--------------------=|
|John Carter|Washington Street USA 12345|PO Box 999 USA 12345 |
|Linda Green|London Road UK E20 2ST     |PO Box 998 UK E20 2ST|
'-----------+---------------------------+---------------------'
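
For anyone who wants to check the logic outside Talend, the merge and filter steps can be sketched in plain Java as below. This is only a rough simulation under stated assumptions: `MergeRowsSketch`, `Row`, `mergeStep` and `filterStep` are hypothetical names, and the input rows in `main` are assumed, because the question only shows the input as an image, so the exact point where each address is split is a guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeRowsSketch {

    /** Hypothetical row type mirroring the three columns used in the job. */
    static final class Row {
        final String name;
        final String address;
        final String mailingAddress;

        Row(String name, String address, String mailingAddress) {
            this.name = name;
            this.address = address;
            this.mailingAddress = mailingAddress;
        }
    }

    static final String DELETE_MARKER = "DELETE THIS ROW";

    /** Mirrors the tJavaRow step: merge each empty-name row with the row before it, flag the rest. */
    static List<Row> mergeStep(List<Row> input) {
        List<Row> out = new ArrayList<>();
        Row previous = null; // plays the role of index 1 in the tMemorizeRows arrays
        for (Row current : input) {
            if (current.name == null || current.name.isEmpty()) {
                if (previous == null) {
                    throw new IllegalArgumentException("first row must have a name");
                }
                out.add(new Row(previous.name,
                        previous.address + " " + current.address,
                        previous.mailingAddress + " " + current.mailingAddress));
            } else {
                out.add(new Row(DELETE_MARKER, current.address, current.mailingAddress));
            }
            previous = current;
        }
        return out;
    }

    /** Mirrors the tFilterRow step: drop the flagged rows. */
    static List<Row> filterStep(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (!DELETE_MARKER.equals(row.name)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed input: the split point of each address is a guess.
        List<Row> input = Arrays.asList(
                new Row("John Carter", "Washington Street", "PO Box 999"),
                new Row("", "USA 12345", "USA 12345"),
                new Row("Linda Green", "London Road", "PO Box 998"),
                new Row("", "UK E20 2ST", "UK E20 2ST"));
        for (Row row : filterStep(mergeStep(input))) {
            System.out.println(row.name + " | " + row.address + " | " + row.mailingAddress);
        }
    }
}
```

Note that this approach, like the job itself, assumes every record spans exactly two rows: a row with a name is always flagged and dropped, so a record that fits on a single row would be lost.
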
ydaetskcoR answered Oct 22 '22 14:10