I have data with rows like the following:
Name1 Name2 Name3 Col
aaa bbb ccc ...
abc ddd ddd 1
abc ddd ddd 2
abc ddd ddd 3
fff fff fff ...
ggg ggg hhh 4
ggg ggg hhh 5
(Name1, Name2, and Name3 are the primary keys.)
How do I remove all but the last row from each set of rows that share the same three primary-key values? That is, the result from the above would be:
Name1 Name2 Name3 Col
aaa bbb ccc ...
abc ddd ddd 3
fff fff fff ...
ggg ggg hhh 5
Assuming your source data is in the correct order, and you want the last record in each set, there aren't any out-of-the-box transformations that would handle this situation. However, a Script Transformation can deal with it fairly easily.
Here's a sample Data Flow: I'm using FF_SRC_AllRows and FF_DST_SelectedRows as the flat file source and destination (respectively) for simplicity's sake, using the sample data you provided; your specific needs will vary. The script transformation SCR_SelectLastRow is configured as a transformation (both input and output).
Select all your input columns (with usage type ReadOnly).
Create one output (I named mine OutgoingRows, but you can name it whatever you wish), and set its SynchronousInputID property to None. This will let your script filter out the rows you don't want.
Add output columns corresponding to the input columns, and use code along these lines:
/* Microsoft SQL Server Integration Services Script Component
 * Write scripts using Microsoft Visual C# 2008.
 * ScriptMain is the entry point class of the script. */
using System;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Snapshot of one incoming row, so it can be held on to after the
    // input buffer has moved on to the next row.
    class IncomingRowData
    {
        public string Name1;
        public string Name2;
        public string Name3;
        public string Col;

        public IncomingRowData(IncomingRowsBuffer Row)
        {
            Name1 = Row.Name1;
            Name2 = Row.Name2;
            Name3 = Row.Name3;
            Col = Row.Col;
        }

        // True when this row belongs to a different key set than "other".
        public bool KeysDiffer(IncomingRowData other)
        {
            return (Name1 != other.Name1
                || Name2 != other.Name2
                || Name3 != other.Name3);
        }

        public void WriteToOutputBuffer(OutgoingRowsBuffer Row)
        {
            Row.AddRow();
            Row.Name1 = Name1;
            Row.Name2 = Name2;
            Row.Name3 = Name3;
            Row.Col = Col;
        }
    }

    private IncomingRowData _previousRow;

    public override void IncomingRows_ProcessInputRow(IncomingRowsBuffer Row)
    {
        if (_previousRow == null)
        {
            _previousRow = new IncomingRowData(Row);
        }
        IncomingRowData currentRow = new IncomingRowData(Row);
        // When the keys change, the previous row was the last of its set,
        // so emit it; otherwise it is silently discarded.
        if (currentRow.KeysDiffer(_previousRow))
        {
            _previousRow.WriteToOutputBuffer(this.OutgoingRowsBuffer);
        }
        _previousRow = currentRow;
    }

    public override void FinishOutputs()
    {
        // Emit the last row of the final set.
        if (_previousRow != null)
        {
            _previousRow.WriteToOutputBuffer(this.OutgoingRowsBuffer);
        }
        base.FinishOutputs();
    }
}
One nice thing about this technique is that it allows you to handle the data in a single pass and requires neither the use of staging tables nor keeping the entire source data set in memory. Depending on how big your data set is, either of those could cause significant performance issues.
Recommendation #1: Do this in your source query if at all possible.
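For example, if your source is a SQL Server query and Col defines the order within each set, a query along these lines (a sketch only; the table name test1 is an assumption) keeps just the last row of each set:
SELECT name1,
       name2,
       name3,
       col
FROM (SELECT name1,
             name2,
             name3,
             col,
             -- Number the rows in each key set from last to first,
             -- so rn = 1 marks the row to keep.
             ROW_NUMBER() OVER (PARTITION BY name1, name2, name3
                                ORDER BY col DESC) AS rn
      FROM test1) AS ranked
WHERE rn = 1;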
Assuming that's not possible, and assuming that you always want the maximum value for Col to be selected, you can use the Aggregate component in your data flow.
Just add all of your columns to the aggregate input, and for the operations choose "Group By" for Name1, Name2, and Name3, and "Maximum" for Col.
Unfortunately, the Aggregate component is an asynchronous component, meaning your entire flow will pause while the data flows into it, because it can't know the maximum value of each set until it has read in every row.
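For reference, that Aggregate configuration computes the equivalent of this query (a sketch only, again assuming the table name test1):
-- "Group By" on the three key columns plus "Maximum" on Col.
SELECT name1,
       name2,
       name3,
       MAX(col) AS col
FROM test1
GROUP BY name1, name2, name3;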
You can try this, where test1 is the table name (note that the derived tables need aliases on SQL Server):
SELECT name1,
       name2,
       name3,
       col
FROM (SELECT name1,
             name2,
             name3,
             col,
             MAX(rn) OVER (PARTITION BY name1, name2, name3) AS max_rn,
             rn
      FROM (SELECT name1,
                   name2,
                   name3,
                   col,
                   ROW_NUMBER() OVER (PARTITION BY name1, name2, name3
                                      ORDER BY col) AS rn
            FROM test1) AS numbered) AS ranked
WHERE max_rn = rn;