I have a task to load a strangely formatted text file. The file also contains unwanted data. It contains two headers back to back, and the data for each header appears on alternate lines. Header rows start after a line of dashes (------).
I need to read each header along with its corresponding data and dump it into an Excel/table destination. Let me know how to solve this with any transformation in SSIS, or maybe with a script; I don't know how to use a script task for this.
Right now I am reading the file into a single column and manually trying to split it in a Derived Column using the SUBSTRING function. But that works for only one header and is too hard-coded. I need a dynamic approach that reads the header rows as well as the data rows directly.
Input file:
A1234-012 I N F O R M A T I C S C O M P A N Y 08/23/17
PAGE 2 BATCH ABC PAYMENT DATE & DUE DATE EDIT PAGE 481
------------------------------------------------------------------------------------------------------------------------------------
SEO XRAT CLT LOAN OPENING PAYMENT MATURIUH LOAN NEXE ORIG-AMT OFF TO CATE CONTC MON NO.TO TOL NEL S CUP CO IND PAT
NOM CODE NOM NOMTER DATE DUO DATE DATE TIME PT # MONEY AQ LOAN NUMBER BLOCK PAYMENT U TYP GH OMG IND
1-3 4-6 7-13/90-102 14-19 20-25 26-31 32-34 35-37 38-46 47-48 49 50-51 52-61 62 63 64-72 73 4-5 76 77 8-80
------------------------------------------------------------------------------------------------------------------------------------
SEO XRAT CLT LOAN A/C A/C MIN MAX MAX PENDI LATE CCH L/F PARTLYS CUR L/F L/F L/F
NOM CODE NOM NOMTER CODE FACTOR MON MON ROAD DAYS MONE POT L/A L/F JAC INT VAD CD USED PI VAD DT
1-3 4-6 7-13/90-102 14 15 20-23 24-29 30-34 35-37 38-42 43 44 49 60 61-63 64-69
USED-ID:
------------------------------------------------------------------------------------------------------------------------------------
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
Expected output should be:
FILE 1:
SEO XRAT CLT LOAN OPENING PAYMENT MATURIUH LOAN NEXE ORIG-AMT OFF TO CATE CONTC MON NO.TO TOL NEL S CUP CO IND PAT
NOM CODE NOM NOMTER DATE DUO DATE DATE TIME PT # MONEY AQ LOAN NUMBER BLOCK PAYMENT U TYP GH OMG IND
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
454542 070 567 2136547895 08-08-18 08-06-18 11-02-18 123 256 62,222 LK 5 55 5463218975 5 3 5,555.22 33 H55
FILE 2:
SEO XRAT CLT LOAN A/C A/C MIN MAX MAX PENDI LATE CCH L/F PARTLYS CUR L/F L/F L/F
NOM CODE NOM NOMTER CODE FACTOR MON MON ROAD DAYS MONE POT L/A L/F JAC INT VAD CD USED PI VAD DT
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
025641 055 123 5144511352 B .55321 2.55 6531.22 H #AS
To ignore the first 3 rows, you can simply configure the flat file connection manager to skip them (set Header rows to skip to 3).
1. Configure connection managers
In addition, in the flat file connection manager, go to the Advanced tab, delete all columns except one, and change that column's data type to DT_STR with a length (OutputColumnWidth) of 4000.
Add two more flat file connection managers, one for each destination file, where you define only one column with max length = 4000.
2. Configure the Data Flow Task
Add a Data Flow Task and add a Flat File Source inside it. Select the source file connection manager.
Add a conditional split with the following expressions:
File1
FINDSTRING([Column 0],"OPENING",1) > 1 || FINDSTRING([Column 0],"DATE",1) > 1 || TOKENCOUNT([Column 0]," ") == 19
File2
FINDSTRING([Column 0],"A/C",1) > 1 || FINDSTRING([Column 0],"FACTOR",1) > 1 || TOKENCOUNT([Column 0]," ") == 10
The expressions above are based on the expected output you mentioned in the question: I searched for keywords that are unique to each header and split the data rows by the number of space-separated tokens.
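If you'd rather do the whole split in a script (since you mentioned not knowing how to use one), the same classification logic can be sketched as a small stand-alone C# program. This is only a rough sketch based on the expressions above, not part of the SSIS package: the file name input.txt and the ClassifyLine helper are made up, and it assumes the first 3 banner rows are skipped just like the connection manager does (otherwise the "PAGE ... DATE" row would wrongly match File1):
using System;
using System.IO;
using System.Linq;

class SplitSketch
{
    // Classify a raw line the same way the Conditional Split expressions do.
    static string ClassifyLine(string line)
    {
        int tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

        if (line.Contains("OPENING") || line.Contains("DATE") || tokens == 19)
            return "File1";   // first header pair or 19-token data rows
        if (line.Contains("A/C") || line.Contains("FACTOR") || tokens == 10)
            return "File2";   // second header pair or 10-token data rows

        return "Ignore";      // separators, column-position rows, USED-ID, etc.
    }

    static void Main()
    {
        // Skip the first 3 banner rows, mirroring the "Header rows to skip" setting.
        foreach (string line in File.ReadLines("input.txt").Skip(3))
            Console.WriteLine(ClassifyLine(line) + ": " + line);
    }
}
In SSIS itself the same checks would live inside a Script Component with two outputs rather than a console program.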
Finally, map each output to a Flat File Destination component.
Executing the package splits the rows between the two destination files as expected.
If you only need to remove the duplicate headers, you can do this in two steps: a Script Component followed by a Conditional Split.
In addition, because the column values do not contain spaces, you can use a regular expression to replace the runs of spaces with a single tab so that the output files are consistent.
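As a quick stand-alone illustration of that replacement (the sample line here is made up), a single Regex.Replace call with the same pattern used in the script below collapses the space runs into tabs:
using System;
using System.Text.RegularExpressions;

// Collapse runs of two or more spaces into a single tab; single spaces are left alone.
string raw = "454542 070   567      2136547895";
string normalized = Regex.Replace(raw.TrimStart(), "[ ]{2,}", "\t");
Console.WriteLine(normalized);   // 454542 070<TAB>567<TAB>2136547895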
Script Component
In the Script Component, add an output column of type DT_BOOL and name it outFlag, add another output column outColumn0 of type DT_STR with a length of 4000, and select Column 0 as an input column.
Then write the following script in the Script Editor (C#):
First, make sure that you add the RegularExpressions namespace:
using System.Text.RegularExpressions;
Script Code
// These fields persist across rows, so each header line is kept only once.
int SEOCount = 0;
int NOMCount = 0;

// Matches runs of two or more spaces so they can be collapsed into a single tab.
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.Column0.Trim().StartsWith("SEO"))
    {
        // First header line: keep only its first occurrence.
        if (SEOCount == 0)
        {
            SEOCount++;
            Row.outFlag = true;
        }
        else
        {
            Row.outFlag = false;
        }
    }
    else if (Row.Column0.Trim().StartsWith("NOM"))
    {
        // Second header line: keep only its first occurrence.
        if (NOMCount == 0)
        {
            NOMCount++;
            Row.outFlag = true;
        }
        else
        {
            Row.outFlag = false;
        }
    }
    else if (Row.Column0.Trim().StartsWith("PAGE"))
    {
        // Page banner rows are never needed.
        Row.outFlag = false;
    }
    else
    {
        // Data rows are always kept.
        Row.outFlag = true;
    }

    // Trim leading spaces and collapse the remaining space runs into tabs.
    Row.outColumn0 = regex.Replace(Row.Column0.TrimStart(), "\t");
}
Conditional Split
Add a Conditional Split after each Script Component and use the following expression to filter out the duplicate headers:
[outFlag] == TRUE
Then connect this output to the destination, and make sure to map outColumn0 to the destination column.
Package link