How to load grouped data with SSIS

Tags:

sql

sql-server

tsql

ssis

ssis-2012

I have a tricky flat file data source. The data is grouped, like this:

Country    City
U.S.       New York
           Washington
           Baltimore
Canada     Toronto
           Vancouver

But I want it to be in this format when it's loaded into the database:

Country    City
U.S.       New York
U.S.       Washington
U.S.       Baltimore
Canada     Toronto
Canada     Vancouver

Has anyone run into this problem before? Any ideas on how to deal with it?
The only idea I have right now is to use a cursor, but that is just too slow.
Thank you!

asked Apr 13 '16 02:04

William Xu

2 Answers

The answer by cha will work, but here is another in case you need to do it in SSIS without temporary/staging tables:

You can run your dataflow through a Script Transformation that uses a DataFlow-level variable. As each row comes in, the script checks the value of the Country column.

If Country has a non-blank value, store that value in the variable and pass the row along in the dataflow unchanged.

If Country has a blank value, overwrite it with the value of the variable, which will be the last non-blank Country value you got.

EDIT: I looked up your error message and learned something new about Script Components (the Data Flow tool, as opposed to Script Tasks, the Control Flow tool):

The collection of ReadWriteVariables is only available in the PostExecute method to maximize performance and minimize the risk of locking conflicts. Therefore you cannot directly increment the value of a package variable as you process each row of data. Increment the value of a local variable instead, and set the value of the package variable to the value of the local variable in the PostExecute method after all data has been processed. You can also use the VariableDispenser property to work around this limitation, as described later in this topic. However, writing directly to a package variable as each row is processed will negatively impact performance and increase the risk of locking conflicts.

That comes from this MSDN article, which also has more information about the VariableDispenser workaround if you want to go that route. Apparently I misled you above when I said you can set the value of the package variable in the script. You have to use a variable that is local to the script, and then copy its value to the package variable in the PostExecute method. I can't tell from the article whether that also means you cannot read the package variable in the script; if it does, the VariableDispenser would be the only option. Or I suppose you could create another variable that the script has read-only access to, and set its value to an expression so that it always has the value of the read-write variable. That might work.

answered Oct 02 '22 05:10

Tab Alleman

Yes, it is possible. First you need to load the data into a table with an IDENTITY column:

-- drop table #t
CREATE TABLE #t (id INTEGER IDENTITY PRIMARY KEY,
Country VARCHAR(20),
City VARCHAR(20))

INSERT INTO #t(Country, City)
SELECT a.Country, a.City
 FROM OPENROWSET( BULK 'c:\import.txt', 
     FORMATFILE = 'c:\format.fmt',
     FIRSTROW = 2) AS a;

select * from #t

The result will be:

id          Country              City
----------- -------------------- --------------------
1           U.S.                 New York
2                                Washington
3                                Baltimore
4           Canada               Toronto
5                                Vancouver

And now with a bit of recursive CTE magic you can populate the missing details:

;WITH a as(
    SELECT Country
          ,City
          ,ID
    FROM #t WHERE ID = 1
    UNION ALL
    SELECT COALESCE(NULLIF(LTrim(#t.Country), ''),a.Country)
          ,#t.City
          ,#t.ID
    FROM a INNER JOIN #t ON a.ID+1 = #t.ID
    )
SELECT * FROM a
 OPTION (MAXRECURSION 0)

Result:

Country              City                 ID
-------------------- -------------------- -----------
U.S.                 New York             1
U.S.                 Washington           2
U.S.                 Baltimore            3
Canada               Toronto              4
Canada               Vancouver            5

Update:

As Tab Alleman suggested below, the same result can be achieved without the recursive query:

SELECT ID
     -- keep a non-blank Country; otherwise take the nearest earlier non-blank one
     , COALESCE(NULLIF(LTrim(a.Country), ''),
                (SELECT TOP 1 Country
                 FROM #t t
                 WHERE t.ID < a.ID AND LTrim(t.Country) <> ''
                 ORDER BY t.ID DESC)) AS Country
     , City
FROM #t a
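
On SQL Server 2012 or later (the question is tagged ssis-2012), a window function is another way to do the same fill-down without recursion or a correlated subquery. The following is only a sketch and not part of the original answer: it uses a table variable @t with hand-typed sample rows in place of #t so that it can be run on its own, and the names grouped and header_id are made up for illustration.

```sql
-- Stand-in for #t: hand-typed sample rows (assumption, not the real import)
DECLARE @t TABLE (id INT PRIMARY KEY, Country VARCHAR(20), City VARCHAR(20));

INSERT INTO @t (id, Country, City)
VALUES (1, 'U.S.',   'New York'),
       (2, '',       'Washington'),
       (3, '',       'Baltimore'),
       (4, 'Canada', 'Toronto'),
       (5, '',       'Vancouver');

WITH grouped AS (
    SELECT id, Country, City,
           -- id of the latest row, up to and including this one,
           -- that carries a non-blank Country
           MAX(CASE WHEN LTRIM(Country) <> '' THEN id END)
               OVER (ORDER BY id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS header_id
    FROM @t
)
SELECT g.id, h.Country, g.City
FROM grouped AS g
LEFT JOIN @t AS h ON h.id = g.header_id   -- look up the Country of that header row
ORDER BY g.id;
```

For the sample rows this should give the same five filled-in rows as the result shown above. The LEFT JOIN keeps any rows that appear before the first non-blank Country; they come back with a NULL Country instead of being dropped.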

BTW, this is the format file for your input data. If you want to try the scripts, save the input data as c:\import.txt and the format file below as c:\format.fmt:

9.0
  2
  1       SQLCHAR       0       11      ""       1     Country      SQL_Latin1_General_CP1_CI_AS
  2       SQLCHAR       0       100     "\r\n"   2     City         SQL_Latin1_General_CP1_CI_AS
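
The queries above only return the filled-in values. If the staging table itself should hold them before the rows move on to the destination, an in-place UPDATE is one option. This is again a sketch under assumptions: @t stands in for #t with hand-typed sample rows, and a blank or NULL Country is taken to mean "same country as the previous group".

```sql
-- Stand-in for #t: hand-typed sample rows (assumption, not the real import)
DECLARE @t TABLE (id INT PRIMARY KEY, Country VARCHAR(20), City VARCHAR(20));

INSERT INTO @t (id, Country, City)
VALUES (1, 'U.S.',   'New York'),
       (2, '',       'Washington'),
       (3, '',       'Baltimore'),
       (4, 'Canada', 'Toronto'),
       (5, '',       'Vancouver');

-- Only rows with a blank Country are touched; each one takes the Country
-- of the nearest earlier row that has a non-blank value.
UPDATE a
SET    a.Country = (SELECT TOP (1) t.Country
                    FROM @t AS t
                    WHERE t.id < a.id
                      AND LTRIM(t.Country) <> ''
                    ORDER BY t.id DESC)
FROM   @t AS a
WHERE  LTRIM(a.Country) = '' OR a.Country IS NULL;

SELECT id, Country, City FROM @t ORDER BY id;
```

A blank row that comes before any non-blank Country has nothing to inherit, so this sketch would leave it as NULL.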

answered Oct 02 '22 04:10

cha
