Fill SQL database from a CSV File

Tags: sql-server, csv, etl, ssis, flat-file

I need to create a database from a CSV file using SSIS. The CSV file includes four columns.

I need to use the information in that file to populate three tables I created in SQL Server: Employee, Group, and EmployeeGroup.

I have realized that what I need is to use one column from the Employee table, EmployeeNumber, and one from the Group table, GroupID, to populate the EmployeeGroup table. I thought a Merge Join transformation was what I needed, so I created a Data Flow Task with one in SSIS, but the result is the same: no data is displayed.

The middle table, EmployeeGroup, is the one used to relate the other two tables.

I created the package in SSIS and the Employee and Group Tables are populated, but the EmployeeGroup table is not. EmployeeGroup will only show the EmployeeNumber and Group ID columns with no data.

I am new to SSIS, and I really do not know what else to do. I would really appreciate your help.

asked Dec 02 '16 18:12

HCavill

2 Answers

Overview

Solutions using SSIS
- Using 3 Data Flow Tasks
- Using 2 Data Flow Tasks
Solutions Using T-SQL
- Using Microsoft.Ace.OLEDB
- Using Microsoft Text Driver
Solutions Using PowerShell

1st Solution - SSIS

Using 3 Data Flow Tasks

This can be done with only two Data Flow Tasks, but since the OP mentioned being new to SSIS, I will provide the easiest solution, which uses three Data Flow Tasks and avoids extra components such as Multicast.

Solution Overview

Because you want to build a relational database and extract the relations from the CSV, you have to read the CSV three times; treat it as three separate files.

First, import the Employee and Group data. Then import the relation table between them.

Each import step can be done in a separate Data Flow Task.
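
To make the three passes concrete, here is a minimal sketch in Python rather than SSIS. The sample rows and the column names EmployeeNumber, EmployeeName, LoginName, and GroupName are assumptions, since the real CSV layout is not reproduced in the question.

```python
import csv
import io

# Hypothetical sample file; the column names are assumed, not taken from the real CSV.
SAMPLE_CSV = """EmployeeNumber,EmployeeName,LoginName,GroupName
1,Ann,ann01,Admins
1,Ann,ann01,Sales
2,Bob,bob02,Sales
"""


def read_csv(text):
    """One full pass over the CSV, returning each row as a dict."""
    return list(csv.DictReader(io.StringIO(text)))


# Pass 1: distinct group names (dict keys keep first-seen order).
groups = list(dict.fromkeys(row["GroupName"] for row in read_csv(SAMPLE_CSV)))

# Pass 2: one record per employee number; the first occurrence is kept.
employees = {}
for row in read_csv(SAMPLE_CSV):
    employees.setdefault(
        row["EmployeeNumber"], (row["EmployeeName"], row["LoginName"])
    )

# Pass 3: the employee-to-group pairs that will fill the relation table.
pairs = [(row["EmployeeNumber"], row["GroupName"]) for row in read_csv(SAMPLE_CSV)]
```

Each pass reads the same file but keeps a different projection of it, which is what the three Data Flow Tasks do.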

Detailed Solution

Add a Flat File Connection Manager (the CSV file)
Add an OLE DB Connection Manager (the SQL Server destination)
Add three Data Flow Tasks to the control flow, one per import step

First Data Flow Task

Add a Flat File Source, a Script Component, and an OLE DB Destination, connected in that order.

In the Script Component, choose the Group Name column as input.

Select the output buffer (Output 0), set its SynchronousInputID property to None so that the output is asynchronous, and add an output column OutGroupName of type DT_STR.

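In the Script section, write the following code:

' Place this at the top of the script file
Imports System.Collections.Generic

' Inside the ScriptMain class: group names already sent to the output
Private m_List As New List(Of String)

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    ' Skip rows whose group name is null or blank
    If Not Row.GroupName_IsNull AndAlso
            Not String.IsNullOrEmpty(Row.GroupName.Trim) Then

        ' Emit each distinct group name only once
        If Not m_List.Contains(Row.GroupName.Trim) Then
            m_List.Add(Row.GroupName.Trim)
            CreateOutputRows(Row.GroupName.Trim)
        End If

    End If

End Sub

Public Sub CreateOutputRows(ByVal strValue As String)

    ' Add a new row to the asynchronous output and fill its only column
    Output0Buffer.AddRow()
    Output0Buffer.OutGroupName = strValue

End Sub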

In the OLE DB Destination, map OutGroupName to the GroupName column.

Second Data Flow Task : Import Employees Data

Repeat the steps used for the Group Name column, with one difference: in the Script Component, choose the EmployeeID, Employee Name, and LoginName columns as input, add an output column for each of them, and compare on the ID column instead of the group name.
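
Comparing on the ID alone has a consequence worth seeing: only the first row for each ID survives, even if a later row carries different values. The following Python sketch (not SSIS code) mimics that row-by-row behaviour; the helper name `first_row_per_key` and the sample rows are made up for illustration.

```python
def first_row_per_key(rows, key):
    """Yield only the first row seen for each non-empty value of `key`."""
    seen = set()
    for row in rows:
        value = (row.get(key) or "").strip()
        if not value or value in seen:
            continue  # skip blank IDs and IDs that were already emitted
        seen.add(value)
        yield row


# Hypothetical input: ID 1 appears twice with different names, one ID is blank.
rows = [
    {"EmployeeID": "1", "EmployeeName": "Ann", "LoginName": "ann01"},
    {"EmployeeID": "1", "EmployeeName": "Ann B.", "LoginName": "ann01"},
    {"EmployeeID": " ", "EmployeeName": "Nobody", "LoginName": ""},
    {"EmployeeID": "2", "EmployeeName": "Bob", "LoginName": "bob02"},
]

unique_employees = list(first_row_per_key(rows, "EmployeeID"))
```

Here the second row for ID 1 is dropped, so "Ann B." never reaches the destination. If the CSV can contain conflicting values for the same employee, that is something to check before loading.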

Third Data Flow Task : Import Employees_Group Data

Add a Flat File Source, a Lookup transformation, and an OLE DB Destination.

In the Lookup transformation, select the Group table as the lookup table.
Map the Group Name columns to each other and return GroupID as the output column.

In the Error Output configuration, choose Ignore Failure.
In the OLE DB Destination, map the employee number and the looked-up GroupID to the corresponding EmployeeGroup columns.

Note: GroupID must be an identity column (set this in SQL Server), so that its values are generated when the groups are inserted.
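
The lookup step can be pictured as a name-to-ID dictionary built from the already loaded Group table. The sketch below is Python with an in-memory SQLite database standing in for SQL Server; AUTOINCREMENT plays the role of the identity column, and the table contents and source rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# AUTOINCREMENT stands in for a SQL Server IDENTITY column in this sketch.
conn.execute(
    'CREATE TABLE "Group" ('
    "GroupID INTEGER PRIMARY KEY AUTOINCREMENT, GroupName TEXT UNIQUE)"
)
conn.executemany(
    'INSERT INTO "Group" (GroupName) VALUES (?)', [("Admins",), ("Sales",)]
)

# The lookup table: group name -> generated GroupID.
group_ids = {
    name: gid for gid, name in conn.execute('SELECT GroupID, GroupName FROM "Group"')
}

# Hypothetical (employee number, group name) pairs read from the CSV.
source_rows = [("1", "Admins"), ("2", "Sales"), ("3", "Unknown")]

# dict.get returns None on a miss, much like a lookup that ignores failures
# and passes the row on with a NULL GroupID.
employee_groups = [(emp, group_ids.get(group)) for emp, group in source_rows]
```

The last row shows what Ignore Failure implies: a group name with no match still flows to the destination, but without a GroupID.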

Using 2 Data Flow Tasks

Follow the same steps as in the three Data Flow Task solution, but instead of adding separate Data Flow Tasks for Group and Employee, add a single Data Flow Task and place a Multicast component after the Flat File Source to duplicate the flow. On the first branch, use the Script Component and OLE DB Destination from the Employee Data Flow Task; on the second branch, use the Script Component and OLE DB Destination for Group.
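
Conceptually, the Multicast reads the file once and hands every row to both branches. A small Python sketch of that idea follows; the `multicast` helper and the sample rows are illustrative only and not part of SSIS.

```python
def multicast(rows, *branches):
    """Send every row from a single read to each branch."""
    for row in rows:
        for branch in branches:
            branch(row)


# Hypothetical rows from one read of the CSV.
rows = [
    {"EmployeeNumber": "1", "GroupName": "Admins"},
    {"EmployeeNumber": "1", "GroupName": "Sales"},
    {"EmployeeNumber": "2", "GroupName": "Sales"},
]

seen_employees, seen_groups = {}, {}

multicast(
    rows,
    # Employee branch: keep the first row per employee number.
    lambda row: seen_employees.setdefault(row["EmployeeNumber"], row),
    # Group branch: keep the first row per group name.
    lambda row: seen_groups.setdefault(row["GroupName"], row),
)
```

The relation table still needs its own pass afterwards, because the lookup depends on the Group rows already being in the database.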

2nd Solution - Using TSQL

There are several ways to import a flat file into SQL Server with T-SQL commands.

OPENROWSET with Microsoft ACE OLEDB provider

Assuming that the installed version of Microsoft ACE OLEDB is Microsoft.ACE.OLEDB.12.0 and that the csv file location is C:\abc.csv

First, import the data into the Employee and Group tables:

-- Insert each group name once; the CSV repeats it for every member
INSERT INTO [GROUP]
    ([Group Name])
SELECT DISTINCT
    [Group Name] 
FROM 
    OPENROWSET
        (
            'Microsoft.ACE.OLEDB.12.0','Text;Database=C:\;IMEX=1;','SELECT * FROM abc.csv'
        ) t


-- Insert each employee once; the CSV repeats the employee for every group
INSERT INTO [Employee]
    ([Employee Number],[Employee Name],[LoginName])
SELECT DISTINCT
    [Employee Number],[Employee Name],[LoginName] 
FROM 
    OPENROWSET
        (
            'Microsoft.ACE.OLEDB.12.0','Text;Database=C:\;IMEX=1;','SELECT * FROM abc.csv'
        ) t

Then import the Employee_Group data:

-- Resolve each group name in the CSV to the GroupID generated above
INSERT INTO [EmployeeGroup]
    ([Employee Number],[GroupID])
SELECT 
    t1.[Employee Number],t2.[GroupID]
FROM 
    OPENROWSET
        (
            'Microsoft.ACE.OLEDB.12.0','Text;Database=C:\;IMEX=1;','SELECT * FROM abc.csv'
        ) t1 INNER JOIN [GROUP] t2 ON t1.[Group Name] = t2.[Group Name]

OPENROWSET with Microsoft Text Driver

First, import the data into the Employee and Group tables:

-- Insert each group name once; the CSV repeats it for every member
INSERT INTO [GROUP]
    ([Group Name])
SELECT DISTINCT
    [Group Name] 
FROM 
    OPENROWSET
        (
            'MSDASQL',
            'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
            'SELECT * FROM abc.csv'
        ) t


-- Insert each employee once; the CSV repeats the employee for every group
INSERT INTO [Employee]
    ([Employee Number],[Employee Name],[LoginName])
SELECT DISTINCT
    [Employee Number],[Employee Name],[LoginName] 
FROM 
    OPENROWSET
        (
            'MSDASQL',
            'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
            'SELECT * FROM abc.csv'
        ) t

Then import the Employee_Group data:

-- Resolve each group name in the CSV to the GroupID generated above
INSERT INTO [EmployeeGroup]
    ([Employee Number],[GroupID])
SELECT 
    t1.[Employee Number],t2.[GroupID]
FROM 
    OPENROWSET
        (
            'MSDASQL',
            'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
            'SELECT * FROM abc.csv'
        ) t1 INNER JOIN [GROUP] t2 ON t1.[Group Name] = t2.[Group Name]

Note: You can import the data into a staging table once and then query that table, which avoids connecting to the CSV file several times.

Solutions Using PowerShell

There are many ways to import CSV files into SQL Server with PowerShell. See the following links for more information.

Four Easy Ways to Import CSV Files to SQL Server with PowerShell
How to import data from .csv in SQL Server using PowerShell?

answered Sep 29 '22 12:09

Hadi

I think the easiest solution would be to import the csv to a flat staging table and then use some insert into...select statements to populate the target tables. Assuming you know how to import to a flat table, the rest is quite simple:

INSERT INTO Employee (EmployeeNumber, EmployeeName, LoginName)
SELECT DISTINCT EmployeeNumber, EmployeeName, LoginName
FROM Stage

INSERT INTO [Group] (GroupName)
SELECT DISTINCT GroupName 
FROM Stage

INSERT INTO EmployeeGroup(EmployeeNumber, GroupId)
SELECT DISTINCT EmployeeNumber, GroupId
FROM Stage s
INNER JOIN [Group] g ON s.GroupName = g.GroupName

You can see a live demo on rextester.
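
For a self-contained way to try the staging approach without SQL Server, here is a rough Python sketch that runs the same three statements against an in-memory SQLite database. The table definitions, the sample data, and the use of AUTOINCREMENT in place of an identity column are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample file with the column names used in the statements above.
SAMPLE_CSV = """EmployeeNumber,EmployeeName,LoginName,GroupName
1,Ann,ann01,Admins
1,Ann,ann01,Sales
2,Bob,bob02,Sales
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Stage (EmployeeNumber INTEGER, EmployeeName TEXT,
                        LoginName TEXT, GroupName TEXT);
    CREATE TABLE Employee (EmployeeNumber INTEGER PRIMARY KEY,
                           EmployeeName TEXT, LoginName TEXT);
    CREATE TABLE "Group" (GroupId INTEGER PRIMARY KEY AUTOINCREMENT,
                          GroupName TEXT);
    CREATE TABLE EmployeeGroup (EmployeeNumber INTEGER, GroupId INTEGER);
    """
)

# Load the CSV into the flat staging table exactly once.
conn.executemany(
    "INSERT INTO Stage VALUES (:EmployeeNumber, :EmployeeName, :LoginName, :GroupName)",
    csv.DictReader(io.StringIO(SAMPLE_CSV)),
)

# Populate the target tables from the staging table.
conn.executescript(
    """
    INSERT INTO Employee (EmployeeNumber, EmployeeName, LoginName)
    SELECT DISTINCT EmployeeNumber, EmployeeName, LoginName FROM Stage;

    INSERT INTO "Group" (GroupName)
    SELECT DISTINCT GroupName FROM Stage;

    INSERT INTO EmployeeGroup (EmployeeNumber, GroupId)
    SELECT DISTINCT s.EmployeeNumber, g.GroupId
    FROM Stage s
    INNER JOIN "Group" g ON s.GroupName = g.GroupName;
    """
)

counts = {
    table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for table in ("Employee", "Group", "EmployeeGroup")
}
```

With this sample, `counts` holds two employees, two groups, and three employee-group rows. SQLite's type and identity handling differ from SQL Server's, so treat this as a check of the logic rather than of the exact T-SQL.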

answered Sep 29 '22 11:09

Zohar Peled
