
C# Importing Large Volume of Data from CSV to Database

What's the most efficient way to load a large volume of data (3 million+ rows) from a CSV file into a database?

  • The data needs to be transformed (e.g. the name column needs to be split into first name and last name, etc.).
  • I need to do this as efficiently as possible, because of time constraints.

I am leaning towards reading, transforming and loading the data row by row using a C# application. Is this ideal? If not, what are my options? Should I use multithreading?

guazz asked Apr 14 '10 22:04





2 Answers

You will be I/O bound, so multithreading will not necessarily make it run any faster.

Last time I did this, it was about a dozen lines of C#. I read one line at a time from the source file, and a single thread kept the hard disk reading as fast as it could deliver data from the platters.
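A minimal sketch of that kind of line-at-a-time read with the name split applied on the way through. This is not the original dozen lines; the names `CsvTransform`, `SplitName` and `ReadRows` are made up for illustration, and it assumes the name is in the first column, there is no header row, and no field contains a quoted comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvTransform
{
    // Splits "First Last" at the first space; a name without a space
    // is treated as a first name with an empty last name.
    public static (string First, string Last) SplitName(string fullName)
    {
        string trimmed = fullName.Trim();
        int space = trimmed.IndexOf(' ');
        return space < 0
            ? (trimmed, string.Empty)
            : (trimmed.Substring(0, space), trimmed.Substring(space + 1).TrimStart());
    }

    // Lazily yields one transformed row per input line, so memory use
    // stays flat no matter how many rows the file holds.
    // Assumption: plain comma-separated fields, name in column 0.
    public static IEnumerable<string[]> ReadRows(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines

            string[] fields = line.Split(',');
            var (first, last) = SplitName(fields[0]);

            // Output row: first name, last name, then the remaining columns.
            var row = new string[fields.Length + 1];
            row[0] = first;
            row[1] = last;
            Array.Copy(fields, 1, row, 2, fields.Length - 1);
            yield return row;
        }
    }
}
```

To run it over a file, wrap a `StreamReader` in a `using` block and iterate `CsvTransform.ReadRows(reader)` with `foreach`. Because the rows are produced lazily, whatever consumes them (for example a batched insert) sets the pace.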

If you're not keen on writing it yourself, you could try the FileHelpers libraries. You might also want to have a look at Sébastien Lorion's work. His CSV reader is written specifically to deal with performance issues.

Robert Harvey answered Oct 27 '22 21:10



You could use the csvreader library to read the CSV quickly.

Assuming you're using SQL Server, you can use csvreader's CachedCsvReader to read the data into a DataTable, which you can then pass to SqlBulkCopy to load it into SQL Server.
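A rough sketch of the DataTable plus SqlBulkCopy half of that approach, using only `System.Data` types. It does not use csvreader itself: it takes rows that have already been parsed into string arrays and sends them in batches rather than loading the whole file into one DataTable. The table name `dbo.People`, the three columns, the `BulkLoader` class and the batch size are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient on newer .NET versions
using System.Globalization;

public static class BulkLoader
{
    // In-memory table shaped like the (hypothetical) destination table.
    public static DataTable NewBatchTable()
    {
        var table = new DataTable();
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("LastName", typeof(string));
        table.Columns.Add("Age", typeof(int));
        return table;
    }

    // Sends rows to SQL Server in batches, so only batchSize rows
    // are held in memory at any one time.
    // Assumption: each row is { first name, last name, age }.
    public static void Load(IEnumerable<string[]> rows, string connectionString, int batchSize = 10000)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.People"; // hypothetical table
            bulk.BulkCopyTimeout = 0;                  // 0 = no time limit

            // Map by name so column order in the database does not matter.
            bulk.ColumnMappings.Add("FirstName", "FirstName");
            bulk.ColumnMappings.Add("LastName", "LastName");
            bulk.ColumnMappings.Add("Age", "Age");

            DataTable batch = NewBatchTable();
            foreach (string[] row in rows)
            {
                batch.Rows.Add(row[0], row[1], int.Parse(row[2], CultureInfo.InvariantCulture));
                if (batch.Rows.Count >= batchSize)
                {
                    bulk.WriteToServer(batch);
                    batch.Clear(); // reuse the same table for the next batch
                }
            }

            // Flush whatever is left over after the last full batch.
            if (batch.Rows.Count > 0) bulk.WriteToServer(batch);
        }
    }
}
```

Batching is a design choice here rather than a requirement: a single DataTable holding 3 million rows also works if memory allows, but smaller batches keep the footprint predictable. The best batch size depends on row width and server configuration, so it is worth measuring.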

Jay Riggs answered Oct 27 '22 21:10
