I've looked at a number of questions on this site and cannot find an answer to this one: how do you create multiple NEW tables in a database (PostgreSQL in my case) from multiple CSV source files, such that the new tables' columns accurately reflect the data in the CSV columns?
I can write the CREATE TABLE syntax just fine, and I can read the rows and values of a CSV file, but does a method already exist to inspect a CSV file and accurately determine each column's type? Before building my own, I wanted to check whether this already exists.
If it doesn't already exist, my idea would be to use Python with the csv and psycopg2 modules to build a script that would inspect each CSV file, infer a suitable type for each column, and then generate and execute the corresponding CREATE TABLE statements.
Does a tool like this already exist within SQL, PostgreSQL, or Python, or is there another application I should be using to accomplish this (similar to pgAdmin3)?
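To make the idea concrete, here is a rough sketch of the kind of script I have in mind; infer_pg_type, the sample size, and the file and table names are illustrative placeholders, not an existing API:

import csv

def infer_pg_type(values):
    """Very naive guess: INTEGER, then DOUBLE PRECISION, else TEXT.
    Empty strings are treated as NULLs and ignored."""
    values = [v for v in values if v.strip()]
    if not values:
        return "TEXT"
    for pg_type, parse in (("INTEGER", int), ("DOUBLE PRECISION", float)):
        try:
            for v in values:
                parse(v)
            return pg_type
        except ValueError:
            pass
    return "TEXT"

def create_table_sql(csv_path, table_name, sample_rows=1000):
    """Read the header plus a sample of rows and build a CREATE TABLE statement."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        sample = [row for row, _ in zip(reader, range(sample_rows))]
    cols = []
    for i, name in enumerate(header):
        col_values = [r[i] for r in sample if i < len(r)]
        cols.append('"%s" %s' % (name, infer_pg_type(col_values)))
    return 'CREATE TABLE "%s" (%s);' % (table_name, ", ".join(cols))

# Illustrative usage; "people.csv" and the connection string are placeholders.
# import psycopg2
# conn = psycopg2.connect("dbname=mydb")
# with conn, conn.cursor() as cur:
#     cur.execute(create_table_sql("people.csv", "people"))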
A CSV file typically stores tabular data (numbers and text) in plain text, with each line having the same number of fields. The CSV file format is not fully standardized.
I have been dealing with something similar, and ended up writing my own module to sniff datatypes by inspecting the source file. There is some wisdom among all the naysayers, but there are also cases where this is worth doing, particularly when you have no control over the input data format (e.g. working with government open data). The main lesson I learned in the process: if you can avoid automatic type detection, it's worth avoiding, but that's not always practical, so I hope the sketch below is of some help.
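My module isn't shown here, but this is the general shape such a sniffer can take; the NULL_TOKENS set, the single date format, and the candidate types are illustrative choices, not a standard:

from datetime import datetime

# Strings we treat as NULLs (an illustrative choice, not a standard).
NULL_TOKENS = {"", "NA", "N/A", "null", "NULL"}

def parses_as(value, pg_type):
    """True if `value` parses as the given PostgreSQL type."""
    try:
        if pg_type == "INTEGER":
            int(value)
        elif pg_type == "DOUBLE PRECISION":
            float(value)
        elif pg_type == "DATE":
            datetime.strptime(value, "%Y-%m-%d")  # one format; real data needs more
        return True
    except ValueError:
        return False

def sniff_column(values):
    """Keep a set of candidate types; each non-NULL value that fails to parse
    as a type eliminates that candidate. TEXT is always the fallback."""
    candidates = {"INTEGER", "DOUBLE PRECISION", "DATE"}
    saw_value = False
    for v in values:
        if v.strip() in NULL_TOKENS:
            continue              # NULLs carry no type information
        saw_value = True
        candidates = {t for t in candidates if parses_as(v, t)}
        if not candidates:
            return "TEXT"         # short-circuit once only TEXT is left
    if not saw_value:
        return "TEXT"             # an all-NULL column tells us nothing
    # Prefer the most specific surviving type.
    for t in ("INTEGER", "DATE", "DOUBLE PRECISION"):
        if t in candidates:
            return t
    return "TEXT"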
It seems that you need to know the structure up front. Just read the first line to see how many columns you have.
CSV does not carry any type information, so types have to be deduced from the data itself.
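For example (the file name is a placeholder), Python's csv module hands every field back as a string, so the first line gives you the column count and nothing gives you the types:

import csv

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)                      # first line: how many columns
    first_row = next(reader)

print(len(header))                             # column count
print(first_row)                               # e.g. ['1', '3.14', '2021-01-01']
print(all(isinstance(v, str) for v in first_row))  # True -- types must be deduced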
Improving on the slightly wrong answer before: you can create a temporary table with the right number of TEXT columns, fill it with the data, and then process the data.
BEGIN;

-- One TEXT column per CSV column; ON COMMIT DROP removes the table automatically.
CREATE TEMPORARY TABLE foo(a TEXT, b TEXT, c TEXT, ...) ON COMMIT DROP;

-- Server-side COPY: 'file.csv' must be readable by the PostgreSQL server process.
COPY foo FROM 'file.csv' WITH CSV;

<do the work>

END;
A word of warning: with server-side COPY the file needs to be accessible by the PostgreSQL server process itself, which creates some security issues. The other option is to feed it through STDIN, which keeps the file on the client.
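As a rough sketch of the STDIN route with psycopg2 (connection string, file name, and the target column types are placeholders): copy_expert streams a client-side file over the connection, so the server never needs to read the file itself.

import psycopg2

conn = psycopg2.connect("dbname=mydb")         # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("CREATE TEMPORARY TABLE foo (a TEXT, b TEXT, c TEXT)")
    with open("file.csv") as f:
        # COPY ... FROM STDIN reads the file on the client side; HEADER
        # assumes the first line is a header row and skips it.
        cur.copy_expert("COPY foo FROM STDIN WITH CSV HEADER", f)
    # "do the work": cast the TEXT columns into a properly typed table.
    cur.execute("""
        CREATE TABLE bar AS
        SELECT a::integer, b::date, c
        FROM foo
    """)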
HTH