Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Polybase CREATE EXTERNAL TABLE skip header

I am new to Azure and Polybase, I am trying to read a CSV file into a SQL External Table.

I noticed that, it is not possible to skip the first row, the header on some forums I read.

I'm hoping for the opposite,Can you help me ?

The code I used is below.

Thanks in advance

 CREATE EXTERNAL TABLE dbo.Test2External (
   [Guid] [varchar](36) NULL,
    [Year] [smallint] NULL,
    [SysNum] [bigint] NULL,
    [Crc_1] [decimal](15, 2) NULL,
    [Crc_2] [decimal](15, 2) NULL,
    [Crc_3] [decimal](15, 2) NULL,
    [Crc_4] [decimal](15, 2) NULL,
    [CreDate] [date] NULL,
    [CreTime] [datetime] NULL,
    [UpdDate] [date] NULL,
    ...
WITH (
    LOCATION='/20160823/1145/FIN/',
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=TextFile
);


-- Run a query on the external table

SELECT count(*) FROM dbo.Test2External;
like image 463
laurens Avatar asked Oct 04 '16 16:10

laurens


People also ask

What is the limitations of PolyBase?

PolyBase has the following limitations: Before SQL Server 2019 (15. x), the maximum possible row size, which includes the full length of variable length columns, can't exceed 32 KB in SQL Server or 1 MB in Azure Synapse Analytics.

Can we alter external table in SQL Server?

create-or-alter or . alter to modify existing tables. Altering the schema or format of an external SQL table is not supported.

Can we create external table in Azure SQL Database?

One of the ways to consume data online from Azure SQL Database on Managed Instance is to link the two by creating an external table, using CREATE EXTERNAL DATA SOURCE (Transact-SQL).


1 Answers

there is a workaround by using 'EXTERNAL FILE FORMAT' with 'FIRST_ROW = 2'. e.g. if we create a file format

CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',', 
        FIRST_ROW  = 2,
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = False
        )
)
GO

And then use this file format with create external table

CREATE EXTERNAL TABLE [testdata].[testfile1]
(
    [column1] [nvarchar](4000) NULL
)
WITH (DATA_SOURCE = data_source,
LOCATION = file_location,
FILE_FORMAT = [CsvFormatWithHeader],REJECT_TYPE = PERCENTAGE,REJECT_VALUE = 100,REJECT_SAMPLE_VALUE = 1000)

It will skip first row while executing queries for 'testdata.testfile1'.

like image 156
Rizwan Hanif Avatar answered Oct 01 '22 19:10

Rizwan Hanif