I am trying to read an Excel file with a .xlsx extension from Azure Blob Storage in my Azure Data Factory dataset. It throws the following error:
Error found when processing 'Csv/Tsv Format Text' source 'Filename.xlsx' with row number 3: found more columns than expected column count: 1.
What are the right column and row delimiters for Excel files to be read in Azure Data Factory?
The service supports both ".xls" and ".xlsx". The Excel format is supported for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Files, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage, and SFTP.
In Azure SQL Database, you cannot import directly from Excel. You must first export the data to a text (CSV) file. Before you can run a distributed query, you have to enable the ad hoc distributed queries server configuration option, as shown in the following example.
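A minimal sketch of that configuration change, using the standard sp_configure syntax, run against the SQL Server instance that will execute the distributed query:

```sql
-- Make advanced server options visible.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Enable ad hoc distributed queries (OPENROWSET / OPENDATASOURCE).
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
```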
An XLSX file is a Microsoft Excel Open XML Format Spreadsheet file. Open one with Excel, Excel Viewer, Google Sheets, or another spreadsheet program.
Update March 2022: ADF now has better support for Excel via Mapping Data Flows:
https://docs.microsoft.com/en-us/azure/data-factory/format-excel
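With that support, the file from the question can be read by defining the dataset with the Excel format instead of DelimitedText, so no delimiters need to be configured at all. A rough sketch of such a dataset definition, assuming a Blob Storage linked service called AzureBlobStorageLS, a container called mycontainer, and a worksheet called Sheet1 (all placeholders):

```json
{
    "name": "ExcelDataset",
    "properties": {
        "type": "Excel",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "mycontainer",
                "fileName": "Filename.xlsx"
            },
            "sheetName": "Sheet1",
            "firstRowAsHeader": true
        }
    }
}
```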
Excel files use a proprietary format and are not simple delimited text files, so there are no column or row delimiters to set. As indicated here, Azure Data Factory previously had no direct option to import Excel files, e.g. you could not create a Linked Service to an Excel file and read it easily, so a workaround was required.
Let us know how you get on.