Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recommended way to read data from an Excel file using an Azure App Service?

Background
I have a legacy site that allows authorised users to upload an Excel spreadsheet of product data etc. The site then reads the Excel worksheets and unpacks the data into SQL server. It's an old site and it uses OLE. Old but it works.

The problem
I've recently published the site to an Azure App Service but the existing section of my code that reads from Excel doesn't work (as Azure doesn't have the right driver).

The question
I'm happy to rewrite this section of code but what is the CORRECT or RECOMMENDED approach for reading from Excel using an Azure App Service? I'm not asking about ways that MIGHT work I'm only interested in the RIGHT way to do this.

By "recommendeded" I mean:

  • not unnecessarily complicated. Keep it simple.
  • likely to retain support from Microsoft in the future

I have researched this issue but have not been able to find a clear statement of the best way to do this. If you have experience or knowledge of different ways of doing this I'd be grateful if you could share your conclusion about the BEST way to do this.

like image 991
Mark Challice Avatar asked Nov 07 '18 12:11

Mark Challice


People also ask

How does Azure read Excel files?

In Azure SQL Database, you cannot import directly from Excel. You must first export the data to a text (CSV) file. Before you can run a distributed query, you have to enable the ad hoc distributed queries server configuration option, as shown in the following example.

Can Azure data Factory read Excel file?

The service supports both ". xls" and ". xlsx". Excel format is supported for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Files, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage and SFTP.


1 Answers

There should be many ways you can achieve this, and here I list 2 as below:

1.Use the DocumentFormat.OpenXml, which is published by MS, but it's a little complicated. The demo code is here.

2.Use ExcelDataReader, which is very simple and supports both .xls and .xlsx. You can refer to this article to do it(note that IsFirstRowAsColumnNames property is abandoned, you can see my code below for this change).

And I write a demo with the 2nd method ExcelDataReader.Just for test purpose, I uploaded the excel to the azure web app directory like below:

And the following is the excel content:

Step 1: Create an asp.net MVC project, and then install the latest version ExcelDataReader and ExcelDataReader.DataSet via nuget package manager.

Step 2: Create a ExcelData.cs file in your project which used to read excel file:

Step 3: Write the following code in ExcelData.cs:

using ExcelDataReader;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

namespace WebApplication42
{
    public class ExcelData
    {
        string _path;
        public ExcelData(string path)
        {
            _path = path;
        }

        public IExcelDataReader GetExcelReader()
        {
            FileStream stream = File.Open(_path, FileMode.Open, FileAccess.Read);
            IExcelDataReader reader = null;
            try
            {
                if (_path.EndsWith(".xls"))
                {
                    reader = ExcelReaderFactory.CreateBinaryReader(stream);
                }
                if (_path.EndsWith(".xlsx"))
                {
                    reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
                }

                return reader;
            }
            catch (Exception)
            {
                throw;
            }
        }

        //read the sheets name if you need
        public IEnumerable<string> GetWorksheetNames()
        {
            var reader = this.GetExcelReader();
            var workbook = reader.AsDataSet();
            var sheets = from DataTable sheet in workbook.Tables select sheet.TableName;
            return sheets;
        }

        //read data in a specified sheet
        public IEnumerable<DataRow> GetData(string sheet)
        {

            var reader = this.GetExcelReader();
            var workSheet = reader.AsDataSet(new ExcelDataSetConfiguration()
            {
                ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
                {
                    //indicates if use the header values
                    UseHeaderRow = true
                }

            }).Tables[sheet];

            var rows = from DataRow a in workSheet.Rows select a;
            return rows;
        }    

    }
}

Step 4: In the controller, call the read excel method:

        public ActionResult Excels()
        {
            ViewBag.Message = "the data from excel:";
            string data = "";

            //your excel path after uploaded, here I hardcoded it for test only
            string path = @"D:\home\site\wwwroot\Files\ddd.xls";
            var excelData = new ExcelData(path);
            var people = excelData.GetData("sheet1");

            foreach (var p in people)
            {
                for (int i=0;i<=p.ItemArray.GetUpperBound(0);i++)
                {
                    data += p[i].ToString()+",";
                }

                data += ";";
            }

            ViewBag.Message += data;

            return View();
        }

Step 5: After publish to azure, launch the site and see the results-> all the data in excel are read:

like image 83
Ivan Yang Avatar answered Oct 22 '22 01:10

Ivan Yang