How to transform XML data using a Data Factory pipeline

How do we save data inside of an XML payload to blob storage?

input

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>sdfsdfjijOIJOISJDFQPWORPJkjsdlfkjlksdf==</bytes>
</root>

desired result

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>/blob/path/toSavedPayload</bytes>
</root>
  1. Save the bytes somewhere in blob storage
  2. Replace the bytes with the URI of where they were saved

How do we use data factory to extract a node from XML and save it to blob?
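
For context, here is a minimal stand-alone C# sketch of the transformation we are after, independent of Data Factory; sample.xml, the upload step, and the result path are placeholders:

using System;
using System.Xml.Linq;

// Read the payload, decode the base64 bytes, and repoint the node
// at wherever the bytes end up being saved.
XDocument doc = XDocument.Load("sample.xml");
XElement bytesNode = doc.Root.Element("bytes");
byte[] payload = Convert.FromBase64String(bytesNode.Value);

// ... upload 'payload' to blob storage here ...

// Replace the base64 content with the URI of the saved blob.
bytesNode.Value = "/blob/path/toSavedPayload";
doc.Save("sample.xml");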

Asked by Alex Gordon on May 20 '19

People also ask

Does ADF support XML?

XML format is supported for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Files, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage and SFTP. It is supported as a source but not as a sink.

How do I connect Data Factory to Databricks?

Create linked services: on the home page, switch to the Manage tab in the left panel. Select Linked services under Connections, and then select + New. In the New linked service window, select Compute > Azure Databricks, and then select Continue.
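
The same linked service can also be created programmatically; below is a minimal sketch using the Microsoft.Azure.Management.DataFactory .NET SDK, where the token, subscription, resource group, factory name, workspace URL, and cluster settings are all placeholder values:

using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.Rest;

// Authenticate with a pre-acquired AAD token (placeholder).
var client = new DataFactoryManagementClient(new TokenCredentials("<access-token>"))
{
    SubscriptionId = "<subscription-id>"
};

// Define an Azure Databricks linked service that spins up a new job cluster.
var databricksLinkedService = new LinkedServiceResource(
    new AzureDatabricksLinkedService
    {
        Domain = "https://<workspace>.azuredatabricks.net",
        AccessToken = new SecureString("<databricks-token>"),
        NewClusterVersion = "10.4.x-scala2.12",
        NewClusterNodeType = "Standard_D3_v2",
        NewClusterNumOfWorker = "2"
    });

client.LinkedServices.CreateOrUpdate(
    "<resource-group>", "<factory-name>", "AzureDatabricksLinkedService", databricksLinkedService);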

What is a data flow in Data Factory?

Data flows are a feature of Azure Data Factory that lets data engineers develop data transformation logic graphically, without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that run on scaled-out Spark clusters.


1 Answer

As Azure Data Factory does not support XML natively, I would suggest you go with an SSIS package.

  1. In the Data Flow task, add an XML source and read the bytes from the XML into a variable of the DT_IMAGE data type.
  2. Create a Script Task that uploads the byte array (DT_IMAGE) obtained in step 1 to Azure Blob Storage, as shown below. The code is slightly modified for this requirement (reference: SO post).
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

// Retrieve the storage account from the connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("StorageConnectionString");

// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

// Retrieve a reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");

// Retrieve a reference to a blob named "myblob".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob");

// Read the byte array from the SSIS variable (cast from object).
byte[] byteArrayIn = (byte[])Dts.Variables["User::ImageVariable"].Value;

// Create or overwrite the "myblob" blob with the contents of the byte array.
using (var memoryStream = new MemoryStream(byteArrayIn))
{
    blockBlob.UploadFromStream(memoryStream);
}
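
To cover the second half of the requirement (pointing the bytes node at the uploaded blob), the same Script Task could rewrite the XML after the upload; a minimal sketch, where User::XmlFilePath is a hypothetical variable holding the path of the XML file:

using System.Xml.Linq;

// Load the original XML, swap the base64 payload for the blob URI,
// and write the document back out. User::XmlFilePath is a hypothetical
// variable pointing at the XML file processed above.
string xmlPath = (string)Dts.Variables["User::XmlFilePath"].Value;
XDocument doc = XDocument.Load(xmlPath);
doc.Root.Element("bytes").Value = blockBlob.Uri.ToString();
doc.Save(xmlPath);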
  3. Now, host this SSIS package in the Azure-SSIS Integration Runtime in Azure Data Factory and execute it there.

SSIS Integration Runtime in Azure Data Factory

Answered by Venkataraman R on Nov 04 '22