
30Mb limit uploading to Azure DataLake using DataLakeStoreFileSystemManagementClient

I am receiving an error when using

_adlsFileSystemClient.FileSystem.Create(_adlsAccountName, destFilePath, stream, overwrite)

to upload files to a Data Lake Store. The error comes up with files over 30 MB; it works fine with smaller files.

The error is:

at Microsoft.Azure.Management.DataLake.Store.FileSystemOperations.d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.Create(IFileSystemOperations operations, String accountName, String directFilePath, Stream streamContents, Nullable`1 overwrite, Nullable`1 syncFlag)
at AzureDataFunctions.DataLakeController.CreateFileInDataLake(String destFilePath, Stream stream, Boolean overwrite) in F:\GitHub\ZutoDW\ADF_ProcessAllFiles\ADF_ProcessAllFiles\DataLakeController.cs:line 122

Has anybody else encountered this, or observed similar behaviour? I am getting around it by splitting my files into 30 MB pieces and uploading them.

However, this is impractical in the long term, because the original file is 380 MB and potentially quite a bit larger. I do not want to have 10-15 dissected files in my data lake in the long term; I would like to upload it as a single file.

I am able to upload the exact same file to the datalake through the portal interface.

Tom Armstrong asked Jan 04 '17


2 Answers

It is answered here.

Currently there is a size limit of 30000000 bytes per upload request. You can work around it by creating an initial file and then appending to it, keeping each stream below the limit.
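
A minimal sketch of that create-then-append approach is below, assuming the _adlsFileSystemClient from the question and the SDK's FileSystem.Append operation; the chunk size and the UploadLargeFile helper name are illustrative, not part of the original answer:

    // Sketch: create the file with the first chunk, then append the remaining
    // chunks, keeping every request under the 30000000-byte limit.
    // Requires System.IO and Microsoft.Azure.Management.DataLake.Store.
    private static void UploadLargeFile(DataLakeStoreFileSystemManagementClient client,
        string accountName, string destFilePath, Stream source)
    {
        const int chunkSize = 25 * 1024 * 1024; // comfortably below the limit
        var buffer = new byte[chunkSize];

        // The first chunk creates (or overwrites) the file.
        int read = source.Read(buffer, 0, buffer.Length);
        using (var firstChunk = new MemoryStream(buffer, 0, read))
        {
            client.FileSystem.Create(accountName, destFilePath, firstChunk, overwrite: true);
        }

        // Each remaining chunk is appended to the same file.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            using (var chunk = new MemoryStream(buffer, 0, read))
            {
                client.FileSystem.Append(accountName, destFilePath, chunk);
            }
        }
    }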

answered by jason chen


Please try using DataLakeStoreUploader to upload a file or directory to Data Lake; for more demo code, refer to the GitHub sample. I tested the demo and it works correctly for me. You can get the Microsoft.Azure.Management.DataLake.Store and Microsoft.Azure.Management.DataLake.StoreUploader SDKs from NuGet. The following are my detailed steps:

  1. Create a C# console application
  2. Add the following code

     // Requires the Microsoft.Rest.Azure.Authentication, Microsoft.Azure.Management.DataLake.Store
     // and Microsoft.Azure.Management.DataLake.StoreUploader namespaces.
     var applicationId = "your application Id";
     var secretKey = "secret Key";
     var tenantId = "Your tenantId";
     var adlsAccountName = "adls account name";
     var creds = ApplicationTokenProvider.LoginSilentAsync(tenantId, applicationId, secretKey).Result;
     var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);
     var inputFilePath = @"c:\tom\ForDemoCode.zip";
     var targetStreamPath = "/mytempdir/ForDemoCode.zip"; // path inside the Data Lake Store, not a local path
     var parameters = new UploadParameters(inputFilePath, targetStreamPath, adlsAccountName, isOverwrite: true, maxSegmentLength: 268435456 * 2); // the default maxSegmentLength is 256 MB; here it is doubled to 512 MB
     var frontend = new DataLakeStoreFrontEndAdapter(adlsAccountName, adlsFileSystemClient);
     var uploader = new DataLakeStoreUploader(parameters, frontend);
     uploader.Execute();
    
  3. Debug the application.


  4. Check the result in the Azure portal (or verify it from code, as sketched below).

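If you prefer to verify the upload from code instead of the portal, a short check after uploader.Execute() can confirm the file made it; this is a sketch assuming the SDK's FileSystem.GetFileStatus operation and the variables from step 2:

    // Sketch: confirm the uploaded file exists and report its size.
    // Assumes adlsFileSystemClient, adlsAccountName and targetStreamPath from step 2,
    // and that the SDK exposes FileSystem.GetFileStatus as shown.
    var status = adlsFileSystemClient.FileSystem.GetFileStatus(adlsAccountName, targetStreamPath);
    Console.WriteLine($"Type: {status.FileStatus.Type}, Length: {status.FileStatus.Length} bytes");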

For the SDK versions, please refer to the packages.config file:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Azure.Management.DataLake.Store" version="1.0.2-preview" targetFramework="net452" />
  <package id="Microsoft.Azure.Management.DataLake.StoreUploader" version="1.0.0-preview" targetFramework="net452" />
  <package id="Microsoft.IdentityModel.Clients.ActiveDirectory" version="3.13.8" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime" version="2.3.2" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.2" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime.Azure.Authentication" version="2.2.0-preview" targetFramework="net452" />
  <package id="Newtonsoft.Json" version="9.0.2-beta1" targetFramework="net452" />
</packages>
answered by Tom Sun - MSFT