I am receiving an error when using
_adlsFileSystemClient.FileSystem.Create(_adlsAccountName, destFilePath, stream, overwrite)
to upload files to a Data Lake Store account. The error occurs with files over 30 MB; it works fine with smaller files.
The error is:
at Microsoft.Azure.Management.DataLake.Store.FileSystemOperations.d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.Create(IFileSystemOperations operations, String accountName, String directFilePath, Stream streamContents, Nullable`1 overwrite, Nullable`1 syncFlag)
at AzureDataFunctions.DataLakeController.CreateFileInDataLake(String destFilePath, Stream stream, Boolean overwrite) in F:\GitHub\ZutoDW\ADF_ProcessAllFiles\ADF_ProcessAllFiles\DataLakeController.cs:line 122
Has anybody else encountered this, or observed similar behaviour? I am currently working around it by splitting my files into 30 MB pieces and uploading them.
However, this is impractical in the long term, because the original file is 380 MB and potentially quite a bit larger. I do not want 10-15 dissected files sitting in my Data Lake; I would like to upload it as a single file.
I am able to upload the exact same file to the datalake through the portal interface.
It is answered here.
Currently there is a size limit of 30,000,000 bytes per request. You can work around it by creating an initial file and then appending to it, with each stream smaller than the limit.
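For illustration, a minimal sketch of that create-then-append workaround, assuming the FileSystem.Append extension method from the same Microsoft.Azure.Management.DataLake.Store SDK; the helper name and the 25 MB chunk size are just illustrative choices below the limit.

using System.IO;
using Microsoft.Azure.Management.DataLake.Store;

// Hypothetical helper: upload a large stream by creating the file with the first
// chunk and appending the rest, keeping every request under the size limit.
public static void UploadInChunks(DataLakeStoreFileSystemManagementClient client,
    string accountName, string destFilePath, Stream source, bool overwrite)
{
    const int maxChunkBytes = 25 * 1024 * 1024; // illustrative value below the 30,000,000-byte limit
    var buffer = new byte[maxChunkBytes];

    // The first chunk creates the file.
    int read = source.Read(buffer, 0, buffer.Length);
    client.FileSystem.Create(accountName, destFilePath,
        new MemoryStream(buffer, 0, read), overwrite);

    // Each remaining chunk is appended with its own request.
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        client.FileSystem.Append(accountName, destFilePath,
            new MemoryStream(buffer, 0, read));
    }
}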
Please try using DataLakeStoreUploader to upload a file or directory to Data Lake; the uploader splits a large file into segments and uploads them, so it is not limited by the per-request size cap. For more demo code, please refer to the GitHub sample. I tested the demo and it works correctly for me. You can get the Microsoft.Azure.Management.DataLake.Store and Microsoft.Azure.Management.DataLake.StoreUploader SDKs from NuGet. The following are my detailed steps:
Add the following code:
// NuGet packages: Microsoft.Azure.Management.DataLake.Store, Microsoft.Azure.Management.DataLake.StoreUploader,
// Microsoft.Rest.ClientRuntime.Azure.Authentication
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.Azure.Management.DataLake.StoreUploader;
using Microsoft.Rest.Azure.Authentication;

var applicationId = "your application Id";
var secretKey = "secret Key";
var tenantId = "Your tenantId";
var adlsAccountName = "adls account name";

// Authenticate as the Azure AD application (service principal).
var creds = ApplicationTokenProvider.LoginSilentAsync(tenantId, applicationId, secretKey).Result;
var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);

var inputFilePath = @"c:\tom\ForDemoCode.zip";       // local file to upload
var targetStreamPath = "/mytempdir/ForDemoCode.zip"; // destination path inside the Data Lake Store account, not a local path
// The default maxSegmentLength is 256 MB; it can be adjusted if needed.
var parameters = new UploadParameters(inputFilePath, targetStreamPath, adlsAccountName, isOverwrite: true, maxSegmentLength: 268435456 * 2);
var frontend = new DataLakeStoreFrontEndAdapter(adlsAccountName, adlsFileSystemClient);
var uploader = new DataLakeStoreUploader(parameters, frontend);
uploader.Execute();
Debug the application.
Check the uploaded file from the Azure portal.
For SDK version info, please refer to the packages.config file:
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="Microsoft.Azure.Management.DataLake.Store" version="1.0.2-preview" targetFramework="net452" />
<package id="Microsoft.Azure.Management.DataLake.StoreUploader" version="1.0.0-preview" targetFramework="net452" />
<package id="Microsoft.IdentityModel.Clients.ActiveDirectory" version="3.13.8" targetFramework="net452" />
<package id="Microsoft.Rest.ClientRuntime" version="2.3.2" targetFramework="net452" />
<package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.2" targetFramework="net452" />
<package id="Microsoft.Rest.ClientRuntime.Azure.Authentication" version="2.2.0-preview" targetFramework="net452" />
<package id="Newtonsoft.Json" version="9.0.2-beta1" targetFramework="net452" />
</packages>