Databricks: Download a dbfs:/FileStore File to my Local Machine?

I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.

I can access the different "part-xxxxx" files through the web browser, but I would like to automate downloading all of the files to my local machine.

I have tried to use cURL, but I can't find the REST API command to download a dbfs:/FileStore file.

Question: How can I download a dbfs:/FileStore file to my local machine?

I am using Databricks Community Edition to teach an undergraduate module in Big Data Analytics in college. I have Windows 7 installed on my local machine. I have checked that cURL and the _netrc file are properly installed and configured, as I can successfully run some of the commands provided by the REST API.

Thank you very much in advance for your help! Best regards, Nacho

asked Feb 27 '18 by Nacho Castiñeiras

People also ask

How do I download files from Databricks DBFS to my local machine?

Install the Databricks CLI, configure it with your Databricks credentials, and use the CLI's dbfs cp command. For example: dbfs cp dbfs:/FileStore/test.txt ./test.txt.
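For context, a minimal sketch of that CLI workflow might look like the following (the file name is just an example, and the configure step prompts for your own workspace URL and personal access token):

    # Install the Databricks CLI (provides the dbfs command)
    pip install databricks-cli

    # Configure it with your workspace URL and a personal access token
    databricks configure --token

    # Copy a single file from DBFS to the current local directory
    dbfs cp dbfs:/FileStore/test.txt ./test.txt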

How do I download a jar from DBFS?

Note: Copy the generated token and store it in a secure location. Step 3: Open DBFS Explorer for Databricks, enter the host URL and bearer token, and continue. Step 4: Navigate to the DBFS folder named FileStore => jars => select the jar you want to download, click download, and select a folder on the local machine.

How do I access DBFS FileStore?

You can access DBFS objects using the DBFS CLI, the DBFS API, Databricks file system utilities (dbutils.fs), Spark APIs, and local file APIs. In a Spark cluster you access DBFS objects using Databricks file system utilities, Spark APIs, or local file APIs.
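As a small illustration of the CLI route, listing the folder from the question could look like this (a sketch only; it assumes the CLI is already configured as in the example above, and the folder name comes from the question):

    # List the result folder to see the individual part files
    dbfs ls dbfs:/FileStore/my_result
    # Typical output (names will vary):
    #   _SUCCESS
    #   part-00000
    #   part-00001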


1 Answer

There are a few options for downloading FileStore files to your local machine.

Easier options:

  • Install the Databricks CLI, configure it with your Databricks credentials, and use the CLI's dbfs cp command. For example: dbfs cp dbfs:/FileStore/test.txt ./test.txt. If you want to download an entire folder of files, you can use dbfs cp -r (see the sketch after this list).
  • From a browser signed into Databricks, navigate to https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/files/. If you are using Databricks Community Edition then you may need to use a slightly different path. This download method is described in more detail in the FileStore docs.
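For the folder of "part-xxxxx" files described in the question, a hedged sketch of the recursive copy could look like this (the local target folder name is arbitrary):

    # Recursively copy the whole result folder from DBFS to the local machine
    dbfs cp -r dbfs:/FileStore/my_result ./my_result

    # Individual files can also be fetched in a signed-in browser, e.g.:
    #   https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/files/my_result/part-00000
    # (on Community Edition the host and exact path may differ; see the FileStore docs)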

Advanced options:

  • Use the DBFS REST API. You can access file contents using the read API call. To download a large file, you may need to issue multiple read calls to access chunks of the full file.
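Since the question mentions cURL and a configured _netrc file, here is a hedged sketch of that REST approach using the DBFS read endpoint. The host name and file path are placeholders, jq and base64 are assumed to be available for decoding, and a file larger than the per-call limit (1 MB per read) would need a loop that advances offset by the returned bytes_read:

    # Read up to 1 MB of a DBFS file; the response JSON carries base64-encoded data
    curl -n "https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/api/2.0/dbfs/read?path=/FileStore/my_result/part-00000&offset=0&length=1048576" \
      | jq -r .data | base64 --decode > part-00000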
answered Oct 20 '22 by Josh Rosen