
Databricks Prints Only Around 280 lines of data

I'm running some large jobs in Databricks, which, for now, include inventorying the data lake. I'm trying to print all blob names within a prefix (sub-folder). There are a lot of files in these sub-folders, and I'm getting about 280 rows of file names printed, but then I see this: *** WARNING: skipped 494256 bytes of output *** Then I get another 280 rows printed.

I'm guessing there is a control to change this, right? I certainly hope so. This is designed to work with BIG data, not ~280 records. I understand that huge data sets can easily crash a browser, but come on, this is basically nothing.
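One common workaround for the notebook's output truncation is to avoid printing the listing at all and instead collect the names into a single file that can be downloaded afterwards. The sketch below is illustrative only: the `blob_names` list stands in for the result of a real listing call (in a Databricks notebook that would typically come from `dbutils.fs.ls`), so it runs anywhere.

```python
# Sketch: instead of printing thousands of blob names (which the notebook
# output truncates), collect them into an in-memory CSV for saving/download.
# `blob_names` is a stand-in for the real listing (e.g. from dbutils.fs.ls).
import csv
import io

blob_names = [f"prefix/file_{i:05d}.parquet" for i in range(1000)]  # stand-in data

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["path"])          # header row
for name in blob_names:
    writer.writerow([name])

# In a notebook you would then persist the buffer, for example:
# dbutils.fs.put("dbfs:/FileStore/tables/inventory.csv", buf.getvalue(), True)
print(f"collected {len(blob_names)} paths")
```

This keeps the notebook output to a single summary line while the full inventory lands in one file.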

ASH asked Oct 16 '22

1 Answer

Note: Using the GUI, you can download the full results (max 1 million rows).


To download more than 1 million rows, first save the results to DBFS and then copy the file to your local machine using the Databricks CLI, as follows:

dbfs cp "dbfs:/FileStore/tables/AA.csv" "A:\AzureAnalytics"
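The "save to DBFS first" step can be sketched in plain Python. In a Databricks notebook the DBFS root is mounted at `/dbfs`, so `dbfs:/FileStore/tables/AA.csv` corresponds to the local path `/dbfs/FileStore/tables/AA.csv`; here a temporary directory stands in for that mount so the sketch runs anywhere, and the `rows` list is placeholder data.

```python
# Sketch: write result rows to a CSV on DBFS before pulling it down with
# `dbfs cp`. A temp directory stands in for /dbfs/FileStore/tables here.
import csv
import os
import tempfile

rows = [("id", "name"), (1, "alpha"), (2, "beta")]   # stand-in results

target_dir = tempfile.mkdtemp()                      # stand-in for /dbfs/FileStore/tables
target = os.path.join(target_dir, "AA.csv")

with open(target, "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(f"wrote {os.path.getsize(target)} bytes to {target}")
```

Once the file exists under `dbfs:/FileStore/tables/`, the `dbfs cp` command shown above copies it to the local machine.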

Reference: Databricks file system

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Reference: Installing and configuring Azure Databricks CLI

Hope this helps.

CHEEKATLAPRADEEP-MSFT answered Oct 19 '22