 

How to write pandas dataframe into Databricks dbfs/FileStore?

I'm new to Databricks and need help writing a pandas dataframe into the Databricks local file system.

I searched Google but could not find a similar case, and I also tried the help guide provided by Databricks (attached), but that did not work either. I attempted the changes below to try my luck; the commands run just fine, but the file never gets written to the directory (the expected wrtdftodbfs.txt file is never created).

  1. df.to_csv("/dbfs/FileStore/NJ/wrtdftodbfs.txt")

Result: throws the following error

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/NJ/wrtdftodbfs.txt'

  2. df.to_csv("\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  3. df.to_csv("dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  4. df.to_csv(path ="\\dbfs\\FileStore\\NJ\\",file="wrtdftodbfs.txt")

Result: TypeError: to_csv() got an unexpected keyword argument 'path'

  5. df.to_csv("dbfs:\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  6. df.to_csv("dbfs:\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

The directory exists, and files created manually show up in it, but pandas to_csv never writes the file and never errors out.

dbutils.fs.put("/dbfs/FileStore/NJ/tst.txt","Testing file creation and existence")

dbutils.fs.ls("dbfs/FileStore/NJ")

Out[186]: [FileInfo(path='dbfs:/dbfs/FileStore/NJ/tst.txt', name='tst.txt', size=35)]
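
One thing I notice in the output above: dbutils.fs seems to resolve paths against the DBFS root, so '/dbfs/FileStore/NJ/tst.txt' actually landed at dbfs:/dbfs/FileStore/NJ/tst.txt (note the doubled dbfs), whereas pandas writes through the local /dbfs FUSE mount, so the two calls may be pointing at different directories. A minimal sketch that keeps both path schemes consistent (assuming a cluster where the /dbfs mount exists; it is not available on Community Edition):

# Sketch, untested here: create the directory at the DBFS root,
# then write through the local /dbfs FUSE mount.
dbutils.fs.mkdirs("dbfs:/FileStore/NJ")           # directory under the DBFS root
df.to_csv("/dbfs/FileStore/NJ/wrtdftodbfs.txt")   # pandas writes via the FUSE mount
dbutils.fs.ls("dbfs:/FileStore/NJ")               # should now list wrtdftodbfs.txt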

Appreciate your time and pardon me if the enclosed details are not clear enough.

asked Dec 19 '19 by Shaan Proms




2 Answers

Try this in your Databricks notebook:

import pandas as pd
from io import StringIO

# Sample data to build a small DataFrame
data = """
CODE,L,PS
5d8A,N,P60490
5d8b,H,P80377
5d8C,O,P60491
"""

df = pd.read_csv(StringIO(data), sep=',')
#print(df)

# Write to DBFS through the local /dbfs FUSE mount
df.to_csv('/dbfs/FileStore/NJ/file1.txt')

# Read it back to verify the write succeeded
pandas_df = pd.read_csv("/dbfs/FileStore/NJ/file1.txt", header='infer')
print(pandas_df)
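
If the /dbfs FUSE mount is not available on your cluster (it is missing on Community Edition, for instance), one workaround sketch is to convert to a Spark DataFrame and let Spark write to the DBFS path directly; the output directory name below is just illustrative:

# Sketch: write via Spark when the /dbfs mount is unavailable
# (assumes a Databricks notebook, where `spark` is predefined).
sdf = spark.createDataFrame(df)       # pandas -> Spark DataFrame
(sdf.coalesce(1)                      # collapse to a single part file
    .write.mode("overwrite")
    .option("header", "true")
    .csv("dbfs:/FileStore/NJ/file1_spark"))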
answered Oct 11 '22 by GiovaniSalazar


This worked out for me:

outname = 'pre-processed.csv'
outdir = '/dbfs/FileStore/'
# Write through the local /dbfs FUSE mount so the file lands in FileStore
dfPandas.to_csv(outdir + outname, index=False, encoding="utf-8")

To download the file, add files/filename to your notebook URL (before the question mark ?):

https://community.cloud.databricks.com/files/pre-processed.csv?o=189989883924552#

(you will need to substitute your own home URL; for me it is:

https://community.cloud.databricks.com/?o=189989883924552#)
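
Alternatively (my own sketch, assuming the file sits directly under /FileStore), you can render a clickable download link right in the notebook with displayHTML, since Databricks serves FileStore contents under the /files/ route:

# Sketch: show a download link in the notebook output
# (displayHTML is built into Databricks Python notebooks).
displayHTML("<a href='/files/pre-processed.csv' download>Download pre-processed.csv</a>")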

[Image: DBFS file explorer]

answered Oct 11 '22 by Nicoswow