
Saving Matplotlib Output to DBFS on Databricks

I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS.

Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
plt.close()
df.set_index('fruits',inplace = True)
df.plot.bar()
# plt.show()

Things that I tried:

plt.savefig("/FileStore/my-file.png")

[Errno 2] No such file or directory: '/FileStore/my-file.png'

fig = plt.gcf()
dbutils.fs.put("/dbfs/FileStore/my-file.png", fig)

TypeError: has the wrong type - (,) is expected.

After some research, I think dbutils.fs.put only works for saving text files.

Running the above code with plt.show() produces a bar graph. I want to save that bar graph as an image to DBFS. Any help is appreciated, thanks in advance!

KikiNeko asked Jul 25 '19 14:07


2 Answers

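An easier way uses only matplotlib.pyplot. The fix is in the path: local file APIs such as plt.savefig reach DBFS through the /dbfs mount, so the path has to start with /dbfs. That is why '/FileStore/my-file.png' fails with "No such file or directory".

Example: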

import matplotlib.pyplot as plt
plt.scatter(x=[1,2,3], y=[2,4,3])
plt.savefig('/dbfs/FileStore/figure.png')
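
Applied to the bar chart from the question, the same idea might look like the sketch below. The helper name save_bar_chart is made up for illustration, and a temporary directory stands in for /dbfs so the snippet also runs outside Databricks. The matplotlib.use("Agg") line is only there for running without a display and can probably be dropped in a notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; assumption for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def save_bar_chart(out_path):
    """Plot the question's fruit counts and save the figure to out_path.

    Hypothetical helper, not part of matplotlib or Databricks.
    """
    df = pd.DataFrame({"fruits": ["apple", "banana"], "count": [1, 2]})
    ax = df.set_index("fruits").plot.bar()

    # Create the parent directory if it does not exist yet.
    parent = os.path.dirname(out_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Save through the figure that owns the axes, then release it.
    ax.figure.savefig(out_path)
    plt.close(ax.figure)
    return out_path


# On Databricks the target would be something like "/dbfs/FileStore/my-file.png".
# A temporary directory is used here so the sketch runs anywhere.
demo_path = save_bar_chart(
    os.path.join(tempfile.mkdtemp(), "FileStore", "my-file.png")
)
```

Saving through ax.figure instead of plt.savefig avoids depending on which figure happens to be current.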
mangelfdz answered Oct 26 '22 21:10


You can do this by saving the figure to memory and then using the Python local file APIs to write to the Databricks filesystem (DBFS).

Example:

import matplotlib.pyplot as plt
from io import BytesIO

# Create a plt or fig, then:
buf = BytesIO()
plt.savefig(buf, format='png')

path = '/dbfs/databricks/path/to/file.png'

# Make sure to open the file in bytes mode
with open(path, 'wb') as f:
  # Alternatively, call buf.seek(0) and then buf.read()
  f.write(buf.getvalue())
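
For a complete run of the steps above, here is a minimal sketch with the two steps split into small functions. The names figure_to_png_bytes and write_bytes are hypothetical, and the output goes to a temporary directory instead of a /dbfs/... path so it can be tried outside Databricks.

```python
import os
import tempfile
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # headless backend; assumption for running outside a notebook
import matplotlib.pyplot as plt


def figure_to_png_bytes(fig):
    """Render a matplotlib figure to PNG and return the raw bytes."""
    buf = BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


def write_bytes(path, data):
    """Write raw bytes to path using the local file API (binary mode)."""
    with open(path, "wb") as f:
        f.write(data)


# Build a small bar chart similar to the one in the question.
fig, ax = plt.subplots()
ax.bar(["apple", "banana"], [1, 2])

png_bytes = figure_to_png_bytes(fig)
plt.close(fig)

# On Databricks this would be a path under /dbfs; a temp dir stands in here.
demo_path = os.path.join(tempfile.mkdtemp(), "file.png")
write_bytes(demo_path, png_bytes)
```

Keeping the PNG in memory first is mostly useful when the same bytes need to go to more than one place. If the file is the only goal, saving directly to the /dbfs path is simpler.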
Alex Ross answered Oct 26 '22 21:10