I am using openXLSX
package to generate excel files from my R
output.
There is no way I have found to add excel charts to the excel workbook.
I saw that python
has a module for creating Excel files that has a class for adding excel charts.
Is there a way to do this with R?
Just because you have an xlsx file doesn't mean you can't analyze it with R. Likewise, you can write R code to populate and manipulate Excel spreadsheets. Today you'll get a grasp of two tools used to manage communication between R and Excel. These are readxl and Bert.
R can be used to explore, clean, analyze and visualize data. The following tutorial will get you started using R's ggplot2 package to make a simple line chart from a csv of data.
Here's a solution using package XLConnect
.
A small note though, it relies on templates of charts, which you need to create in advance, and it generates new files instead of appending sheets or charts to existing files.
It is comprised of two stages:
First step: prepare templates in excel, according to the types of charts you will need. You can have all the templates on the same file (in different sheets) or in several different files. When you prepare the templates, include the type of charts you need within the sheet, but instead of referring to specific cells, you need to use "named ranges".
See for example. You can also use the sample file I created. Note the use of named ranges in the file and in the chart's data references (as Sheet1!bar_names
and Sheet1!values
instead of Sheet1!$A$2:$A$4
and Sheet1!$B$2:$B$4
).
A side note on named ranges in Excel. Named ranges mean you give names to the data you're going to use in the chart, and then you "tell the chart" to use the named range, instead of an absolute location. You can access the "Name Manager" in excel in the "Formulas" menu. The reason we are using named ranges is that XLConnect
is able to control named ranges, hence the chart will dynamically update when we modify the named range.
Second step: use an adaptation of the following code such that it suits your needs. Mostly use your own data frame and update the reference in the createName
function.
library(XLConnect) # load library
wb1 <- loadWorkbook(filename = "edit_chart_via_R_to_excel.xlsx")
new.df <- data.frame(Type = c("Ford", "Hyundai", "BMW", "Other"),
Number = c(45, 35, 25, 15)) # sample data
writeWorksheet(wb1, data = new.df, sheet = "Sheet1",
startRow = 1, startCol = 1, header = TRUE)
# update named ranges for the chart's use.
# Note that
# "Sheet1!$A$2:$A$5" and "Sheet1!$B$2:$B$5"
# should change according to the data you are updating
createName(wb1, "bar_names", "Sheet1!$A$2:$A$5", overwrite = TRUE)
createName(wb1, "values", "Sheet1!$B$2:$B$5", overwrite = TRUE)
saveWorkbook(wb1)
This should do the trick.
Note that you can copy and save the template before you start the modifications, if you want to provide it as a new file (and to preserve the original template without overwriting it).
I looked into using reticulate
to write the .xlsx file from scratch, with a native excel chart based on the data, avoiding having to make a template. The script below generates some data, saves it to the .xlsx file, then builds a line chart below the data. See the documentation at https://xlsxwriter.readthedocs.io/chart.html for different chart types!
Note also that this shoudl prompt you to install Python if reticulate
can't find an existing installation.
The code is available at the following gist: https://gist.github.com/jsavn/cbea4b35d73cea6841489e72a221c4e9
write_xlsx_and_chart_to_file.py
(The filename of this is used in the source()
call in the R script later)
import pandas as pd
import xlsxwriter as xw
# The skeleton of below function based on example from: https://xlsxwriter.readthedocs.io/example_pandas_chart.html#ex-pandas-chart
# We pass the function a pandas dataframe;
# The dataframe is inserted in an .xslx spreadsheet
# We take note of the number of rows and columns, and use those to position the chart below the data
# We then iterate over the rows of the data and insert each row as a separate line (series) in the line chart
def save_time_series_as_xlsx_with_chart(pandas_df, filename):
if not(filename.endswith('.xlsx')):
print("Warning: added .xlsx to filename")
filename = filename + '.xlsx'
# Create a Pandas dataframe from the data.
# pandas_df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
## get dimensions of data frame to use for positioning the chart later
pandas_df_nrow, pandas_df_ncol = pandas_df.shape
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
pandas_df.to_excel(writer, sheet_name='Sheet1', index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a chart object.
chart = workbook.add_chart({'type': 'line'})
# Configure the series of the chart from the dataframe data
# THe coordinates of each series in the line chart are the positions of the data in the excel file
# Note that data starts at row 2, column 1, so the row/col values need to be adjusted accordingly
# However, python counts rows & columns from 0
for row_in_data in range(0,pandas_df_nrow):
row_in_sheet = row_in_data+1 # data starts on 2nd row
last_col_in_sheet = pandas_df_ncol-1 # number of columns minus one in 0-notation
first_col_with_data = 1 # 2nd column in 0-notation
range_of_series = xw.utility.xl_range(
first_row=row_in_sheet, # read from the current row in loop only
first_col=first_col_with_data, # data starts in 2nd column, i.e. 1 in 0-notation
last_row=row_in_sheet,
last_col=last_col_in_sheet
)
range_of_categories = xw.utility.xl_range(
first_row=0, # read from 1st row only - header
first_col=first_col_with_data, # read from 2nd column for month headers
last_row=0,
last_col=last_col_in_sheet
)
formula_for_series = '=Sheet1!' + range_of_series
col_with_series_name = 0 # first column
name_of_series = '=Sheet1!' + xw.utility.xl_rowcol_to_cell(row=row_in_sheet, col=col_with_series_name)
formula_for_categories = 'Sheet1!' + range_of_categories
chart.add_series({'values': formula_for_series, 'name': name_of_series, 'categories': formula_for_categories})
# Insert the chart into the worksheet.
worksheet.insert_chart(pandas_df_nrow+2, 2, chart)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
library(tidyverse)
library(reticulate)
set.seed(19) # random seed fixed
# check if packages are available, otherwise install
for (package in c("pandas","xlsxwriter")) {
if (py_module_available(package)) {
message(package, " already installed! Proceeding...")
} else {
py_install(packages = package)
}
}
## generate some time series data for month & year
tbl <- expand_grid(Year=2017:2020, Month=month.name) %>% mutate(N=sample(1:100, size=nrow(.), replace=TRUE))
## ggplot2 plot of the data so we know what to expect
fig <-
ggplot(data=tbl) +
geom_line(aes(x=Month, y=N, group=Year, colour=factor(Year)), size=1) +
theme_minimal() +
NULL
print(fig) # see a ggplot2 version of same plot
# convert data to wide format to put in excel
tbl_wide_format <- tbl %>%
pivot_wider(names_from=Month, values_from=N)
# convert wide format data to pandas dataframe, to pass to python script
tbl_pandas <- r_to_py(tbl_wide_format)
## import python script
source_python("write_xlsx_and_chart_to_file.py")
## save chart using python script
save_time_series_as_xlsx_with_chart(tbl_pandas, "reticulate_pandas_writexlsx_excel_line_chart.xlsx")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With