Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use R to create chart in Excel sheet

Tags:

r

excel

I am using openXLSX package to generate excel files from my R output.
There is no way I have found to add excel charts to the excel workbook.
I saw that python has a module for creating Excel files that has a class for adding excel charts.
Is there a way to do this with R?

like image 668
eliavs Avatar asked Aug 11 '16 06:08

eliavs


People also ask

Can I use R code in Excel?

Just because you have an xlsx file doesn't mean you can't analyze it with R. Likewise, you can write R code to populate and manipulate Excel spreadsheets. Today you'll get a grasp of two tools used to manage communication between R and Excel. These are readxl and Bert.

Can you make charts in R?

R can be used to explore, clean, analyze and visualize data. The following tutorial will get you started using R's ggplot2 package to make a simple line chart from a csv of data.


Video Answer


2 Answers

Here's a solution using package XLConnect. A small note though, it relies on templates of charts, which you need to create in advance, and it generates new files instead of appending sheets or charts to existing files.

It is comprised of two stages:

  1. Preparing Excel templates for the type of charts you would like to use.
  2. Updating the template files with the data from R as required each time.

First step: prepare templates in excel, according to the types of charts you will need. You can have all the templates on the same file (in different sheets) or in several different files. When you prepare the templates, include the type of charts you need within the sheet, but instead of referring to specific cells, you need to use "named ranges". See for example. You can also use the sample file I created. Note the use of named ranges in the file and in the chart's data references (as Sheet1!bar_names and Sheet1!values instead of Sheet1!$A$2:$A$4 and Sheet1!$B$2:$B$4).

A side note on named ranges in Excel. Named ranges mean you give names to the data you're going to use in the chart, and then you "tell the chart" to use the named range, instead of an absolute location. You can access the "Name Manager" in excel in the "Formulas" menu. The reason we are using named ranges is that XLConnect is able to control named ranges, hence the chart will dynamically update when we modify the named range.

Second step: use an adaptation of the following code such that it suits your needs. Mostly use your own data frame and update the reference in the createName function.

library(XLConnect) # load library
wb1 <- loadWorkbook(filename = "edit_chart_via_R_to_excel.xlsx") 
new.df <- data.frame(Type = c("Ford", "Hyundai", "BMW", "Other"),
          Number = c(45, 35, 25, 15)) # sample data
writeWorksheet(wb1, data = new.df, sheet = "Sheet1", 
               startRow = 1, startCol = 1, header = TRUE)
# update named ranges for the chart's use.
# Note that 
# "Sheet1!$A$2:$A$5" and "Sheet1!$B$2:$B$5" 
# should change according to the data you are updating
createName(wb1, "bar_names", "Sheet1!$A$2:$A$5", overwrite = TRUE) 
createName(wb1, "values", "Sheet1!$B$2:$B$5", overwrite = TRUE)
saveWorkbook(wb1)

This should do the trick.

Note that you can copy and save the template before you start the modifications, if you want to provide it as a new file (and to preserve the original template without overwriting it).

like image 126
Adi Sarid Avatar answered Oct 20 '22 08:10

Adi Sarid


I looked into using reticulate to write the .xlsx file from scratch, with a native excel chart based on the data, avoiding having to make a template. The script below generates some data, saves it to the .xlsx file, then builds a line chart below the data. See the documentation at https://xlsxwriter.readthedocs.io/chart.html for different chart types!

Note also that this shoudl prompt you to install Python if reticulate can't find an existing installation.

The code is available at the following gist: https://gist.github.com/jsavn/cbea4b35d73cea6841489e72a221c4e9

Python script write_xlsx_and_chart_to_file.py

(The filename of this is used in the source() call in the R script later)

import pandas as pd
import xlsxwriter as xw

# The skeleton of below function based on example from: https://xlsxwriter.readthedocs.io/example_pandas_chart.html#ex-pandas-chart
# We pass the function a pandas dataframe;
# The dataframe is inserted in an .xslx spreadsheet
# We take note of the number of rows and columns, and use those to position the chart below the data
# We then iterate over the rows of the data and insert each row as a separate line (series) in the line chart

def save_time_series_as_xlsx_with_chart(pandas_df, filename):
  if not(filename.endswith('.xlsx')):
    print("Warning: added .xlsx to filename")
    filename = filename + '.xlsx'
  # Create a Pandas dataframe from the data.
  # pandas_df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

  ## get dimensions of data frame to use for positioning the chart later
  pandas_df_nrow, pandas_df_ncol = pandas_df.shape

  # Create a Pandas Excel writer using XlsxWriter as the engine.
  writer = pd.ExcelWriter(filename, engine='xlsxwriter')

  # Convert the dataframe to an XlsxWriter Excel object.
  pandas_df.to_excel(writer, sheet_name='Sheet1', index=False)

  # Get the xlsxwriter workbook and worksheet objects.
  workbook  = writer.book
  worksheet = writer.sheets['Sheet1']

  # Create a chart object.
  chart = workbook.add_chart({'type': 'line'})

  # Configure the series of the chart from the dataframe data
  # THe coordinates of each series in the line chart are the positions of the data in the excel file
  # Note that data starts at row 2, column 1, so the row/col values need to be adjusted accordingly
  # However, python counts rows & columns from 0
  for row_in_data in range(0,pandas_df_nrow):
    row_in_sheet = row_in_data+1  # data starts on 2nd row
    last_col_in_sheet = pandas_df_ncol-1 # number of columns minus one in 0-notation
    first_col_with_data = 1  # 2nd column in 0-notation
    range_of_series = xw.utility.xl_range(
      first_row=row_in_sheet,  # read from the current row in loop only
      first_col=first_col_with_data, # data starts in 2nd column, i.e. 1 in 0-notation
      last_row=row_in_sheet,
      last_col=last_col_in_sheet
      )
    range_of_categories = xw.utility.xl_range(
      first_row=0, # read from 1st row only - header
      first_col=first_col_with_data,  # read from 2nd column for month headers
      last_row=0, 
      last_col=last_col_in_sheet
      )
    formula_for_series = '=Sheet1!' + range_of_series
    col_with_series_name = 0  # first column
    name_of_series = '=Sheet1!' + xw.utility.xl_rowcol_to_cell(row=row_in_sheet, col=col_with_series_name)
    formula_for_categories = 'Sheet1!' + range_of_categories
    chart.add_series({'values': formula_for_series, 'name': name_of_series, 'categories': formula_for_categories})

  # Insert the chart into the worksheet.
  worksheet.insert_chart(pandas_df_nrow+2, 2, chart)

  # Close the Pandas Excel writer and output the Excel file.
  writer.save()

R script

library(tidyverse)
library(reticulate)

set.seed(19)  # random seed fixed

# check if packages are available, otherwise install
for (package in c("pandas","xlsxwriter")) {
  if (py_module_available(package)) {
    message(package, " already installed! Proceeding...")
  } else {
    py_install(packages = package)  
  }
}

## generate some time series data for month & year
tbl <- expand_grid(Year=2017:2020, Month=month.name) %>% mutate(N=sample(1:100, size=nrow(.), replace=TRUE))

## ggplot2 plot of the data so we know what to expect
fig <- 
  ggplot(data=tbl) +
  geom_line(aes(x=Month, y=N, group=Year, colour=factor(Year)), size=1) +
  theme_minimal() +
  NULL
print(fig)  # see a ggplot2 version of same plot

# convert data to wide format to put in excel
tbl_wide_format <- tbl %>%
  pivot_wider(names_from=Month, values_from=N)

# convert wide format data to pandas dataframe, to pass to python script
tbl_pandas <- r_to_py(tbl_wide_format)

## import python script
source_python("write_xlsx_and_chart_to_file.py")

## save chart using python script
save_time_series_as_xlsx_with_chart(tbl_pandas, "reticulate_pandas_writexlsx_excel_line_chart.xlsx")
like image 23
jsavn Avatar answered Oct 20 '22 07:10

jsavn