
What is the most useful output format for graphs? [closed]

Tags:

r

Before any of you rush to vote to close, let me say that I understand this question may be subjective, and that the expected answer may begin with "it depends". Nevertheless, it is a real problem I run into: I am creating more and more graphs, and I don't necessarily know exactly how I am going to use them, or don't have the time to test for the final use case immediately.

So I am drawing on the experience of SO R users to get good reasons to choose one over the other among jpeg(), bmp(), png(), tiff() and pdf(), and possibly which options to use. I don't have enough experience in R or knowledge of the different formats to choose wisely.

Potential use cases:

  • quick look after or during run time of algorithms
  • presentations (.ppt mainly)
  • reports (Word or LaTeX)
  • publication (internet)
  • storage (without too much loss and to transform it later for a specific use)
  • anything relevant I forgot

Thanks! I'm happy to make the question clearer.

Antoine Lizée asked Sep 06 '13 01:09



2 Answers

To expand a little on my comment, there is no real easy answer, but my suggestions:

  1. My first totally flexible choice would be to simply store the final raw data used in the plot(s) and a bit of R code for generating the plot(s). That way you could easily enough send the output to whatever device that suits your particular purpose. It would not be that arduous a task to set yourself up a couple of basic templates based on png()/pdf() that you could call upon.

  2. Use the svg() device. As noted by @gung, the pdf(), svg(), cairo_ps() and cairo_pdf() devices are your only real options for retaining scalable vector images. I would tend to lean towards svg() rather than pdf() because of the greater editing options available in programs like Inkscape. It is also becoming quite widely supported for internet publication (see http://caniuse.com/svg).

  3. If, on the other hand, you're a LaTeX user, most headaches seem to be solved by going straight to pdf(). You can usually import and convert PDF files using Inkscape or command-line utilities like ImageMagick if you have to shift formats.

  4. For Word/PowerPoint interaction, if you are running R on Windows, you can also export directly using win.metafile(), which gives you scalable EMF images whose components remain editable and which you can import into Word or PowerPoint directly. I have heard of people running R through Wine or using intermediary steps on Linux to get EMF files out for later use. For Mac, there are roundabout pathways as well.
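
As a rough illustration of the first three suggestions, here is a minimal sketch of such a template. The names `plot_data`, `draw_plot` and `save_plot` are hypothetical and made up for this example; only the device functions (`pdf()`, `svg()`, `png()`) come from base R's grDevices. `svg()` assumes an R build with cairo support.

```r
# Hypothetical example data and plotting function: these are what you store,
# rather than the image itself.
plot_data <- data.frame(x = 1:20, y = (1:20)^2)
saveRDS(plot_data, file.path(tempdir(), "plot_data.rds"))

draw_plot <- function(d) {
  plot(d$x, d$y, type = "b", xlab = "x", ylab = "y", main = "Example plot")
}

# Hypothetical helper: open the requested device, draw, and always close it.
# Width and height are in inches; `res` (pixels per inch) only applies to png.
save_plot <- function(draw, data, file, format = c("pdf", "svg", "png"),
                      width = 7, height = 5, res = 300) {
  format <- match.arg(format)
  switch(format,
    pdf = pdf(file, width = width, height = height),
    svg = svg(file, width = width, height = height),
    png = png(file, width = width, height = height, units = "in", res = res)
  )
  on.exit(dev.off())  # registered only once the device has opened
  draw(data)
  invisible(file)
}

# Regenerate the figure for whatever medium is needed at the time.
out_pdf <- save_plot(draw_plot, plot_data,
                     file.path(tempdir(), "example.pdf"), format = "pdf")
```

Calling `save_plot()` again with `format = "svg"` or `format = "png"` would produce the other formats from the same stored data and code, which is the point of keeping those rather than the image.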

So, to summarise, in order of preference:

  1. Don't store images at all; store the data and the code to generate them.
  2. Use svg/pdf and convert formats as required.
  3. Use a backup win.metafile export directly for those cases where you can't escape using Word/Powerpoint and are primarily going to be based on Windows systems.
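
The fallback order in this summary could be sketched roughly as follows. `open_office_device` is a hypothetical helper name, and the choice of SVG, then PDF, as the non-Windows fallbacks is an assumption for illustration; `win.metafile()` only exists in R on Windows, so it is guarded by a platform check.

```r
# Hypothetical helper: open an EMF device on Windows, otherwise fall back to
# SVG (if this R build has cairo) or PDF. Returns the file name being written.
open_office_device <- function(basename, width = 7, height = 5) {
  if (.Platform$OS.type == "windows") {
    file <- paste0(basename, ".emf")
    win.metafile(file, width = width, height = height)
  } else if (capabilities("cairo")) {
    file <- paste0(basename, ".svg")
    svg(file, width = width, height = height)
  } else {
    file <- paste0(basename, ".pdf")
    pdf(file, width = width, height = height)
  }
  file
}

out_file <- open_office_device(file.path(tempdir(), "figure"))
plot(1:10, main = "Example")
dev.off()  # the file is only complete once the device is closed
```
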
thelatemail answered Sep 20 '22 19:09


So far the answers to this question have all recommended outputting plots in vector-based formats. This will give you the best output, allowing you to resize your image as needed for whatever medium it ends up in (whether that be a webpage, document, or presentation), but it comes at a computational cost: every element of the plot has to be redrawn each time the image is rendered.

For my own work, I often find it is much more convenient to save my plots in a raster format of sufficient resolution. You probably want to do this whenever your data takes a non-trivial amount of time to plot.

Some examples of where I find a raster format is more convenient:

  1. Manhattan plots: A plot showing p-value significance for hundreds of thousands to millions of DNA markers across a genome.
  2. Large Heatmaps: Clustering the top 5000 differentially expressed genes between two groups of people, one with a disease, and one healthy.
  3. Network Rendering: When drawing a large number of nodes connected to each other by edges, redrawing the edges (as vectors) can slow down your computer.

Ultimately it comes down to a trade-off in your own sanity. What annoys you more: your computer grinding to a halt trying to redraw an image, or figuring out the exact dimensions to render an image in raster format so it doesn't look awful in your final publishing medium?
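
For the raster route, one way to take some of the guesswork out of the dimensions is to fix the physical size and resolution up front and let `png()` work out the pixels. The sketch below assumes a hypothetical target of 6 x 4 inches at 300 dpi and uses simulated p-values as a stand-in for a real marker set.

```r
# Hypothetical target: a figure 6 in wide and 4 in tall, rendered at 300 dpi.
width_in  <- 6
height_in <- 4
dpi       <- 300

# Pixel dimensions of the resulting raster: inches times pixels per inch.
px <- c(width = width_in * dpi, height = height_in * dpi)

# Simulated p-values standing in for a large marker set (not real data).
set.seed(1)
n <- 2e5
pvals <- runif(n)

raster_file <- file.path(tempdir(), "manhattan_like.png")
png(raster_file, width = width_in, height = height_in, units = "in", res = dpi)
plot(seq_len(n), -log10(pvals), pch = ".",
     xlab = "Marker index", ylab = "-log10(p)")
dev.off()
```

The resulting file holds a fixed grid of pixels however many points were plotted, whereas a vector file would store each of the points individually.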

Scott Ritchie answered Sep 24 '22 19:09