
R and data.table on AWS

I am running into a very strange error while trying to install the data.table R package on an AWS EC2 instance (Amazon Linux AMI). A thread in the AWS forums here, posted by someone else, describes my issue well.

The main / relevant part of the error message is:

data.table.h:6:12: fatal error: omp.h: No such file or directory
#include <omp.h> 

I did some research and, while I'm not sure, I think this is related to whether the compiler on the server supports OpenMP. The data.table installation page on GitHub discusses this a bit here, but I'm not sure how to update or fix this on my EC2 instance.
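One way to test that hunch is to try compiling a tiny OpenMP program with the compiler in question. The snippet below is only a sketch: `has_openmp` is a hypothetical helper name, and it assumes the compiler accepts `-fopenmp` the way gcc does.

```shell
# Hypothetical helper: returns 0 if the given compiler can build a tiny
# OpenMP program, non-zero otherwise (including when the compiler is missing).
has_openmp() {
  cc="${1:-gcc}"
  tmpdir="$(mktemp -d)" || return 2
  # Minimal program that needs both omp.h and the OpenMP runtime to link.
  printf '#include <omp.h>\nint main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }\n' \
    > "$tmpdir/omp_test.c"
  "$cc" -fopenmp "$tmpdir/omp_test.c" -o "$tmpdir/omp_test" >/dev/null 2>&1
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}

if has_openmp gcc; then
  echo "gcc: OpenMP OK"
else
  echo "gcc: cannot build OpenMP code (omp.h or the OpenMP runtime may be missing)"
fi
```

Passing a different compiler name, for example `has_openmp gcc64`, checks that compiler instead.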

Any help with this is appreciated.

EDIT - This is a new problem: I was able to install data.table successfully on a previous, similar EC2 instance less than a month ago.

EDIT 2 - I got around this issue by taking a previous EC2 instance of mine that already had data.table installed and creating a custom AMI from it. New instances launched from this custom AMI come with the data.table library already installed. If I notice AWS resolve this issue on its own, I'll try to remember to come back and update this post!

Canovice asked Feb 02 '18 06:02


2 Answers

For some reason, setting CC=gcc64 in ~/.R/Makevars did not work for me; R was still using the default gcc to compile.

However, there is another option: you can directly edit the Makeconf file that R uses during compilation. On Amazon Linux the file is located at /usr/lib64/R/etc/Makeconf. Once you have located the file, the trick is the same: change CC = gcc to CC = gcc64. You may also want to make sure that gcc64 is installed by running sudo yum install gcc64.
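The Makeconf edit can also be scripted. This is a rough sketch rather than a tested recipe for every AMI: `set_r_cc` is a hypothetical helper name, and it assumes the file contains a line beginning with `CC = gcc`, as described above. It keeps a `.bak` copy and leaves the rest of the `CC` line untouched.

```shell
# Hypothetical helper: switch "CC = gcc" to "CC = gcc64" in an R Makeconf file.
set_r_cc() {
  makeconf="$1"
  [ -f "$makeconf" ] || { echo "not found: $makeconf" >&2; return 1; }
  # Keep a backup so the original compiler setting can be restored.
  cp "$makeconf" "$makeconf.bak" || return 1
  # Only touch the CC line, and skip it if it already says gcc64,
  # so running the helper twice does not produce "gcc6464".
  sed '/^CC = gcc64/!s/^CC = gcc/CC = gcc64/' "$makeconf.bak" > "$makeconf"
}

# On Amazon Linux, from a root shell (the system file is not user-writable):
#   yum install gcc64
#   set_r_cc /usr/lib64/R/etc/Makeconf
```

To undo the change, copy Makeconf.bak back over Makeconf.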

Mikolaj answered Sep 18 '22 04:09


The problem here is that data.table doesn't play nice with the default gcc compiler (gcc72-c++.x86_64 gets installed as a dependency of R-devel.x86_64). Point R to an older gcc version by adding

CC=gcc64

to ~/.R/Makevars. If you start from a "clean" Amazon Linux AMI, this file doesn't exist and you can just type

mkdir -p ~/.R
echo "CC=gcc64" >> ~/.R/Makevars
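
If ~/.R/Makevars might already exist, appending can leave several CC lines behind. A small sketch that sets the value exactly once is shown here; `set_makevars_cc` is a hypothetical helper name, not something R provides.

```shell
# Hypothetical helper: make sure a Makevars file sets CC exactly once.
set_makevars_cc() {
  makevars="$1"
  cc="$2"
  mkdir -p "$(dirname "$makevars")" || return 1
  touch "$makevars" || return 1
  # Drop any existing CC line, keep everything else, then append the new value.
  # grep exits non-zero when nothing is left to print, which is fine here.
  grep -v '^CC[[:space:]]*=' "$makevars" > "$makevars.tmp" || true
  echo "CC=$cc" >> "$makevars.tmp"
  mv "$makevars.tmp" "$makevars"
}

# Usage, followed by a check of which C compiler R reports:
#   set_makevars_cc "$HOME/.R/Makevars" gcc64
#   R CMD config CC
```

If `R CMD config CC` still reports the default gcc, the per-user file is probably not being picked up on that system.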

Andreas Dzemski answered Sep 17 '22 04:09