Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chef client hanging on npm install at node-gyp rebuild

I'm having a problem with running npm install from a chef recipe. When I run it from the command line, it finishes fine in under a minute with just a few warnings related to package.json no repository field (which should be harmless). But when I run it from chef, it hangs with the last line output back to the command line as this:

* execute[npm-install-app] action run

Which is this resource block in the recipe:

execute "npm-install-app" do
  cwd "#{home}/#{prefix}#{app}"
  command "npm --registry #{priv['url']}:#{priv['port']}#{priv['path']} install --cache #{home}/.npm --tmp #{home}/tmp > npm-run.log 2>&1"
  user node['nodejs']['user']
  action :run
end

Where #{home} expands out to /home/nodejs and the user is nodejs.

As you can see, I redirected the output to a file to a file with > npm-run.log 2>&1. The output file gets the output of the npm install command written to it (unlike the command line), and the last thing that comes through is this:

-- a bunch of 200's and 304s, like this --
npm http 304 http://my.private.npm.amazonaws.com/registry/_design/app/_rewrite/esprima

[email protected] install /home/nodejs/my-app/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos
(node-gyp rebuild 2> builderror.log) || (exit 0)

kerberos is a dependency of one module we're relying on, but we aren't using kerberos ourselves. I gather from other sources that npm is running node-gyp to compile a version of the app that isn't available packaged on the npm server.

It will sit in that state for 2 hours until chef shellout registers a timeout and it shows a fatal error. ps -e will show npm is still running when chef-client is still running, and interrupting chef-client will cause npm to disappear from the process list, which suggests that npm still thinks it is still doing meaningful work, at least. (On a side note, while I was having connection problems, I was inclined to ask this question. There is a high degree of probability that this npm install is the underlying problem of the other question, but I think they warrant separate consideration.)

Edit: Running the chef-client with a -l debug adds a tiny amount of information to the /var/log/chef/client.log file, which basically confirms that the npm install command is the last resource to be executed before hanging:

[2014-01-09T22:49:28+00:00] INFO: Processing execute[npm-install-app] action run (my-app::default line 111)
[2014-01-09T22:49:28+00:00] DEBUG: Platform ubuntu version 12.04 found

Am I right in thinking that the || (exit 0) is throwing off the chef ShellOut provider detecting a successful exit? Is there anything I can do about it?

Edit 2: Chef just timed out from a run with -l debug set, and still only got log information on the timeout.

[2014-01-10T00:26:56+00:00] ERROR: execute[npm-install-app] (my-app::default line 111) had an error: Mixlib::ShellOut::CommandTimeout: command timed out:
---- Begin output of npm --registry http:my.private.npm.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp > npm-run.log 2>&1 ----
STDOUT:
STDERR:
---- End output of npm --registry http://ec2-54-221-190-191.compute-1.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp > npm-run.log 2>&1 ----

But! Another node just successfully finished after ~5 minutes and had this content in the npm-run.log file:

> [email protected] install /home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos
> (node-gyp rebuild 2> builderror.log) || (exit 0)

make: Entering directory `/home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos/build'
  SOLINK_MODULE(target) Release/obj.target/kerberos.node
  SOLINK_MODULE(target) Release/obj.target/kerberos.node: Finished
  COPY Release/kerberos.node
make: Leaving directory `/home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos/build'

I can't think of why there would be such a huge performance difference, both servers are running on amazon small ec2 instances. Maybe there is a permissions difference between the home directory on the working and broken servers... I will investigate that angle.

like image 367
Patrick M Avatar asked Jan 09 '14 22:01

Patrick M


2 Answers

Well I finally took off my idiot hat and looked for the log in the right place. The command even says 2> builderror.log, so you'd think that would be enough of a tip to just find for a file of that name, but it still didn't occur to me. This is very frustrating, because the node-gyp command is apparently built into the kerberos source code and it silently hides errors from any calling process (like Chef, or any other build tool that might want to npm-install automatically).

Here is what it says (over and over again for about ~350 MB, thus the fun little hang! Good thing my Chef recipe was deleting the directory used on every run, or this could have gotten way harder to diagnose):

gyp WARN EACCES attempting to reinstall using temporary dev dir "/root/tmp/.node-gyp"
gyp WARN EACCES user "root" does not have permission to access the dev dir "/root/tmp/.node-gyp/0.10.22"

The curious thing is that node-gyp is working with files around this location: /home/nodejs/my-app/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos/, and my npm install command is running as the nodejs user, but it's still trying to write to /root as the root user! Something must be amiss with this, because root darn well has permissions for that directory.

ubuntu@amazonaws:~$ sudo ls -la /
-- snip --
drwx------  4 root root  4096 Jan  7 22:50 root

ubuntu@amazonaws:~$ sudo ls -la /root
total 24
drwx------  4 root root 4096 Jan  7 22:50 .
drwxr-xr-x 23 root root 4096 Jan  7 22:46 ..
-rw-r--r--  1 root root 3106 Apr 19  2012 .bashrc
drwx------  2 root root 4096 Jan  7 22:50 .cache
-rw-r--r--  1 root root  140 Apr 19  2012 .profile
drwx------  2 root root 4096 Jan  7 22:46 .ssh

At first, I thought I would just have to fix up permissions on the /home/nodejs directory, but this will take a follow up to the node-gyp developers, I think.

At least this explains why if I run the npm-install command bare as a different user (who has sudo permissions) it works.

Update: I eventually worked around this by letting the npm install run as root and then chown'ing and chmod'ing the installed files. The Chef resource blocks I used for this look something like this:

  # Recursively chown and chmod all files just created
  execute "fixup #{home}/#{prefix}#{app} owner" do
    command "find ./ -exec sudo chown #{node[:nodejs][:user]}:#{node[:nodejs][:user]} {} +"
    cwd "#{home}/#{prefix}#{app}"
  end

  execute "fixup #{home}/#{prefix}#{app} file permissions" do
    command "find ./ -type f -exec sudo chmod 644 {} +"
    cwd "#{home}/#{prefix}#{app}"
  end

  execute "fixup #{home}/#{prefix}#{app} directory permissions" do
    command "find ./ -type d -exec sudo chmod 755 {} +"
    cwd "#{home}/#{prefix}#{app}"
  end

This does not fix node-gyp's shortcomings in the permissions department, which I will continue to pursue and post another answer if I get a direct response on that front.

like image 103
Patrick M Avatar answered Sep 30 '22 08:09

Patrick M


This problem hangs about 10 minutes (felt like) on my OSX but it managed to finish. I used 'sudo npm install' to install mongoose from the terminal launched inside WebStorm IDE. (Haven't tried without sudo.)

-
> [email protected] install .../Documents/.../node_modules/mongoose/node_modules/mongodb/node_modules/kerberos
> (node-gyp rebuild 2> builderror.log) || (exit 0)

\
> [email protected] install .../Documents/.../node_modules/mongoose/node_modules/mongodb/node_modules/bson
> (node-gyp rebuild 2> builderror.log) || (exit 0)

<<<< HERE IS THE STRANGE HANGING >>>>


  CXX(target) Release/obj.target/bson/ext/bson.o
  SOLINK_MODULE(target) Release/bson.node
  SOLINK_MODULE(target) Release/bson.node: Finished
[email protected] node_modules/mongoose
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
└── [email protected] ([email protected], [email protected], [email protected])

$ ls -al

like image 26
Daniel C. Deng Avatar answered Sep 30 '22 06:09

Daniel C. Deng