
Logouts while running Hadoop under Ubuntu 16.04

I am having trouble running Hadoop jobs in both pseudo-cluster and cluster mode under Ubuntu 16.04.

While running a vanilla Hadoop/HDFS installation, my hadoop user gets logged out and all processes run by this user are killed. I don't see anything in the logs (/var/log/systemd, journalctl or dmesg) that explains why the user gets logged out.

It seems I am not the only one who has this or a similar issue:

https://stackoverflow.com/questions/38288162/in-ubuntu-16-04-running-hadoop-jar-laptop-gets-rebooted

Note: creating a dedicated hadoop user did not actually solve the problem in my case, but it did limit the logouts to that user.

https://askubuntu.com/questions/784591/ubuntu-16-04-kills-session-when-resource-usage-is-extremely-high

Is it possible that a problem around the UserGroupInformation class (which can cause a logout under some circumstances), perhaps combined with systemd changes in Ubuntu 16.04, causes this behavior?

The last lines of the Hadoop log that I get before the logout:

...
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 4ms
16/07/13 16:45:37 DEBUG security.UserGroupInformation: PrivilegedAction
as:hduser (auth:SIMPLE)
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to
laptop/127.0.1.1:37339 from hduser sending #375
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to
laptop/127.0.1.1:37339 from hduser got value #375
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 2ms
Terminated
hduser@laptop:~$ 16/07/13 16:45:37 DEBUG ipc.Client: stopping client from
cache: org.apache.hadoop.ipc.Client@4e7ab839
exit

journalctl:

Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 6.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 5.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 8.

syslog:

Jul 12 16:06:43 laptop systemd[4172]: Stopped target Default.
Jul 12 16:06:43 laptop systemd[4172]: Reached target Shutdown.
Jul 12 16:06:44 laptop systemd[4172]: Starting Exit the Session...
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Basic System.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Sockets.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Paths.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Timers.
Jul 12 16:06:44 laptop systemd[4172]: Received SIGRTMIN+24 from PID
10101 (kill).
Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.
Jul 12 16:06:44 laptop systemd[1]: Removed slice User Slice of hduser.

asked Jul 17 '16 07:07 by Michael


3 Answers

I also had the problem. It took me time, but I found the solution here: https://unix.stackexchange.com/questions/293069/all-services-of-a-user-are-killed-when-running-multiple-services-under-this-user

Basically, some Hadoop processes just stop, because why not. But systemd seems to kill all of a user's processes when it sees one of that user's service processes die.

The fix is to add

[Login]
KillUserProcesses=no

to /etc/systemd/logind.conf and reboot.
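
To see what is currently set before and after editing the file, a small helper along these lines may be useful. It is only a sketch: the function name kill_user_processes_value is made up for illustration, and it reads just the one file it is given, so any drop-in files (for example under /etc/systemd/logind.conf.d/) are ignored and the result may differ from what logind actually applies.

```shell
#!/bin/sh
# Print the last uncommented KillUserProcesses value found in a logind
# configuration file. Prints nothing if the key is not set in that file.
# The function name and the default path argument are illustrative only.
kill_user_processes_value() {
    conf="${1:-/etc/systemd/logind.conf}"
    # Match "KillUserProcesses = value", allowing spaces around "=".
    # Lines starting with "#" do not match, so commented defaults are skipped.
    sed -n 's/^[[:space:]]*KillUserProcesses[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | tail -n 1
}
```

Called with no argument it reads /etc/systemd/logind.conf. Empty output means the key is not set explicitly in that file, so the built-in default is in effect.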

I tested several Ubuntu versions while debugging the problem, and the fix seems to work only on Ubuntu 16.04.

answered Nov 17 '22 09:11 by Truelle


I had the same issue. I was using Apache Apex, which is Hadoop-native. Whenever I killed an Apex application, my system logged me out.

Solution: replace the kill binary (/bin/kill) of Ubuntu 16 with the kill binary from Ubuntu 14.

After that, everything works as smoothly as it did before the upgrade.

answered Nov 17 '22 09:11 by Scorpio


I had the same problem too. I eventually found that /bin/kill in Ubuntu 16.04 has a bug when killing a process group, and that replacing it solves this problem. For reference, kill is documented as follows:

If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid

Because of a bug in procps-ng-3.3.10, killing a process group whose ID starts with 1, as done by bin/yarn application -kill AppID, causes the user to be logged out.
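
To check whether the installed kill comes from the version named above, something like the following sketch could help. The function names are hypothetical, and it assumes the version line looks like "kill from procps-ng 3.3.10", which is an assumption about the output of /bin/kill --version and not something confirmed here. It flags only 3.3.10, because that is the only version this answer reports as affected.

```shell
#!/bin/sh
# Extract the version number from a line such as
# "kill from procps-ng 3.3.10" (assumed output format of /bin/kill --version).
# Prints nothing if the line does not mention procps-ng.
procps_version_from() {
    printf '%s\n' "$1" | sed -n 's/.*procps-ng[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# Succeed only for the single version reported as buggy in this answer.
# Other versions are not evaluated here; nothing is claimed about them.
is_reported_buggy_version() {
    [ "$1" = "3.3.10" ]
}
```

Typical use would be to pass the real output in, for example procps_version_from "$(/bin/kill --version)", and then test the result with is_reported_buggy_version.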

The problem is solved after replacing /bin/kill with the new kill compiled from procps-ng-3.3.12.

# Unpack and build procps-ng 3.3.12
tar xJf procps-ng-3.3.12.tar.xz
cd procps-ng-3.3.12
./configure
make
# Install the rebuilt kill binary and the library it links against
sudo cp .libs/kill /bin/kill
sudo chown root:root /bin/kill
sudo cp proc/.libs/libprocps.so.6.0.0 /lib/x86_64-linux-gnu/
sudo chown root:root /lib/x86_64-linux-gnu/libprocps.so.6.0.0

answered Nov 17 '22 07:11 by runitao