
How to fix too many open files error when aggregating billions of records

I got the following error:

opening file "/workspace/mongo/data/_tmp/extsort.63355": errno:24 Too many open files

How can I fix this error? Is it because the number of open files is already 63355?

Here is the relevant part of the mongod log:

2015-05-02T08:01:40.490+0000 I COMMAND  [conn1] command sandbox.$cmd command: listCollections { listCollections: 1.0 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:411 locks:{} 169ms
2015-05-02T15:01:02.060+0000 I -        [conn2] Assertion: 16818:error opening file "/workspace/mongo/data/_tmp/extsort.63355": errno:24 Too many open files
2015-05-02T15:01:02.235+0000 I CONTROL  [conn2] 
 0xf4d299 0xeeda71 0xed2d3f 0xed2dec 0xb3f453 0xb3c88c 0xb3d2dd 0xb3dfe2 0xb499c5 0xb49136 0xb7e3e6 0x987165 0x9d8b04 0x9d9aed 0x9da7fb 0xb9e956 0xab4d20 0x80e75d 0xf00e6b 0x7fe38e8b4182 0x7fe38d37c47d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B4D299"},{"b":"400000","o":"AEDA71"},{"b":"400000","o":"AD2D3F"},{"b":"400000","o":"AD2DEC"},{"b":"400000","o":"73F453"},{"b":"400000","o":"73C88C"},{"b":"400000","o":"73D2DD"},{"b":"400000","o":"73DFE2"},{"b":"400000","o":"7499C5"},{"b":"400000","o":"749136"},{"b":"400000","o":"77E3E6"},{"b":"400000","o":"587165"},{"b":"400000","o":"5D8B04"},{"b":"400000","o":"5D9AED"},{"b":"400000","o":"5DA7FB"},{"b":"400000","o":"79E956"},{"b":"400000","o":"6B4D20"},{"b":"400000","o":"40E75D"},{"b":"400000","o":"B00E6B"},{"b":"7FE38E8AC000","o":"8182"},{"b":"7FE38D282000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.1", "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952", "uname" : { "sysname" : "Linux", "release" : "3.13.0-44-generic", "version" : "#73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "C35E766AD226FC0C16CB0C3885EC3B59E288A3F2" }, { "b" : "7FFF448FE000", "elfType" : 3, "buildId" : "9D77366C6409A9EA266179080FA7C779EEA8A958" }, { "b" : "7FE38E8AC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FE38E64E000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7FE38E273000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B927879B878D90DD9FF4B15B00E7799AA8E0272F" }, { "b" : "7FE38E06B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FE38DE67000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FE38DB63000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7FE38D85D000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FE38D647000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FE38D282000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FE38EACA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf4d299]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeeda71]
 mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xed2d3f]
 mongod(+0xAD2DEC) [0xed2dec]
 mongod(_ZN5mongo16SortedFileWriterINS_5ValueES1_EC1ERKNS_11SortOptionsERKSt4pairINS1_25SorterDeserializeSettingsES7_E+0x5D3) [0xb3f453]
 mongod(_ZN5mongo19DocumentSourceGroup5spillEv+0x1BC) [0xb3c88c]
 mongod(_ZN5mongo19DocumentSourceGroup8populateEv+0x46D) [0xb3d2dd]
 mongod(_ZN5mongo19DocumentSourceGroup7getNextEv+0x292) [0xb3dfe2]
 mongod(_ZN5mongo21DocumentSourceProject7getNextEv+0x45) [0xb499c5]
 mongod(_ZN5mongo17DocumentSourceOut7getNextEv+0xD6) [0xb49136]
 mongod(_ZN5mongo8Pipeline3runERNS_14BSONObjBuilderE+0xA6) [0xb7e3e6]
 mongod(_ZN5mongo15PipelineCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7A5) [0x987165]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9d8b04]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC7D) [0x9d9aed]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9da7fb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x746) [0xb9e956]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xab4d20]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x80e75d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf00e6b]
 libpthread.so.0(+0x8182) [0x7fe38e8b4182]
 libc.so.6(clone+0x6D) [0x7fe38d37c47d]
-----  END BACKTRACE  -----
2015-05-02T15:02:07.753+0000 I COMMAND  [conn2] CMD: drop sandbox.tmp.agg_out.1
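
For reference, the number of descriptors a process actually holds open can be counted from /proc (a quick sketch; it assumes the mongod PID can be found with pgrep):

$ sudo ls /proc/$(pgrep -x mongod | head -n 1)/fd | wc -l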

UPDATE

I typed ulimit -n unlimited in the console,

and modified /etc/security/limits.conf with the following settings:

* soft nofile unlimited
* hard nofile unlimited
* soft nproc unlimited
* hard nproc unlimited

Then I checked it with ulimit -a:

health# ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       unlimited
-n: file descriptors                4096
-l: locked-in-memory size (kbytes)  64
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 31538
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited
health# ulimit -Sn
4096
health# ulimit -Hn
4096

Is my system's open files setting already unlimited?
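
For comparison, the limits that actually apply to the running mongod process (rather than to my shell) can be read from /proc; again just a sketch, assuming pgrep finds the daemon:

$ sudo grep 'open files' /proc/$(pgrep -x mongod | head -n 1)/limits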

asked May 04 '15 by user3675188

2 Answers

There is no clean answer for this, since you are doing something very heavy, but a workaround is available.

ulimit is a Unix/Linux command that lets you set per-process resource limits.

In your case you need to increase the maximum number of open files, or set it very high to be on the safe side (raising this limit is also recommended by MongoDB):

ulimit -n <large value in your case 1000000>

or raise the kernel-wide limit with

sysctl -w fs.file-max=1000000

To make the changes persistent across reboots, put the per-process nofile limit in /etc/security/limits.conf and the kernel-wide limit in /etc/sysctl.conf:

fs.file-max = 1000000
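
For example, a minimal persistent setup might look like this (only a sketch; the mongodb user name and the 64000 value are illustrative assumptions, not taken from the question):

# /etc/security/limits.conf -- per-process descriptor limit for the user running mongod
mongodb soft nofile 64000
mongodb hard nofile 64000

# /etc/sysctl.conf -- kernel-wide ceiling on open file handles
fs.file-max = 1000000

Apply the sysctl change without rebooting using sudo sysctl -p, and start a new session (or restart the service) for the limits.conf change to take effect.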
answered Sep 28 '22 by Nachiket Kate


I found that it was necessary to change the system-wide settings (using ulimit as suggested by Nachiket Kate; another great description for Ubuntu may be found here) as well as the mongodb settings (as documented here).

For the sake of explanation, I'll summarize the commands I performed to get a handle on things (I'll reference the links again where they belong in the discussion).

First, determine whether the maximum number of file descriptors enforced by the kernel is sufficient (in my case it was):

$ cat /proc/sys/fs/file-max
6569231

In my case this was not the problem. Checking the ulimit settings for the mongodb user revealed that the number of file descriptors was a paltry 1024:

$ sudo -H -u mongodb bash -c 'ulimit -a'
...
open files                      (-n) 1024
...

These values can be changed for all users by increasing the soft limit (which users may raise themselves, up to the hard limit) and the hard limit (I set mine quite high):

$ sudo su
$ echo -e "* hard\tnofile\t1000000\n* soft\tnofile\t990000" >> /etc/security/limits.conf

This may also be done on a per-user basis by replacing the * with the username. Although this worked per user, restarting the mongo daemon resulted in the number of file descriptors returning to 1024. It was necessary to follow the advice here regarding the PAM session:

$ for file in /etc/pam.d/common-session*; do 
      echo 'session required pam_limits.so' >> $file
  done

To test that the settings had been applied, I created a wee Python script (placed in /tmp/file_descriptor_test.py and made executable):

#!/usr/bin/env python
n = 990000

# keep every handle in a list so the descriptors stay open
fd_list = list()
for i in range(1, n):
    fd_list.append(open('/tmp/__%08d' % (i), 'w'))

print('opened %d fds' % len(fd_list))

Running this as the mongodb user showed that the system side was fine; the script only failed once it reached the new 990,000 limit (stdin, stdout and stderr account for the last few descriptors):

$ sudo -H -u mongodb bash -c '/tmp/file_descriptor_test.py'
Traceback (most recent call last):
File "/tmp/fd.py", line 8, in <module>
IOError: [Errno 24] Too many open files: '/tmp/__00989998'

The files in /tmp/ may be deleted using

$ sudo find /tmp -maxdepth 1 -type f -name '__*' -delete

since there are far too many of them for a shell glob (rm /tmp/__* fails with an overlong argument list).

However, when running the offending mongo process, I still encountered the same Too many open files error. This led me to believe that the problem also lay with mongo (and led me, finally and embarrassingly, to the excellent documentation). Editing /etc/systemd/system/multi-user.target.wants/mongodb-01.service and adding the following lines beneath the [Service] directive

# (file size)
LimitFSIZE=infinity
# (cpu time)
LimitCPU=infinity
# (virtual memory size)
LimitAS=infinity
# (open files)
LimitNOFILE=990000
# (processes/threads)
LimitNPROC=495000

finally resolved the issue (remember to reload systemd and restart the service with sudo systemctl daemon-reload && sudo systemctl restart mongodb-01.service). You can monitor the progress of the mongo process (mine was a temporary-space-hungry aggregate) via

$ while true; do echo $(find /var/lib/mongodb_01/_tmp/ | wc -l); sleep 1; done
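
As an aside, rather than editing the unit under multi-user.target.wants directly, the same limits can be placed in a drop-in override, which survives package upgrades; a sketch, reusing the service name from above:

$ sudo systemctl edit mongodb-01.service
# in the editor, add:
#   [Service]
#   LimitNOFILE=990000
#   LimitNPROC=495000
$ sudo systemctl daemon-reload
$ sudo systemctl restart mongodb-01.service
# confirm the limit systemd will apply to the service
$ systemctl show mongodb-01.service -p LimitNOFILE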
answered Sep 28 '22 by 0_0