I previously asked a question here and came up with the following lines of code:
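# imports for scikit-learn 0.14.x, where GridSearchCV lives in sklearn.grid_search
from sklearn import neighbors
from sklearn.grid_search import GridSearchCV
parameters = [{'weights': ['uniform'], 'n_neighbors': [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}]
clf = GridSearchCV(neighbors.KNeighborsRegressor(), parameters, n_jobs=4)
clf.fit(features, rewards)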
But when I ran this, another problem appeared that is unrelated to my previous question. Python crashes with the following OS error report:
Process: Python [1327]
Path: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 2.7.2.5 (2.7.2.5.r64662-trunk)
Code Type: X86-64 (Native)
Parent Process: Python [1316]
Responsible: Sublime Text 2 [308]
User ID: 501
Date/Time: 2014-08-12 10:27:24.640 +0200
OS Version: Mac OS X 10.9.4 (13E28)
Report Version: 11
Anonymous UUID: D10CD8B7-221F-B121-98D4-4574A1F2189F
Sleep/Wake UUID: 0B9C4AE0-26E6-4DE8-B751-665791968115
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110
VM Regions Near 0x110:
-->
__TEXT 0000000100000000-0000000100001000 [ 4K] r-x/rwx SM=COW /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Application Specific Information:
*** multi-threaded process forked ***
crashed on child side of fork pre-exec
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libdispatch.dylib 0x00007fff91534c90 dispatch_group_async_f + 141
1 libBLAS.dylib 0x00007fff9413f791 APL_sgemm + 1061
2 libBLAS.dylib 0x00007fff9413cb3f cblas_sgemm + 1267
3 _dotblas.so 0x0000000102b0236e dotblas_matrixproduct + 5934
4 org.activestate.ActivePython27 0x00000001000c552d PyEval_EvalFrameEx + 23949
5 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
6 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
7 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
8 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
9 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
10 org.activestate.ActivePython27 0x00000001000c098a PyEval_EvalFrameEx + 4586
11 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
12 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
13 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
14 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
15 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
16 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
17 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
18 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
19 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
20 org.activestate.ActivePython27 0x00000001000c098a PyEval_EvalFrameEx + 4586
21 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
22 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
23 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
24 org.activestate.ActivePython27 0x000000010001d36d instancemethod_call + 365
25 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
26 org.activestate.ActivePython27 0x0000000100077dfa slot_tp_call + 74
27 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
28 org.activestate.ActivePython27 0x00000001000c098a PyEval_EvalFrameEx + 4586
29 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
30 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
31 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
32 org.activestate.ActivePython27 0x00000001000c098a PyEval_EvalFrameEx + 4586
33 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
34 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
35 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
36 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
37 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
38 org.activestate.ActivePython27 0x000000010001d36d instancemethod_call + 365
39 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
40 org.activestate.ActivePython27 0x0000000100077a28 slot_tp_init + 88
41 org.activestate.ActivePython27 0x0000000100074e25 type_call + 245
42 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
43 org.activestate.ActivePython27 0x00000001000c267d PyEval_EvalFrameEx + 11997
44 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
45 org.activestate.ActivePython27 0x00000001000c7137 PyEval_EvalFrameEx + 31127
46 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
47 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
48 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
49 org.activestate.ActivePython27 0x000000010001d36d instancemethod_call + 365
50 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
51 org.activestate.ActivePython27 0x0000000100077a28 slot_tp_init + 88
52 org.activestate.ActivePython27 0x0000000100074e25 type_call + 245
53 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
54 org.activestate.ActivePython27 0x00000001000c267d PyEval_EvalFrameEx + 11997
55 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
56 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
57 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
58 org.activestate.ActivePython27 0x000000010003d390 function_call + 176
59 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
60 org.activestate.ActivePython27 0x000000010001d36d instancemethod_call + 365
61 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
62 org.activestate.ActivePython27 0x0000000100077dfa slot_tp_call + 74
63 org.activestate.ActivePython27 0x000000010000be12 PyObject_Call + 98
64 org.activestate.ActivePython27 0x00000001000c267d PyEval_EvalFrameEx + 11997
65 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
66 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
67 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
68 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
69 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
70 org.activestate.ActivePython27 0x00000001000c5d10 PyEval_EvalFrameEx + 25968
71 org.activestate.ActivePython27 0x00000001000c7ad6 PyEval_EvalCodeEx + 2118
72 org.activestate.ActivePython27 0x00000001000c7bf6 PyEval_EvalCode + 54
73 org.activestate.ActivePython27 0x00000001000ed31e PyRun_FileExFlags + 174
74 org.activestate.ActivePython27 0x00000001000ed5d9 PyRun_SimpleFileExFlags + 489
75 org.activestate.ActivePython27 0x00000001001041dc Py_Main + 2940
76 org.activestate.ActivePython27.app 0x0000000100000ed4 0x100000000 + 3796
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000100 rbx: 0x00007fff7cd43640 rcx: 0x0000000000000000 rdx: 0x0000000105e00000
rdi: 0x0000000000000008 rsi: 0x0000000105e01000 rbp: 0x00007fff5fbfa370 rsp: 0x00007fff5fbfa350
r8: 0x0000000000000001 r9: 0x0000000105e00000 r10: 0x0000000105e01000 r11: 0x0000000000000000
r12: 0x000000010ba10530 r13: 0x000000010b000000 r14: 0x00000001066d1970 r15: 0x00007fff915311af
rip: 0x00007fff91534c90 rfl: 0x0000000000010206 cr2: 0x0000000000000110
Logical CPU: 2
Error Code: 0x00000006
Trap Number: 14
.........
VM Region Summary:
ReadOnly portion of Libraries: Total=183.7M resident=97.0M(53%) swapped_out_or_unallocated=86.7M(47%)
Writable regions: Total=1.3G written=142.8M(11%) resident=503.6M(39%) swapped_out=0K(0%) unallocated=791.7M(61%)
When I replace the GridSearchCV line in my code with:
clf = GridSearchCV(neighbors.KNeighborsRegressor(), parameters, n_jobs=1)
then everything works fine, except that the search no longer runs in parallel.
My operating system is OSX 10.9.4
My python version is 2.7.8 |Anaconda 2.0.1 (x86_64)| (default, Jul 2 2014, 15:36:00) [GCC 4.2.1 (Apple Inc. build 5577)]
My scikit-learn version is 0.14.1
My numpy version is 1.8.1
And my scipy version is 0.14.0
My question is: does anybody have an idea how to make GridSearchCV run with more than one job (n_jobs > 1)?
EDIT:
I have realized that this error actually happens only for some of my input data sets. Unfortunately, the problematic data sets (their X matrices) are too big to copy here. The input features are basically tf-idf vectors and the y values are non-negative floats, in particular:
[60.0, 7.0, 12.0, 21.0, 5.5, 3.0, 0.0, 2.5, 11.0, 3.0, 16.0, 2.0, 0.0, 4.5, 2.5, 6.0, 9.5, 2.5, 15.0, 7.0, 8.0, 13.0, 14.0, 8.0, 3.5, 6.0, 22.5, 7.0, 4.0, 3.5, 4.5, 6.0, 5.5, 7.0, 2.0, 0.0, 0.0, 0.0, 14.5, 8.0, 7.5, 2.5, 11.5, 1.0, 3.0, 14.5, 10.0, 14.5, 8.0, 8.0, 7.0, 2.5, 3.5, 3.0, 13.5, 7.0, 6.5, 2.5, 9.0, 8.0, 11.0, 17.5, 12.5, 4.5, 5.5, 8.0, 2.0, 7.0, 4.0, 1.5, 3.0, 21.5, 4.5, 4.0, 7.0, 9.0, 13.5, 8.0, 10.5, 4.5, 1.5, 11.5, 7.5, 11.5, 4.5, 5.0, 7.0, 9.5, 4.0, 4.0, 6.0, 3.5, 4.5, 7.5, 3.5, 3.5, 3.5, 6.0, 5.0, 5.5, 25.0, 6.5, 5.0, 2.0, 2.0, 10.5, 0.0, 6.5, 19.0, 9.0, 1.0, 1.5, 1.0, 0.0, 1.0, 4.5, 2.5, 17.5, 39.5, 7.5, 5.5, 8.0, 1.0, 6.0, 12.0, 10.0, 5.5, 19.0, 4.5, 1.5, 25.5, 4.0, 10.0, 18.5, 9.5, 10.5, 2.5, 6.0, 1.0, 10.0, 8.5, 12.5, 13.5, 5.0, 6.5, 11.0, 4.5, 8.0, 7.5, 11.5, 14.5, 9.0, 3.0, 1.5, 3.5, 5.5, 2.5, 12.5, 6.5, 5.5, 5.0, 0.0, 8.0, 3.0, 14.5, 5.0, 14.0, 7.0, 13.5, 12.5, 4.0, 1.5, 6.5, 10.5, 9.0, 16.5, 4.0, 4.0, 15.0, 11.5, 2.5, 8.5, 3.0, 5.0, 4.0, 8.5, 6.0, 5.0, 5.0, 5.0, 5.5, 8.0, 11.0, 4.0, 0.0, 5.5, 0.0, 4.5, 1.5, 0.0, 6.5, 11.0, 2.5, 8.0, 15.5, 5.5, 4.5, 5.0, 4.0, 5.5, 10.5, 7.5, 6.5, 8.5, 2.5, 1.5, 1.5, 18.0, 15.0, 14.0, 9.5, 5.5, 7.5, 14.5, 2.5, 5.0, 60.0, 6.5, 14.5, 6.5, 4.0, 1.5, 2.0, 4.0, 27.0, 3.0, 5.0, 4.0, 2.5, 1.0, 1.5, 1.5, 9.0, 4.0, 8.5, 4.0, 4.0, 0.0, 1.5, 7.5, 1.5, 7.5, 1.0, 28.5, 15.5, 7.5, 1.0, 2.5, 2.5, 2.5, 16.0, 5.5, 8.5, 4.0, 2.5, 5.0, 2.5, 6.0, 11.0, 10.0, 4.5, 6.5, 8.0, 6.0, 4.5, 15.5, 4.0, 5.0]
The version with 1 job works for all of my input data sets, even for this one.
libdispatch.dylib from Grand Central Dispatch (GCD) is used internally by Accelerate, OS X's built-in BLAS implementation, whenever you call numpy.dot. The GCD runtime does not work when a program calls the POSIX fork syscall without calling an exec syscall afterwards, which is what the "crashed on child side of fork pre-exec" line in the report points to. This makes any Python program that uses the multiprocessing module together with an Accelerate-backed numpy prone to crashing. sklearn's GridSearchCV uses the Python multiprocessing module for parallelization.
Under Python 3.4 and later you can force Python multiprocessing to use the forkserver start method instead of the default fork mode to work around this problem, for instance at the beginning of the main file of your program:
if __name__ == "__main__":
    import multiprocessing as mp
    mp.set_start_method('forkserver')
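A slightly fuller sketch of the same idea, assuming Python 3.4 or later. The helper name use_forkserver_if_available and the math.sqrt workload are illustrative only; they stand in for whatever your program runs in parallel (for example a GridSearchCV fit with n_jobs > 1). The helper falls back to the platform default where forkserver is not offered, such as on Windows.

```python
import math
import multiprocessing as mp


def use_forkserver_if_available():
    """Select the 'forkserver' start method globally when the platform offers it.

    Returns the name of the start method that is in effect afterwards.
    (Hypothetical helper, not part of multiprocessing or scikit-learn.)
    """
    if "forkserver" in mp.get_all_start_methods():
        # force=True replaces a start method that was already chosen
        # instead of raising RuntimeError.
        mp.set_start_method("forkserver", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Choose the start method before any worker process is created.
    print("start method:", use_forkserver_if_available())

    # Placeholder workload: workers are now launched from a clean server
    # process rather than forked directly from this one.
    with mp.Pool(processes=2) as pool:
        print(pool.map(math.sqrt, [1.0, 4.0, 9.0]))
```

The start method has to be chosen inside the `if __name__ == "__main__":` guard, because with forkserver the main module can be imported again by the worker processes.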
Alternatively, you can rebuild numpy from source and make it link against ATLAS or OpenBLAS instead of OSX Accelerate. The numpy developers are working on binary distributions that include either ATLAS or OpenBLAS by default.
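Before rebuilding, it may be worth checking which BLAS your numpy build is actually linked against. A minimal sketch using numpy.show_config(), which prints the build configuration; the helper name blas_report is hypothetical, and the exact output format differs between numpy versions. On an Accelerate-linked build the output typically mentions Accelerate or vecLib.

```python
import contextlib
import io


def blas_report():
    """Return numpy's build configuration as text, or None if numpy is missing.

    (Hypothetical helper; it only captures what numpy.show_config() prints.)
    """
    try:
        import numpy as np
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints to stdout, so capture that output in a string.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    report = blas_report()
    print(report if report is not None else "numpy is not installed")
```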
That worked perfectly for me as well (upgrading was a bit of a drag but this was the only fix, of many attempted, that worked in my case). For any other ipython notebook users out there, the best way to work this in is to add it to the notebook configuration (you'll get an error trying to run it straight in a notebook). The commands can be added like this:
# in ipython_notebook_config.py
c.IPKernelApp.exec_lines = ['import multiprocessing', 'multiprocessing.set_start_method("forkserver")']