This morning I manually started my cluster in the production environment to run some code, but nothing executes and I get the error "Failure starting repl. Try detaching and re-attaching the notebook."
What can I do to solve this?
I have tried restarting the cluster, and I have also tried cloning it. Something is clearly wrong with the existing cluster, but I don't know what.
To clarify, no jobs are running on this cluster. Nothing is running, and nothing can be executed on it.
Error message:
Failure starting repl. Try detaching and re-attaching the notebook.
java.lang.Exception: Futures timed out after [80 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
    at scala.concurrent.Await$.$anonfun$result$1(package.scala:223)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
    at scala.concurrent.Await$.result(package.scala:146)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal$RequestStatus.waitForReply(JupyterDriverLocal.scala:210)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.requestKernelInfo(JupyterDriverLocal.scala:695)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.$anonfun$startPython$1(JupyterDriverLocal.scala:1164)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry(JupyterDriverLocal.scala:1127)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal$$anonfun$com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry$1.applyOrElse(JupyterDriverLocal.scala:1130)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal$$anonfun$com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry$1.applyOrElse(JupyterDriverLocal.scala:1127)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at scala.util.Failure.recover(Try.scala:234)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry(JupyterDriverLocal.scala:1127)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.startPython(JupyterDriverLocal.scala:1144)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.<init>(JupyterDriverLocal.scala:640)
    at com.databricks.backend.daemon.driver.PythonDriverWrapper.instantiateDriver(DriverWrapper.scala:792)
    at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:355)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:241)
    at java.lang.Thread.run(Thread.java:750)
In case anyone needs to solve this in the future, here is what worked for me.
It turned out that one of my clusters had suddenly developed library compatibility issues, mainly between pandas, numpy and pyarrow.
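Before pinning anything, it can help to see which versions of these three packages the cluster actually has. The following is a minimal sketch, assuming you can run a shell on the driver. The helper name show_versions is hypothetical. It filters "name==version" lines, such as the ones printed by `pip list --format=freeze`, down to pandas, numpy and pyarrow.

```shell
#!/bin/bash
# Hypothetical helper: keep only the pandas, numpy and pyarrow lines from
# "name==version" input read on stdin (e.g. `pip list --format=freeze`).
show_versions() {
  # -i: match case-insensitively, since distribution names may be capitalised.
  # `|| true`: return 0 even when none of the three packages is present.
  grep -iE '^(pandas|numpy|pyarrow)==' || true
}
```

For example, on the driver: /databricks/python/bin/pip list --format=freeze | show_versions. Comparing this output with a cluster where the same code still runs may show which package changed.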
I fixed the issue by pinning specific versions in my global init script:
/databricks/python/bin/pip install pandas==2.2.2
/databricks/python/bin/pip install numpy==1.26.4
/databricks/python/bin/pip install pyarrow==7.0.0
This solved the problem for me.
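The three installs above could also be written as a single global init script. The sketch below is a variant, not the exact script used here. It passes all three pins to one pip command, so pip resolves them together in a single run rather than across three separate runs. It also only installs when /databricks/python/bin/pip exists. The function name pin_libs and the PIP override are hypothetical, added only for illustration and dry runs.

```shell
#!/bin/bash
# Sketch of a global init script applying the same three version pins.
set -euo pipefail

# pip of the cluster's default Python environment (path taken from the answer).
DATABRICKS_PIP="/databricks/python/bin/pip"

# Hypothetical helper; PIP may be set to another command for a dry run.
pin_libs() {
  local pip="${PIP:-$DATABRICKS_PIP}"
  # One install command: pip resolves all three pins in the same run.
  "$pip" install pandas==2.2.2 numpy==1.26.4 pyarrow==7.0.0
}

# Install only where the cluster's pip exists; otherwise report and skip.
if [ -x "$DATABRICKS_PIP" ]; then
  pin_libs
else
  echo "pin_libs: $DATABRICKS_PIP not found, skipping version pins" >&2
fi
```

Running `PIP=echo pin_libs` prints the install arguments without installing anything, which is a quick way to check the pins before attaching the script to a cluster.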