Apologies if two days' frustration leaks through...
Problem: I can't reliably run TensorBoard in a Jupyter notebook (actually, in JupyterLab) with
%tensorboard --logdir {logdir}
and if I kill the TensorBoard process and start again in the notebook, it says it is reusing the dead process and port, but the process is dead and `netstat -ano | findstr :6006` shows nothing, so the port looks closed too.
Question: How in the name of $deity do I get tensorboard to restart from scratch and forget what it thinks it knows about processes, ports etc.? If I could do that I could hack away at residual path etc. issues...
Known issues already addressed (I think): escaping backslashes in the Python string to get a proper path, and other OS gremlins; avoiding spaces in the path; ensuring correct capitalisation...
Environment: Win 64-bit Home with Anaconda and TensorFlow-GPU 2 installed via conda install. TF is working and writes data to the specified path given via the callback
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1) # logdir is the full path
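For context, this is roughly how that callback is wired into training; the toy model and data below are illustrative only, not my actual network:

```python
import datetime
import tensorflow as tf

# Timestamped run directory under the same A:\tensorboard root as above.
logdir = "A:\\tensorboard\\" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Illustrative toy model and data, just to show the callback in use.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# The callback writes event files (scalars, histograms, graph) under logdir.
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

x = tf.random.normal((64, 4))
y = tf.random.normal((64, 1))
model.fit(x, y, epochs=2, callbacks=[tensorboard_callback])
```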
But I'm damned if I can start TensorBoard reliably within the notebook.
I found that if I started an Anaconda command window and invoked tensorboard from there, TensorBoard started OK...
(TF2GPU_Anaconda) C:\Users\Julian>tensorboard --logdir "a:\tensorboard\20200102-112749"
2020-01-02 11:53:58.478848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.0.0 at http://localhost:6006/ (Press CTRL+C to quit)
It was accessible in Chrome at localhost:6006 as stated (specifically http://localhost:6006/#scalars&run=20200102-112749%5Ctrain) (I'll ignore the other problems with TensorBoard such as refresh failures on scalars, an odd message on the graph tab, etc.), and
%tensorboard --logdir {logdir}
then shows TensorBoard in the notebook and in the separate Chrome tab.
However! Whilst TensorBoard reports in the notebook that it is reusing the old, dead PID, it is in fact running on a completely different, new PID.
What have I been doing wrong, and how do I reset tensorboard completely?
PS the last (successful!) invocation was in fact with
%tensorboard --logdir {makeWindowsCmdPath('A:\\tensorboard\\20200102-112749')}
where makeWindowsCmdPath is defined as
def makeWindowsCmdPath(path):
    # Wrap the path in double quotes so backslashes and any spaces survive the shell.
    return '"' + str(path) + '"'
UPDATE 2020-01-03: An MWE of eventual success has been uploaded in a comment on GitHub, in response to an issue that covers the PID-referencing errors of TensorBoard.
To close it, I just close the TensorBoard tab in my browser and, in the Jupyter notebook, click Interrupt Kernel.
In the navigation pane, click Workspaces, then select Jupyter and launch a new workspace. From the Files tab in the workspace, click New > TensorBoard. You can access TensorBoard from the Running tab.
Hey—sorry to hear that you’re running into issues. It’s entirely plausible that everything that you describe is both accurate and my fault. :-)
How in the name of $deity do I get tensorboard to restart from scratch and forget what it thinks it knows about processes, ports etc.? If I could do that I could hack away at residual path etc. issues...
There is a directory called `.tensorboard-info` in your temp directory that maintains a best-effort registry of the TensorBoard jobs that we think are running. When TensorBoard launches (in any manner, including with `%tensorboard`), it writes an “info file” to that directory, and when you use `%tensorboard` we first check to see if a “compatible instance” (same working directory and CLI args) is still running, and if so reuse it instead. When a TensorBoard instance shuts down cleanly, it removes its own info file. The idea is that as long as TensorBoard is shut down cleanly we should always have an accurate record of which processes are live, and since this registry is in a temp directory any errors due to hard shutdowns will be short-lived.
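If you want to see what that registry currently claims, you can read the info files directly; here is a rough sketch (the exact file names and JSON fields are internal details, so treat them as an assumption that may change between versions):

```python
import json
import os
import tempfile

# Each TensorBoard instance records itself as a small JSON "info file" here.
info_dir = os.path.join(tempfile.gettempdir(), ".tensorboard-info")

names = os.listdir(info_dir) if os.path.isdir(info_dir) else []
for name in names:
    with open(os.path.join(info_dir, name)) as f:
        info = json.load(f)
    # Fields include (roughly) the pid, port, and logdir the instance was started with.
    print(name, info.get("pid"), info.get("port"), info.get("logdir"))
```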
But this is where I erred: coming from the POSIX world and not being very familiar with Windows application development, I didn’t realize that the Windows temp directory is not actually automatically deleted, ever. Therefore, any bookkeeping errors persist indefinitely.
So, the answer to your question is: remove the `.tensorboard-info` directory located under `tempfile.gettempdir()` (preferably when you don’t have any actively running TensorBoard instances).
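In code, that cleanup is roughly the following (run it while no TensorBoard instances are active, since it wipes the whole registry):

```python
import os
import shutil
import tempfile

# Remove TensorBoard's instance registry so the next launch starts from scratch.
info_dir = os.path.join(tempfile.gettempdir(), ".tensorboard-info")
shutil.rmtree(info_dir, ignore_errors=True)
```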
There are ways that we can plausibly work around this in TensorBoard core: see https://github.com/tensorflow/tensorboard/issues/2483 for a start, and I’ve also considered amortized approaches like letting each TensorBoard instance perform some cleanup of other instances at start time. We haven’t yet gotten around to implementing these.
Let me know if this is helpful or if it fails to address your question.