I was running a very long training (reinforcement learning with 20M steps) and writing summary every 10k steps. In between step 4M and 6M, I saw 2 peaks in my TensorBoard scalar chart for game score, then I let it run and went to sleep. In the morning, it was running at about step 12M, but the peaks between step 4M and 6M that I saw earlier disappeared from the chart. I tried to zoom in and found out that TensorBoard (randomly?) skipped some of the data points. I also tried to export the data but some data point including the peaks are also missing in the exported .csv.
I looked for answers and found this in TensorFlow github page:
TensorBoard uses reservoir sampling to downsample your data so that it can be loaded into RAM. You can modify the number of elements it will keep per tag in tensorboard/backend/server.py.
Has anyone ever modified this server.py file? Where can I find the file and if I installed TensorFlow from source, do I have to recompile it after I modified the file?
--logdir is the directory you will create data to visualize. Files that TensorBoard saves data into are called event files. Type of data saved into the event files is called summary data. Optionally you can use --port=<port_you_like> to change the port TensorBoard runs on.
Head over to localhost:6006 to see TensorBoard on your local machine. We can see some of the scalar metrics that are provided by default With the linear Classifier. We can also Expand and Zoom into any of these graphs. Double-clicking allows us to zoom out.
TensorBoard is a visualization tool provided with TensorFlow. This callback logs events for TensorBoard, including: Metrics summary plots. Training graph visualization.
You don't have to change the source code for this, there is a flag called --samples_per_plugin
.
Quoting from the help command
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most users should not need to set this flag. (default: '')
So if you want to have a slider of 100 images, use:
tensorboard --samples_per_plugin images=100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With