I have several related data frames (there are IDs to join them if needed). However, I don't always need them at the same time.
Since they are quite large, does it make sense to store them in separate HDF stores? Or is the cost of carrying around the "unused" frames negligible when I'm working on the other frames in the same file?
In theory, if you can place your HDF files on separate I/O subsystems (different spindles, different storage systems, etc.), you can try reading your DFs in parallel. In practice, I would test it for your particular case, on your hardware and with your data.
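As a starting point for that kind of test, here is a minimal sketch that times reading one frame from a combined store and from its own file. The frame names (`orders`, `users`), the row count, and the helper functions `time_read` and `compare_layouts` are all made up for illustration, and it assumes pandas with PyTables installed. It reads sequentially, not in parallel, and freshly written files will probably sit in the OS cache, so treat the numbers as a rough comparison only.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_read(path, key):
    """Return (seconds, row count) for reading one frame from an HDF5 file."""
    start = time.perf_counter()
    df = pd.read_hdf(path, key=key)
    return time.perf_counter() - start, len(df)


def compare_layouts(n_rows=100_000):
    """Time reading the same frame from a combined store and a separate file."""
    rng = np.random.default_rng(0)
    # Hypothetical frames standing in for your real data.
    frames = {
        "orders": pd.DataFrame(rng.random((n_rows, 4)), columns=list("abcd")),
        "users": pd.DataFrame(rng.random((n_rows, 4)), columns=list("abcd")),
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Layout 1: every frame in a single store.
        combined = os.path.join(tmp, "combined.h5")
        with pd.HDFStore(combined, mode="w") as store:
            for key, df in frames.items():
                store.put(key, df)
        results["combined"] = time_read(combined, "orders")

        # Layout 2: the frame of interest in a file of its own.
        separate = os.path.join(tmp, "orders.h5")
        frames["orders"].to_hdf(separate, key="orders", mode="w")
        results["separate"] = time_read(separate, "orders")
    return results


if __name__ == "__main__":
    for layout, (seconds, rows) in compare_layouts().items():
        print(f"{layout}: {rows} rows in {seconds:.4f} s")
```

To get numbers that mean something, point it at your real files and storage, and repeat the reads a few times.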
Another advantage of separate files: if you remove a huge DF from an HDF store that contains multiple DFs, or dramatically reduce its size, the file's size will remain unchanged. If the DF lives in its own file, you can simply delete that file and free the unused space.
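You can check this on your own setup with something like the sketch below. It writes a large frame and a small one into one store, removes the large one, and reports the file size before and after. The key names (`big`, `small`), the row count, and the function `size_after_remove` are hypothetical, and it assumes pandas with PyTables installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def size_after_remove(n_rows=200_000):
    """Return (bytes before, bytes after, remaining keys) around a remove()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "combined.h5")
        # Hypothetical frames: one large, one small.
        big = pd.DataFrame(np.random.rand(n_rows, 4), columns=list("abcd"))
        small = pd.DataFrame({"id": range(10)})

        with pd.HDFStore(path, mode="w") as store:
            store.put("big", big)
            store.put("small", small)
        before = os.path.getsize(path)

        # Drop the large frame from the shared store.
        with pd.HDFStore(path, mode="a") as store:
            store.remove("big")
            remaining = store.keys()
        after = os.path.getsize(path)
    return before, after, remaining


if __name__ == "__main__":
    before, after, remaining = size_after_remove()
    print(f"before: {before} bytes, after: {after} bytes, keys left: {remaining}")
```

If the space does not come back, the usual way to reclaim it is to rewrite the file, for example with the `ptrepack` tool that ships with PyTables. With one DF per file you skip that step and just delete the file.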