Does Vowpal Wabbit automatically shuffle its data after every epoch/pass? I'm hoping the created cache file will contain the shuffling meta-data that is necessary for online algorithms like VW's default online SGD method. E.g.
vw -d train.txt -c --passes 50 -f train.model
If not, I have a backup script that manually shuffles the data before every pass:
# Create the initial regressor file
vw -d train.txt -f train.model
# For the next 49 passes, shuffle and then update the regressor file
for i in {1..49}
do
<some script: train.txt --> shuffled_data.txt>
vw -d shuffled_data.txt -i train.model -f train.model
done
If VW doesn't shuffle automatically, is there a more efficient way to do what the code block above does? Unfortunately, VW's wiki is unclear on this point. Thanks.
No, VW doesn't shuffle the data between passes. I'd bet it's not worth shuffling the data on every pass either, because shuffling is very I/O intensive. In terms of convergence, two passes with different shuffle orders might be better than two passes without shuffling, but the shuffling itself probably costs as much as 10 passes without it.
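If the concern is only that train.txt is in some systematic order, one cheaper compromise is to shuffle the file once up front and then let VW make all of its passes over the cached data. The following is a minimal sketch, not a tested recipe: it assumes GNU coreutils `shuf` is available and that the file fits in memory, and `shuffle_once` is a hypothetical helper name. The VW flags are the ones already used in the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write a randomly permuted copy of a line-oriented file.
# Assumes GNU coreutils `shuf`, which reads the whole input into memory.
shuffle_once() {
    local src="$1" dst="$2"
    shuf "$src" > "$dst"
}

# Only run the training step when the input and the vw binary are present.
if [ -f train.txt ] && command -v vw >/dev/null 2>&1; then
    # One shuffle before training instead of one shuffle per pass
    shuffle_once train.txt shuffled_data.txt
    # Same flags as in the question: build a cache and make 50 passes over it
    vw -d shuffled_data.txt -c --passes 50 -f train.model
fi
```

This pays the I/O cost of shuffling once rather than 49 times, but every pass then sees the examples in the same order.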