
Does Vowpal Wabbit shuffle data in multiple online passes?

Does Vowpal Wabbit automatically shuffle its data after every epoch/pass? I'm hoping the cache file it creates contains the shuffling metadata needed by online algorithms such as VW's default online SGD method. For example:

vw -d train.txt -c --passes 50 -f train.model

If not, I have a backup script that manually shuffles the data on every pass:

# Create the initial regressor file (pass 1)
vw -d train.txt -f train.model
# For the next 49 passes, shuffle and then update the regressor file
for i in {1..49}
do
    # Stand-in for <some script: train.txt --> shuffled_data.txt>;
    # assumes GNU coreutils shuf, swap in your own shuffling script
    shuf train.txt > shuffled_data.txt
    vw -d shuffled_data.txt -i train.model -f train.model
done

If VW doesn't shuffle automatically, is there a more efficient way to do what the code block above does? VW's wiki is unfortunately unclear on this point. Thanks.

richizy asked Jan 06 '14 00:01



1 Answer

No, it doesn't shuffle. I'd bet it's not worth shuffling the data on every pass either, because shuffling is very I/O intensive. In terms of convergence, two passes with different shuffle orders might be better than two passes without shuffling, but that reshuffling is probably as costly as 10 passes without shuffling.
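If the input happens to be sorted (by label or by time, say), one possible middle ground is to shuffle once up front and then let VW run all of its passes over the cache in a single invocation. The following is only a rough sketch under some assumptions: GNU coreutils `shuf` is available and the data fits in memory, the helper names `shuffle_once` and `train_multipass` are made up for illustration, and the file names are the ones from the question. The `-k` flag asks VW to rebuild the cache rather than reuse a stale one.

```shell
#!/usr/bin/env bash
# Sketch: shuffle once up front, then let VW make every pass over its cache.

# shuffle_once SRC DST: write a randomly permuted copy of SRC to DST.
# Assumes GNU coreutils `shuf`, which reads the whole file into memory.
shuffle_once() {
    shuf "$1" > "$2"
}

# train_multipass DATA MODEL PASSES: a single VW run that makes PASSES
# passes over a freshly built cache (-c creates it, -k discards any old one).
train_multipass() {
    vw -d "$1" -c -k --passes "$3" -f "$2"
}

# Run only when the input file and the vw binary are both present.
if [ -f train.txt ] && command -v vw >/dev/null 2>&1; then
    shuffle_once train.txt shuffled_data.txt
    train_multipass shuffled_data.txt train.model 50
fi
```

This pays the I/O cost of shuffling once instead of 49 times. Whether it converges as well as reshuffling on every pass would have to be checked on held-out data.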

Rob Neuhaus answered Sep 24 '22 00:09
