Use Protobuf-net to stream large data files as IEnumerable

I'm trying to use Protobuf-net to save and load data to disk but got stuck.

I have a portfolio of assets that I need to process, and I want to be able to do that as fast as possible. I can already read from a CSV but it would be faster to use a binary file, so I'm looking into Protobuf-Net.

I can't fit all assets into memory so I want to stream them, not load them all into memory.

So what I need to do is expose a large set of records as an IEnumerable. Is this possible with Protobuf-Net? I've tried a couple of things but haven't been able to get it running.

Serializing seems to work, but I haven't been able to read the assets back in: I get 0 assets back. Could someone point me in the right direction, please? I looked at the methods in the Serializer class but can't find any that cover this case. Is this use case supported by Protobuf-net? I'm using V2, by the way.

Thanks in advance,

Gert-Jan

Here's some sample code I tried:

public partial class MainWindow : Window {

    // Generate x Assets
    IEnumerable<Asset> GenerateAssets(int Count) {
        var rnd = new Random();
        for (int i = 1; i <= Count; i++) {
            yield return new Asset {
                ID = i,
                EAD = i * 12345,
                LGD = (float)rnd.NextDouble(),
                PD = (float)rnd.NextDouble()
            };
        }
    }

    // write assets to file
    private void Write(string path, IEnumerable<Asset> assets){
        using (var file = File.Create(path)) {
            Serializer.Serialize<IEnumerable<Asset>>(file, assets);
        }
    }

    // read assets from file
    IEnumerable<Asset> Read(string path) {
        using (var file = File.OpenRead(path)) {
            return Serializer.DeserializeItems<Asset>(file, PrefixStyle.None, -1);
        }
    }

    // try it 
    private void Test() {
        Write("Data.bin", GenerateAssets(100)); // this creates a file with binary gibberish that I assume are the assets
        var x = Read("Data.bin");
        MessageBox.Show(x.Count().ToString()); // returns 0 instead of 100
    }

    public MainWindow() {
        InitializeComponent();
    }

    private void button2_Click(object sender, RoutedEventArgs e) {
        Test();
    }
}

[ProtoContract]
class Asset {

    [ProtoMember(1)]
    public int ID { get; set; }

    [ProtoMember(2)]
    public double EAD { get; set; }

    [ProtoMember(3)]
    public float LGD { get; set; }

    [ProtoMember(4)]
    public float PD { get; set; }
}

gjvdkamp asked Aug 26 '11 12:08

1 Answer

Figured it out. To deserialize, use PrefixStyle.Base128 with field number 1 (as in the code below), which apparently is the default.

Now it works like a charm!

GJ

        using (var file = File.Create("Data.bin")) {
            Serializer.Serialize<IEnumerable<Asset>>(file, GenerateAssets(10));
        }

        using (var file = File.OpenRead("Data.bin")) {
            var ps = Serializer.DeserializeItems<Asset>(file, PrefixStyle.Base128, 1);
            int i = ps.Count(); // got them all back :-)
        }
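
One more thing worth noting: the snippet above counts the items while the file is still open. The Read method in the question returns the lazy sequence from inside a using block, so the file would already be closed by the time the caller enumerates it. Below is a minimal sketch of a streaming reader that keeps the file open during enumeration by using an iterator method. It assumes protobuf-net v2 is referenced; the AssetFile and Program class names are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

[ProtoContract]
public class Asset {
    [ProtoMember(1)] public int ID { get; set; }
    [ProtoMember(2)] public double EAD { get; set; }
    [ProtoMember(3)] public float LGD { get; set; }
    [ProtoMember(4)] public float PD { get; set; }
}

// Hypothetical helper class, named here only for the example.
public static class AssetFile {
    // Write the whole sequence, same call as in the answer above.
    public static void Write(string path, IEnumerable<Asset> assets) {
        using (var file = File.Create(path)) {
            Serializer.Serialize<IEnumerable<Asset>>(file, assets);
        }
    }

    // Iterator method: because of "yield return", the using block is only
    // left when enumeration completes or the enumerator is disposed, so the
    // file stays open while the caller pulls items one at a time.
    public static IEnumerable<Asset> Read(string path) {
        using (var file = File.OpenRead(path)) {
            foreach (var asset in Serializer.DeserializeItems<Asset>(file, PrefixStyle.Base128, 1)) {
                yield return asset;
            }
        }
    }
}

public static class Program {
    static IEnumerable<Asset> GenerateAssets(int count) {
        var rnd = new Random();
        for (int i = 1; i <= count; i++) {
            yield return new Asset {
                ID = i,
                EAD = i * 12345,
                LGD = (float)rnd.NextDouble(),
                PD = (float)rnd.NextDouble()
            };
        }
    }

    public static void Main() {
        AssetFile.Write("Data.bin", GenerateAssets(100));
        // Count() enumerates the file once; no list of all assets is built here.
        Console.WriteLine(AssetFile.Read("Data.bin").Count());
    }
}
```

With this shape the caller can foreach over AssetFile.Read(path) and process each asset as it is read. Each call to Read opens the file again, so enumerating the result twice reads the file twice.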

gjvdkamp answered Oct 15 '22 00:10