How can I convert a JSON file from a MongoDB source to a Parquet file using C#?
I have found a library called Parquet.Net, but I need something more dynamic. My data is very dynamic, and it is difficult to build a schema for it. If you have a solution to this problem, please let me know.
var file = File.ReadAllLines(@"C:\Users\NodeJS\Downloads\countries.json");
List<object> tt = new List<object>();
var fields = new HashSet<DataField>();
foreach (var item in file)
{
    var entity = JsonConvert.DeserializeObject<JObject>(item).ToObject<Dictionary<string, object>>();
    foreach (var t in entity)
    {
        fields.Add(new DataField(t.Key, t.Value.GetType()));
        tt.Add(t.Value);
    }
}
var schema = new Schema(fields);
using (Stream fileStream = System.IO.File.Create("convertJson.parquet"))
{
    ParquetConvert.Serialize(tt, fileStream, schema);
}
You could look into Cinchoo ETL, an open-source library that can convert JSON to a Parquet file.
Install the NuGet package:
Install-Package ChoETL.Parquet
Sample code:
using ChoETL;
using (var r = new ChoJSONReader("*** Your JSON file ***"))
{
    using (var w = new ChoParquetWriter("*** Your parquet output file ***"))
    {
        // Pass the reader itself so every JSON record is written to the Parquet file.
        w.Write(r);
    }
}
For more information, please see the CodeProject article on Cinchoo ETL.
Sample fiddle: https://dotnetfiddle.net/fIJIfM
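Since the question is about records whose shape varies, it may help to see what "building a schema dynamically" can look like. The following is only a minimal sketch, assuming line-delimited JSON with one object per line (as in the question's code). It uses nothing but System.Text.Json, and it is not how Cinchoo ETL or Parquet.Net work internally. The names SchemaSketch and InferColumns are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SchemaSketch
{
    // Hypothetical helper: scans line-delimited JSON and returns the union of
    // top-level property names, each mapped to the JSON value kinds seen for it.
    public static SortedDictionary<string, HashSet<JsonValueKind>> InferColumns(
        IEnumerable<string> lines)
    {
        var columns = new SortedDictionary<string, HashSet<JsonValueKind>>(
            StringComparer.Ordinal);

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var doc = JsonDocument.Parse(line);
            // Only top-level JSON objects contribute columns in this sketch.
            if (doc.RootElement.ValueKind != JsonValueKind.Object) continue;

            foreach (var prop in doc.RootElement.EnumerateObject())
            {
                if (!columns.TryGetValue(prop.Name, out var kinds))
                {
                    kinds = new HashSet<JsonValueKind>();
                    columns[prop.Name] = kinds;
                }
                // Null values are recorded as a kind instead of throwing.
                kinds.Add(prop.Value.ValueKind);
            }
        }
        return columns;
    }

    public static void Main()
    {
        // Made-up sample records with differing fields.
        var lines = new[]
        {
            "{\"name\":\"Chile\",\"population\":19000000}",
            "{\"name\":\"Peru\",\"capital\":\"Lima\",\"population\":null}"
        };

        foreach (var (name, kinds) in InferColumns(lines))
            Console.WriteLine($"{name}: {string.Join(", ", kinds)}");
    }
}
```

A column set like this could then drive the schema, with a field written as null in any record that lacks it. A column that shows more than one non-null kind would need an explicit decision, for example storing it as a string.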