How to deserialize huge JSON members

I'm calling an API that returns its responses as JSON objects. One of the members of the JSON objects can have a really long (10MiB to 3GiB+) base-64 encoded value. For example:

{
    "name0": "value0",
    "name1": "value1",
    "data": "(very very long base-64 value here)",
    "name2": "value2"
}

I need the data and the other names/values from the body. How do I get this data?

I'm currently using Newtonsoft.Json to (de)serialize JSON data in this application, and for smaller chunks of data, I would usually have a Data property of type byte[], but this data can be more than 2GiB and even if it's smaller than that, there may be so many responses coming back that we could run out of memory.

I'm hoping there is a way to write a custom JsonConverter or something to serialize/deserialize the data gradually as a System.IO.Stream, but I'm not sure how to read a single string "token" that cannot itself fit into memory. Any suggestions?

Asked Oct 19 '25 14:10 by Sean Killian

1 Answer

A 3GiB+ string value is too large to fit in a .NET string, as it exceeds the maximum .NET string length. Thus you cannot use Json.NET to read your JSON response, because Json.NET's JsonTextReader always fully materializes property values as it reads, even when skipping them.

As for deserializing to a Stream or byte[] array, as noted in the comments by Panagiotis Kanavos:

Neither JSON.NET's JsonTextReader nor System.Text.Json's Utf8JsonReader have a method that retrieves a node as a stream. All the byte-related methods return the entire content at once.

Thus for sufficiently large data values you will exceed the maximum .NET array length.

So what are your options?

Firstly, I would encourage you to try to change the response format. JSON isn't an ideal format for huge Base64-encoded property values as, in general, most JSON serializers will fully materialize each property. Instead, as suggested by Panagiotis Kanavos, send the binary data in the response body and the remaining properties as custom headers. Or see HTTP response with both binary data and JSON for additional options. If you do that, you will be able to copy directly from the response body stream to some intermediate stream.
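To make that concrete, here is a minimal sketch of the header-based approach, assuming you control the API; the endpoint URL, target path and the X-Name0 header name are made up for illustration:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingDownload
{
    // Hypothetical client: metadata travels in custom headers, while the huge
    // binary payload is the raw response body, so it never has to fit in memory.
    public static async Task DownloadAsync(HttpClient client, string url, string targetPath)
    {
        // ResponseHeadersRead prevents HttpClient from buffering the whole body.
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        // "X-Name0" is an illustrative header name, not part of any real API.
        string name0 = response.Headers.TryGetValues("X-Name0", out var values)
            ? string.Join(",", values) : null;

        await using var body = await response.Content.ReadAsStreamAsync();
        await using var file = File.Create(targetPath);
        await body.CopyToAsync(file); // copies in chunks; no 2GiB array limit
    }
}
```

With this layout the client never decodes Base64 at all, which also saves the ~33% size overhead of Base64 encoding.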

Secondly, you could attempt to generalize the code from this answer by mtosh to Parsing a JSON file with .NET core 3.0/System.text.Json. That answer shows how to iterate through a stream token-by-token using Utf8JsonReader from System.Text.Json. You could attempt to rewrite that answer to support reading of individual string values incrementally -- however I must admit that I do not know whether Utf8JsonReader actually supports reading portions of a property value in chunks without loading the entire value. As such, I can't recommend this approach.

Thirdly, you could adopt the approach from this answer to JsonConvert Deserialize Object out of memory exception and use the reader returned by JsonReaderWriterFactory.CreateJsonReader() to manually parse your JSON. This factory returns an XmlDictionaryReader that transcodes from JSON to XML on the fly, and thus supports incremental reading of Base64 properties via XmlReader.ReadContentAsBase64(Byte[], Int32, Int32). This is the reader used by WCF's DataContractJsonSerializer which is not recommended for new development, but has been ported to .NET Core, so can be used when no other options present themselves.

So, how would this work? First define a model corresponding to your JSON as follows, with your Data property represented as a Stream:

using System;
using System.IO;
using System.Threading;

public partial class Model : IDisposable
{
    Stream data;

    public string Name0 { get; set; }
    public string Name1 { get; set; }
    [System.Text.Json.Serialization.JsonIgnore] // Added for debugging purposes
    public Stream Data { get => data; set => this.data = value; }
    public string Name2 { get; set; }

    // Dispose the temporary stream exactly once, even if called concurrently.
    public virtual void Dispose() => Interlocked.Exchange(ref data, null)?.Dispose();
}

Next, define the following helper methods:

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Xml;

public static class JsonReaderWriterExtensions
{
    const int BufferSize = 8192;
    private static readonly Microsoft.IO.RecyclableMemoryStreamManager manager = new ();

    public static Stream CreateTemporaryStream() => 
        // Create some temporary stream to hold the deserialized binary data.  
        // Could be a FileStream created with FileOptions.DeleteOnClose or a Microsoft.IO.RecyclableMemoryStream
        // File.Create(Path.GetTempFileName(), BufferSize, FileOptions.DeleteOnClose);
        manager.GetStream();
    
    public static T DeserializeModelWithStreams<T>(Stream inputStream) where T : new() =>
        PopulateModelWithStreams(inputStream, new T());

    public static T PopulateModelWithStreams<T>(Stream inputStream, T model)
    {
        ArgumentNullException.ThrowIfNull(inputStream);
        ArgumentNullException.ThrowIfNull(model);

        var type = model.GetType();
        
        using (var reader = JsonReaderWriterFactory.CreateJsonReader(inputStream, XmlDictionaryReaderQuotas.Max))
        {
            // TODO: Stream-valued properties not at the root level.
            if (reader.MoveToContent() != XmlNodeType.Element)
                throw new XmlException();
            while (reader.Read() && reader.NodeType != XmlNodeType.EndElement)
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        var name = reader.LocalName;
                        // TODO:
                        // Here we could use DataMemberAttribute.Name or other attributes to build a contract mapping the type to the JSON.
                        var property = type.GetProperty(name, BindingFlags.IgnoreCase | BindingFlags.Public | BindingFlags.Instance);
                        if (property == null || !property.CanWrite || property.GetIndexParameters().Length > 0 || Attribute.IsDefined(property, typeof(IgnoreDataMemberAttribute)))
                            continue;
                        // Deserialize the value
                        using (var subReader = reader.ReadSubtree())
                        {
                            subReader.MoveToContent();
                            if (typeof(Stream).IsAssignableFrom(property.PropertyType))
                            {
                                var streamValue = CreateTemporaryStream();  
                                byte[] buffer = new byte[BufferSize];
                                int readBytes = 0;
                                while ((readBytes = subReader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
                                    streamValue.Write(buffer, 0, readBytes);
                                if (streamValue.CanSeek)
                                    streamValue.Position = 0;
                                property.SetValue(model, streamValue);
                            }
                            else
                            {
                                var settings = new DataContractJsonSerializerSettings
                                {
                                    RootName = name,
                                    // Modify other settings as required e.g. DateTimeFormat.
                                };
                                var serializer = new DataContractJsonSerializer(property.PropertyType, settings);
                                var value = serializer.ReadObject(subReader);
                                if (value != null)
                                    property.SetValue(model, value);
                            }
                        }
                        Debug.Assert(reader.NodeType == XmlNodeType.EndElement);
                        break;
                    default:
                        reader.Skip();
                        break;
                }
            }
        }

        return model;
    }
}

And now you could deserialize your model as follows:

using var model = JsonReaderWriterExtensions.DeserializeModelWithStreams<Model>(responseStream);
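The returned model's Data stream can then be consumed incrementally, for example copied out to a file, without ever holding the decoded value in memory (the output path below is illustrative):

```csharp
Console.WriteLine($"{model.Name0}, {model.Name1}, {model.Name2}");

// Copy the decoded binary data out in chunks; nothing is fully buffered.
using (var output = File.Create("data.bin")) // illustrative output path
    model.Data.CopyTo(output);

// Disposing the model (via the using declaration above) disposes the
// temporary stream and releases its backing storage.
```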

Notes:

  1. Since the value of data may be arbitrarily large, you cannot deserialize its contents into a MemoryStream. Alternatives include:

    • A temporary FileStream e.g. as returned by File.Create(Path.GetTempFileName(), BufferSize, FileOptions.DeleteOnClose).
    • A RecyclableMemoryStream as returned by Microsoft's Microsoft.IO.RecyclableMemoryStream NuGet package.

    The demo code above uses RecyclableMemoryStream but you could change it to use a FileStream if you prefer. Either way you will need to dispose of it after you are done.

  2. I am using reflection to bind C# properties to JSON properties by name, ignoring case. For properties whose type is not a Stream, I am using DataContractJsonSerializer to deserialize their values. This serializer has many quirks, such as a funky default DateTime format, so you may need to play around with your DataContractJsonSerializerSettings or deserialize certain properties manually.

  3. My method JsonReaderWriterExtensions.DeserializeModelWithStreams() only supports Stream-valued properties at the root level. If you have nested huge Base64-valued properties you will need to rewrite JsonReaderWriterExtensions.PopulateModelWithStreams() to be recursive (which basically would amount to writing your own serializer).

  4. For a discussion of how the reader returned by JsonReaderWriterFactory transcodes from JSON to XML, see Efficiently replacing properties of a large JSON using System.Text.Json and Mapping Between JSON and XML.
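As mentioned in note 1, if you would rather spool to disk than use RecyclableMemoryStream, CreateTemporaryStream() (same helper name as in the code above) can be swapped for the file-backed alternative:

```csharp
public static Stream CreateTemporaryStream() =>
    // The OS deletes the temp file when the stream is closed, so large
    // payloads never occupy managed memory.
    File.Create(Path.GetTempFileName(), BufferSize, FileOptions.DeleteOnClose);
```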

Demo fiddle here.

Answered Oct 22 '25 05:10 by dbc

