I have two deflate functions, one written in C# and one in Scala. When they run on the same input, the returned byte arrays differ in their leading and trailing bytes. The differences in the middle bytes are expected, because C#'s byte is unsigned while Scala's Byte is signed, so for example 145 shows up as -111.
Deflate function in Scala:
```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, DeflaterOutputStream}
import zio._

object ZDeflater {
  val deflater = ZManaged.makeEffectTotal(new Deflater(Deflater.DEFLATED, true))(_.end)
  val buffer = ZManaged.fromAutoCloseable(ZIO.succeed(new ByteArrayOutputStream()))
  val stream = for {
    d <- deflater
    b <- buffer
    s <- ZManaged.fromAutoCloseable(ZIO.succeed(new DeflaterOutputStream(b, d, true)))
  } yield (b, s)

  def deflate(input: Array[Byte]): RIO[blocking.Blocking, Array[Byte]] = stream.use { case (buffer, stream) =>
    for {
      () <- blocking.effectBlocking(stream.write(input))
      () <- blocking.effectBlocking(stream.flush())
      result = buffer.toByteArray
    } yield result
  }
}
```
Deflate function in C#:
```csharp
private static byte[] Deflate(byte[] uncompressedBytes)
{
    using (var output = new MemoryStream())
    {
        using (var zip = new DeflateStream(output, CompressionMode.Compress, true))
        {
            zip.Write(uncompressedBytes, 0, uncompressedBytes.Length);
        }
        return output.ToArray();
    }
}
```
Outputs after deflating:

Scala, `ZDeflater.deflate(data.getBytes(StandardCharsets.UTF_8))`:
124, -111, …, 126, 1, 0, 0, -1, -1

C#, `Deflate(Encoding.UTF8.GetBytes(data))`:
125, 145, …, 126, 1
Does anyone know what causes the difference in the first and last bytes? Any ideas would be very helpful. Thanks a bunch!
P.S.: C#'s Deflate output is accepted by a specific third party, but Scala's output is not. So I'm trying to figure out how to make Scala's output the same as C#'s.
As documented in its Javadoc, Java's Deflater class, when created with its default constructor, compresses data into the ZLIB format. The ZLIB format wraps DEFLATE-compressed data with a small header in front and an Adler-32 checksum after it (RFC 1950).
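To make that layout concrete, here is a minimal Scala sketch (the object and method names are hypothetical) that compresses with the default Deflater constructor and inspects the result. At the default compression level the first two bytes are the ZLIB header 0x78 0x9C, and the last four bytes are the Adler-32 of the uncompressed input, stored big-endian.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Adler32, Deflater}

object ZlibLayout {
  // Default constructor: DEFAULT_COMPRESSION and nowrap = false, i.e. ZLIB format.
  def zlibCompress(input: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    try {
      deflater.setInput(input)
      deflater.finish()
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](1024)
      while (!deflater.finished()) {
        val n = deflater.deflate(buf)
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally deflater.end()
  }

  // The last four bytes of a ZLIB stream: Adler-32 of the uncompressed data, big-endian.
  def trailerAdler32(zlib: Array[Byte]): Long =
    zlib.takeRight(4).foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xff))

  // Reference Adler-32 computed directly over the uncompressed input.
  def adler32(input: Array[Byte]): Long = {
    val a = new Adler32()
    a.update(input)
    a.getValue
  }
}
```

A raw DEFLATE stream, such as the one DeflateStream writes, has neither this header nor this trailer.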
Microsoft's documentation for DeflateStream is inaccurate about the exact data format: DeflateStream actually produces raw DEFLATE data, not ZLIB data (dotnet-2236). As a consequence, its output is also incompatible with HTTP's "deflate" transfer encoding, which, despite its name, refers to the ZLIB data format rather than raw DEFLATE (RFC 2616).
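The leading bytes from the question can be checked against both formats. The sketch below (hypothetical helper names) tests whether two bytes form a valid ZLIB header per RFC 1950 and, alternatively, decodes the first byte as a raw DEFLATE block header per RFC 1951, where the three least significant bits hold BFINAL (1 bit) and BTYPE (2 bits). Neither 124 nor 125 can start a ZLIB stream, which fits the fact that the question's Scala code passes nowrap = true. Read as raw DEFLATE, both bytes start a dynamic-Huffman block (BTYPE = 2) and differ only in BFINAL: 1 in the C# output, 0 in the Scala output.

```scala
object DeflateFirstByte {
  // RFC 1950: the low nibble of the first byte is CM (8 = deflate), and the
  // 16-bit value first * 256 + second must be a multiple of 31.
  def looksLikeZlib(b0: Int, b1: Int): Boolean =
    (b0 & 0x0f) == 8 && ((b0 & 0xff) * 256 + (b1 & 0xff)) % 31 == 0

  // RFC 1951: bits are read starting at the least significant bit;
  // bit 0 is BFINAL, bits 1-2 are BTYPE (0 stored, 1 fixed, 2 dynamic Huffman).
  def blockHeader(b0: Int): (Int, Int) = (b0 & 0x1, (b0 >> 1) & 0x3)
}
```

For example, `DeflateFirstByte.blockHeader(124)` gives `(0, 2)` and `DeflateFirstByte.blockHeader(125)` gives `(1, 2)`.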
So how can you get the same output from Scala and C#?
A) Write raw DEFLATE data with Scala as well
The Deflater class has an overloaded constructor with a nowrap parameter that omits the header and the checksum.
Setting this parameter to true produces compressed data in raw DEFLATE format. If you also plan to decompress the data in Java, read the Javadoc of the Inflater constructors carefully.
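A minimal sketch of option A using only java.util.zip, without ZIO (the object and method names are hypothetical). Unlike the question's code, it calls finish() before reading the buffer instead of flush(). A sync flush (syncFlush = true) ends the pending block with BFINAL = 0 and appends an empty stored block, which appears to be where the trailing 0, 0, -1, -1 comes from. finish() marks the last block as final instead. The sketch also uses Deflater.DEFAULT_COMPRESSION: Deflater.DEFLATED is the compression-method constant (8), so passing it as the first constructor argument selects compression level 8. The choice of level is an assumption. Even at matching levels, byte-identical output across different zlib builds is not guaranteed.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, DeflaterOutputStream, Inflater}

object RawDeflate {
  // nowrap = true: raw DEFLATE, no ZLIB header and no Adler-32 trailer.
  // DEFAULT_COMPRESSION (zlib level 6) is an assumption, not a verified match for C#.
  def deflateRaw(input: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true)
    val buffer = new ByteArrayOutputStream()
    val out = new DeflaterOutputStream(buffer, deflater)
    try {
      out.write(input)
      out.finish() // writes the final block (BFINAL = 1); no sync-flush marker
    } finally {
      out.close()
      deflater.end() // close() does not end a Deflater that was passed in explicitly
    }
    buffer.toByteArray
  }

  def inflateRaw(compressed: Array[Byte]): Array[Byte] = {
    val inflater = new Inflater(true)
    try {
      // The Inflater Javadoc asks for an extra "dummy" input byte when nowrap is true.
      inflater.setInput(compressed :+ 0.toByte)
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](1024)
      while (!inflater.finished()) {
        val n = inflater.inflate(buf)
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
          throw new IllegalStateException("truncated or invalid DEFLATE data")
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally inflater.end()
  }
}
```

In the question's ZIO code, the corresponding change should be to call `stream.finish()` instead of `stream.flush()` before reading the buffer.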
B) Write ZLIB data with C# as well (recommended)
Use .NET's ZLibStream class (available since .NET 6) or a third-party library instead of DeflateStream to write your data in ZLIB format.
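On the Scala side, consuming ZLIB needs no flags, because Deflater, DeflaterOutputStream and InflaterInputStream all default to the ZLIB format. Here is a minimal round-trip sketch with hypothetical names, assuming the C# side now writes ZLIB (for example via ZLibStream):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

object ZlibStreams {
  // Default DeflaterOutputStream writes ZLIB (RFC 1950); close() finishes the stream.
  def compress(input: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DeflaterOutputStream(buffer)
    try out.write(input) finally out.close()
    buffer.toByteArray
  }

  // Default InflaterInputStream reads ZLIB, e.g. bytes produced by a ZLIB writer in .NET.
  def decompress(zlib: Array[Byte]): Array[Byte] = {
    val in = new InflaterInputStream(new ByteArrayInputStream(zlib))
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](1024)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.toByteArray
    } finally in.close()
  }
}
```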
C) Use GZIP format instead
The GZIP format is comparable to ZLIB, but uses a different header and a different checksum. Both .NET (GZipStream) and Java (GZIPOutputStream and GZIPInputStream) provide dedicated stream classes for it. Although ZLIB's Adler-32 checksum is cheaper to compute than GZIP's CRC-32 and ZLIB's header is smaller, GZIP is more common, especially on the web. The main reason for GZIP's popularity is that Microsoft long had trouble distinguishing between raw DEFLATE and ZLIB, i.e. HTTP's deflate transfer encoding (see question 39 of the zlib FAQ ;-)
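Here is a minimal GZIP round trip with the JDK's GZIPOutputStream and GZIPInputStream (the object name is hypothetical). Per RFC 1952, a GZIP stream starts with the magic bytes 0x1F 0x8B and ends with the CRC-32 of the uncompressed input followed by its length, both little-endian. The sketch reads the CRC-32 back from the trailer to show this.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{CRC32, GZIPInputStream, GZIPOutputStream}

object GzipStreams {
  // close() finishes the stream and writes the CRC-32 and ISIZE trailer.
  def compress(input: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(input) finally out.close()
    buffer.toByteArray
  }

  def decompress(gz: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(gz))
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](1024)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.toByteArray
    } finally in.close()
  }

  // CRC-32 stored in the 8-byte trailer (first four bytes, little-endian).
  def trailerCrc32(gz: Array[Byte]): Long =
    gz.slice(gz.length - 8, gz.length - 4).reverse
      .foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xff))

  // Reference CRC-32 computed directly over the uncompressed input.
  def crc32(input: Array[Byte]): Long = {
    val c = new CRC32()
    c.update(input)
    c.getValue
  }
}
```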