
Determining size of a future file while data is still in memory

Tags:

c#

This is C#/.NET 2.0.

I have a string that contains the future contents of an XML file: metadata plus binary data from image files. I would like to determine how big the XML file will be once I write the string to the file system.

I've tried the following two approaches:

Console.Out.WriteLine("Size: " + data.Length/1024 + "KB");

and

Console.Out.WriteLine("Size: " + (data.Length * sizeof(char))/1024 + "KB");

Neither works: the actual size of the resulting file differs from what either expression reports. I'm obviously missing something here. Any help would be appreciated.

XML Serialization:

// doc is an XmlDocument that I've built previously
StringWriter sw = new StringWriter();
doc.Save(sw);
string XMLAsString = sw.ToString();

Writing to file system (XMLAsString passed to this function as variable named data):

Random rnd = new Random(DateTime.Now.Millisecond);      
FileStream fs = File.Open(@"C:\testout" + rnd.Next(1000).ToString() +  ".txt", FileMode.OpenOrCreate);
StreamWriter sw = new StreamWriter(fs);
app.Diagnostics.Write("Size of XML: " + (data.Length * sizeof(char))/1024 + "KB");
sw.Write(data);
sw.Close();
fs.Close();

Thanks

cakeforcerberus asked Dec 17 '22 08:12


1 Answer

You're missing the encoding step: the number of characters in a string is not the number of bytes it takes up on disk. Try this:

string data = "this is what I'm writing";
byte[] mybytes = System.Text.Encoding.UTF8.GetBytes(data);

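The length of that array is the number of bytes the text takes up on disk when it is written as UTF-8, which is what StreamWriter uses when you don't specify an encoding. No EOF (End Of File) character is stored in the file. What can add bytes is a byte order mark (BOM): if you pass Encoding.UTF8 to the writer explicitly, it prepends 3 bytes to a new file, so add 3 in that case. If you only need the number, Encoding.UTF8.GetByteCount(data) returns it without allocating the array.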
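As a rough check of this idea, here is a minimal sketch that predicts the size and then compares it with a file actually written to disk. The class name FileSizeEstimate, the helper PredictSize, and the sample string are made up for illustration; the temp file stands in for the real output path.

```csharp
using System;
using System.IO;
using System.Text;

static class FileSizeEstimate
{
    // Hypothetical helper: number of bytes that writing `data` with `encoding`
    // should produce. Set includePreamble when the writer is expected to emit
    // the encoding's byte order mark at the start of a new file.
    public static long PredictSize(string data, Encoding encoding, bool includePreamble)
    {
        long size = encoding.GetByteCount(data);
        if (includePreamble)
        {
            size += encoding.GetPreamble().Length;
        }
        return size;
    }

    static void Main()
    {
        // Sample content (an assumption): mostly ASCII with one non-ASCII character.
        string data = "<img name=\"caf\u00e9\">AAAA</img>";
        string path = Path.GetTempFileName();
        try
        {
            // new StreamWriter(stream) writes UTF-8 without a byte order mark.
            using (StreamWriter sw = new StreamWriter(File.Open(path, FileMode.Create)))
            {
                sw.Write(data);
            }

            Console.WriteLine("Predicted bytes: " + PredictSize(data, Encoding.UTF8, false));
            Console.WriteLine("Actual bytes:    " + new FileInfo(path).Length);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the file is written with new StreamWriter(fs, Encoding.UTF8) instead, calling PredictSize with includePreamble set to true should account for the extra bytes.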

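Edit: I think it's worth remembering that a char in C#/.NET is NOT one byte long, but two: strings are held in memory as UTF-16 and are then encoded to whatever the output format needs. data.Length * sizeof(char) therefore gives the UTF-16 size, which only matches a file written with Encoding.Unicode (BOM aside). In UTF-8, ASCII characters take one byte each and non-ASCII characters take two to four, so for mostly-ASCII XML that estimate comes out nearly twice too large, while plain data.Length comes out too small as soon as any non-ASCII character is present.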
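To make the difference concrete, the sketch below prints the character count next to the byte count under a few encodings. The class name CharVsByteCount, the helper CountBytes, and the four-character sample string are assumptions chosen for illustration.

```csharp
using System;
using System.Text;

static class CharVsByteCount
{
    // Hypothetical helper: bytes needed to store `data` in the given encoding,
    // not counting any byte order mark.
    public static int CountBytes(string data, Encoding encoding)
    {
        return encoding.GetByteCount(data);
    }

    static void Main()
    {
        // Sample text (an assumption): 4 chars, the last one non-ASCII.
        string data = "caf\u00e9";

        Console.WriteLine("chars:                " + data.Length);                        // 4
        Console.WriteLine("chars * sizeof(char): " + data.Length * sizeof(char));         // 8
        Console.WriteLine("UTF-8 bytes:          " + CountBytes(data, Encoding.UTF8));    // 5
        Console.WriteLine("UTF-16 bytes:         " + CountBytes(data, Encoding.Unicode)); // 8
    }
}
```

Only the count for the encoding the writer actually uses will match the file on disk.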

Kevin Anderson answered May 10 '23 10:05