I have a .NET app that is trying to ftp a file but I'm ending up with 1 extra byte per line. My line separator is Environment.NewLine, which I believe translates into \n\r. How many bytes is that?
However, those operating systems use a record-based file system, which stores text files as one record per line. In most file formats, no line terminators are actually stored. Operating systems for the CDC 6000 series defined a newline as two or more zero-valued six-bit characters at the end of a 60-bit word.
Understanding bits and bytes: we call 8 bits a byte. The very common ASCII system maps each letter of the alphabet, both capital and small (plus punctuation and some other symbols), to a number from 0 to 127 (for example a=97, b=98 and so on), so one letter can be expressed with one byte.
A byte is the smallest addressable unit of data on most systems. In general, 1 byte = 1 ASCII character and 2 bytes = 1 UTF-16 code unit. An unsigned byte can hold the values 0-255.
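To make the character-to-number mapping concrete, here is a minimal sketch. The class name `AsciiDemo` and the helper `CodeOf` are made up for illustration; only `Encoding.ASCII` and `byte.MinValue`/`byte.MaxValue` come from .NET itself.

```csharp
using System;
using System.Text;

public static class AsciiDemo
{
    // Hypothetical helper: returns the numeric code of a character.
    // For ASCII characters this equals the ASCII value.
    public static int CodeOf(char c) => c;

    public static void Main()
    {
        Console.WriteLine(CodeOf('a'));                          // 97
        Console.WriteLine(CodeOf('b'));                          // 98

        // Each ASCII character becomes exactly one byte.
        byte[] bytes = Encoding.ASCII.GetBytes("ab");
        Console.WriteLine(bytes.Length);                         // 2

        // The range of an unsigned byte.
        Console.WriteLine($"{byte.MinValue}-{byte.MaxValue}");   // 0-255
    }
}
```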
I know this is an old question, but for the sake of future readers: you can determine how many bytes are in a given string (or string value) via the following:
Encoding.UTF8.GetByteCount("SomeString");
In this case:
Encoding.Unicode.GetByteCount(Environment.NewLine);
// OR
Encoding.Unicode.GetByteCount("\r\n");
.NET strings are Unicode (UTF-16) internally; when writing them out you can specify a different encoding, for example with an XmlSerializer.
Remember to use the proper encoding when you are attempting to count the number of bytes since it is different with each encoding:
- An ASCII character in 8-bit ASCII encoding is 8 bits (1 byte), though it can fit in 7 bits.
- An ISO-8859-1 character in ISO-8859-1 encoding is 8 bits (1 byte).
- A Unicode character in UTF-8 encoding is between 8 bits (1 byte) and 32 bits (4 bytes).
- A Unicode character in UTF-16 encoding is between 16 bits (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.
- A Unicode character in UTF-32 encoding is always 32 bits (4 bytes).
- An ASCII character in UTF-8 is 8 bits (1 byte), and in UTF-16 it is 16 bits (2 bytes).
- The additional (non-ASCII) characters in ISO-8859-1 (0xA0-0xFF) would take 16 bits in UTF-8 and UTF-16.
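The sizes in the list above can be checked directly with `Encoding.GetByteCount`. The following is only a sketch: the class name `EncodingSizes` and the helper `Count` are hypothetical, and the sample characters (é, the euro sign, an emoji) are my own picks to cover the 1, 2, 3 and 4 byte cases.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: number of bytes `text` occupies in `encoding`.
    public static int Count(Encoding encoding, string text) => encoding.GetByteCount(text);

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // ASCII letter
        Console.WriteLine(Count(Encoding.UTF8, "A"));             // 1
        Console.WriteLine(Count(Encoding.Unicode, "A"));          // 2
        Console.WriteLine(Count(Encoding.UTF32, "A"));            // 4

        // U+00E9 (e with acute), a non-ASCII ISO-8859-1 character
        Console.WriteLine(Count(latin1, "\u00E9"));               // 1
        Console.WriteLine(Count(Encoding.UTF8, "\u00E9"));        // 2
        Console.WriteLine(Count(Encoding.Unicode, "\u00E9"));     // 2

        // U+20AC (euro sign)
        Console.WriteLine(Count(Encoding.UTF8, "\u20AC"));        // 3

        // U+1F600 (an emoji outside the Basic Multilingual Plane)
        Console.WriteLine(Count(Encoding.UTF8, "\U0001F600"));    // 4
        Console.WriteLine(Count(Encoding.Unicode, "\U0001F600")); // 4

        // A CR-LF line break
        Console.WriteLine(Count(Encoding.UTF8, "\r\n"));          // 2
        Console.WriteLine(Count(Encoding.Unicode, "\r\n"));       // 4
    }
}
```

So a CR-LF line break is 2 bytes in ASCII, ISO-8859-1 or UTF-8, and 4 bytes in UTF-16.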
In ASCII encoding, \n is the Newline character 0x0A (decimal 10), \r is the Carriage Return character 0x0D (decimal 13).
As Jack has said already, the correct sequence is CR-LF, not vice versa.
FTP is probably adding LF characters to your stream where the line endings are malformed, assuming you are transmitting the file as text (ASCII mode) rather than binary.
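If the file really does contain reversed or mixed line endings, one option is to normalize them to CR-LF before uploading. Below is a rough sketch, not a definitive fix: the class `LineEndings` and the method `ToCrLf` are hypothetical names, and treating an LF-CR pair as a single line break is an assumption based on the "\n\r" separator described in the question.

```csharp
using System;
using System.Text;

public static class LineEndings
{
    // Hypothetical helper: rewrites every line break
    // (CR-LF, LF-CR, lone LF, lone CR) as CR-LF.
    public static string ToCrLf(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r' || c == '\n')
            {
                // Assumption: a mixed pair (\r\n or \n\r) is ONE line break,
                // so skip the second character of the pair.
                if (i + 1 < text.Length
                    && (text[i + 1] == '\r' || text[i + 1] == '\n')
                    && text[i + 1] != c)
                {
                    i++;
                }
                sb.Append("\r\n");
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string fixedText = ToCrLf("one\n\rtwo\nthree\r\n");
        // 11 letters + 3 line breaks of 2 bytes each
        Console.WriteLine(Encoding.ASCII.GetByteCount(fixedText)); // 17
    }
}
```

Alternatively, transferring the file in binary mode leaves the bytes untouched, so whatever line endings you wrote are what arrives.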