String length when converting from a character array

I'm having serious problems with string handling. Because they are rather hard to describe, I will start with some demo code that reproduces them:

Dim s1 As String = "hi"
Dim c(30) As Char
c(0) = "h"
c(1) = "i"
Dim s2 As String = CStr(c)
s2 = s2.Trim()
If not s1 = s2 Then
   MsgBox(s1 + " != " + s2 + Environment.NewLine + _
          "Anything here won't be printed anyway..." + Environment.NewLine + _ 
          "s1.length: " + s1.Length.ToString + Environment.NewLine + _
          "s2.length: " + s2.Length.ToString + Environment.NewLine)
End If                    

The result messagebox looks like this:

[Screenshot: the message box shows only "hi != hi"; the rest of the text is missing.]

The reason that this comparison fails is that s2 has the length 31 (from the original array-size) while s1 has the length 2.

I stumble over this kind of problem quite often when reading string-information out of byte-arrays, for example when handling ID3Tags from MP3s or other encoded (ASCII, UTF8, ...) information with pre-specified length.

Is there any fast and clean way to prevent this problem?

What is the easiest way to "trim" s2 to the string shown by the debugger?

Janis asked Dec 26 '22 07:12


2 Answers

I changed the variable names for clarity:

Dim myChars(30) As Char
myChars(0) = "h"c           ' the c suffix makes a Char literal; under Option Strict,
myChars(1) = "i"c           ' a String such as "h" cannot be narrowed to Char implicitly
Dim myStrA As New String(myChars)
Dim myStrB As String = CStr(myChars)

The short answer is this:

Under the hood, strings are character arrays. The last 2 lines both create a string: one uses a .NET constructor, the other a VB conversion function. Although the array has 31 elements, only 2 of them were initialized.

The rest are null/Nothing, which for a Char means Chr(0) or NUL. A .NET String stores its own length and can contain NUL characters, but many display paths (MessageBox, the debugger's output window, etc.) treat NUL as the end of the text, so only the characters up to the first NUL are shown. Text appended after the NULs will not display either.


Concepts

Since the strings above are created directly from a char array, their length is that of the original array. NUL is a valid Char, so the NULs become part of the string:

Console.WriteLine(myStrA.Length)     ' == 31

So, why doesn't Trim remove the NUL characters? MSDN (and IntelliSense) tell us:

[Trim] Removes all leading and trailing white-space characters from the current String object.

The trailing null/Chr(0) characters are not white-space characters like Tab, Lf, Cr or Space; they are control characters, so the parameterless Trim leaves them in place.
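As a quick check of that classification, the following sketch asks the framework how it categorizes NUL and confirms that the parameterless Trim leaves the padding alone. The module and variable names are made up for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NulIsNotWhiteSpaceDemo
    Sub Main()
        Dim nul As Char = ControlChars.NullChar

        Console.WriteLine(Char.IsWhiteSpace(nul))   ' False
        Console.WriteLine(Char.IsControl(nul))      ' True

        ' "hi" followed by 29 NULs, the same shape as the string built from the array
        Dim padded As String = "hi" & New String(nul, 29)
        Console.WriteLine(padded.Trim().Length)     ' 31: the parameterless Trim leaves NULs alone
    End Sub
End Module
```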

However, String.Trim has an overload which allows you to specify the characters to remove:

myStrA = myStrA.Trim(Convert.ToChar(0))
' using VB namespace constant
myStrA = myStrA.Trim(Microsoft.VisualBasic.ControlChars.NullChar)

You can specify multiple chars:

' nuls and spaces:
myStrA = myStrA.Trim(Convert.ToChar(0), " "c)
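Trim only removes NULs at the two ends of the string. If a fixed-length buffer may hold stale data after the first NUL, cutting the string at that NUL may be closer to what the debugger and MessageBox display. The sketch below assumes that situation; CutAtNul is a hypothetical helper name, not a framework function.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NulCutDemo
    ' Hypothetical helper: returns the text before the first NUL,
    ' or the whole string when it contains no NUL.
    Function CutAtNul(text As String) As String
        Dim pos As Integer = text.IndexOf(ControlChars.NullChar)
        If pos < 0 Then Return text
        Return text.Substring(0, pos)
    End Function

    Sub Main()
        Dim myChars(30) As Char
        myChars(0) = "h"c
        myChars(1) = "i"c
        myChars(5) = "x"c      ' stale data after the terminator (assumed for the demo)

        Dim raw As New String(myChars)
        Console.WriteLine(raw.Trim(ControlChars.NullChar).Length)   ' 6: inner NULs and "x" remain
        Console.WriteLine(CutAtNul(raw))                            ' hi
    End Sub
End Module
```

When the buffer is known to contain nothing but NUL padding after the text, the Trim overload above is enough.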

Strings can be indexed / iterated as a char array:

    ' valid indices run from 0 to Length - 1
    For n As Int32 = 0 To myStrA.Length - 1
        Console.WriteLine("{0} is '{1}'", n, myStrA(n))  ' or myStrA.Chars(n)
    Next

0 is 'h'
1 is 'i'
2 is '

(The output window stops at the first NUL and will not even print the closing quote or the trailing CRLF.) However, you cannot assign to a string's characters to change the string data:

   myStrA(2) = "!"c

This will not compile because the String's Chars property is read-only; strings are immutable.
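If individual characters do need to change, two common workarounds are to edit a copy of the characters or to use a StringBuilder. This is a minimal sketch with made-up variable names; both approaches produce a new String and leave the original untouched.

```vbnet
Imports System
Imports System.Text

Module EditCharsDemo
    Sub Main()
        Dim original As String = "hi?"

        ' Option 1: copy to a Char array, edit the copy, build a new String.
        Dim chars As Char() = original.ToCharArray()
        chars(2) = "!"c
        Dim viaArray As New String(chars)

        ' Option 2: StringBuilder has a writable Chars property.
        Dim sb As New StringBuilder(original)
        sb.Chars(2) = "!"c
        Dim viaBuilder As String = sb.ToString()

        Console.WriteLine(viaArray)      ' hi!
        Console.WriteLine(viaBuilder)    ' hi!
        Console.WriteLine(original)      ' hi? (unchanged)
    End Sub
End Module
```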

See also:

ASCII table

Ňɏssa Pøngjǣrdenlarp answered Jan 12 '23 18:01


If you want to create strings from a byte array, e.g. ID3v2.4.0 with ISO-8859-1 encoding, then this should work as long as the text is plain 7-bit ASCII:

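    Dim s1 As String = "Test"
    ' "Test" followed by NUL padding, as in a fixed-length field
    Dim b() As Byte = New Byte() {84, 101, 115, 116, 0, 0, 0}
    Dim s2 As String = System.Text.Encoding.ASCII.GetString(b).Trim(ControlChars.NullChar)

    If s1 = s2 Then Stop    ' breaks into the debugger when the strings match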

According to http://id3.org/id3v2.4.0-structure, other encodings may be present, and the code would need to be adjusted if one of them is used.
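One way to adjust for the encoding, sketched under assumptions: pass the Encoding in as a parameter and strip the NUL padding after decoding. DecodeField is a hypothetical helper, and the byte arrays are sample data invented for the demo. Only ISO-8859-1 and UTF-8 are shown; the UTF-16 variants listed in the spec would need their own handling (byte order mark, two-byte terminators).

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module FieldDecodeDemo
    ' Hypothetical helper: decodes a fixed-length field and drops trailing NUL padding.
    Function DecodeField(data As Byte(), enc As Encoding) As String
        Return enc.GetString(data).TrimEnd(ControlChars.NullChar)
    End Function

    Sub Main()
        ' "Café" in ISO-8859-1 (é = 233) and in UTF-8 (é = 195, 169), NUL padded
        Dim latin1Bytes As Byte() = {67, 97, 102, 233, 0, 0}
        Dim utf8Bytes As Byte() = {67, 97, 102, 195, 169, 0}
        Dim expected As String = "Caf" & ChrW(233)

        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

        Console.WriteLine(DecodeField(latin1Bytes, latin1) = expected)          ' True
        Console.WriteLine(DecodeField(utf8Bytes, Encoding.UTF8) = expected)     ' True
        ' ASCII cannot represent byte 233, so the decoded text differs
        Console.WriteLine(DecodeField(latin1Bytes, Encoding.ASCII) = expected)  ' False
    End Sub
End Module
```

The last line illustrates why the ASCII decoder is only safe for 7-bit text.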

dbasnett answered Jan 12 '23 17:01