Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do strings contain empty substrings everywhere?

Tags:

c++

string

This question arises from a discussion originating on this answer.

In a nutshell: The author of the answer (0x499602D2) claimed (correctly, as I now know) that when not skipping whitespace, but the next character is a whitespace, all extracts with the exception of characters will fail.

I questioned this on the base that I thought that extracting a string should not fail, because the stream contained an empty string delimited by the whitespace character at the beginning.

This developed into the general discussion whether or not there's an empty string at any position in a string, e.g. in between the a and the b of the string "ab" (I say yes, 0x499602D2 says no). 0x499602D2 suggested that I put this in a question, so here I do.

I copy my main arguments for my position from that thread (including the chat part):

Let's first look at the constant for an empty string. In C and C++, the content is delimited by quotes at the beginning and end. So what does the empty string look like? You know it: "". You see, after the initial quote (delimiter) directly follows the final quote (delimiter). The empty string is in between the two quotes, which follow directly on each other, because the empty string has no characters. Also look at the C representation. That is the sequence of characters, followed by the delimiter '\0'. So what is the representation of the empty string? Well, the characters of the empty string followed by the delimiter. Which means, the first character is the delimiter (that is, exactly as in the stream case). Now consider the concatenation of strings, where e.g. the first string is "a", the second string is empty, and the third string is "b". So what is the concatenation? Well, "ab". So clearly there's an empty string between the a and the b in "ab" (we explicitly put it there!). And of course that is true also before the a and after the b. That is, there's an empty string (or two, or a million) between any two characters of a string.

An empty string has no characters, and between consecutive characters, there are no characters. Therefore between two characters there's an empty string. Also see the other arguments I've given before. In addition, consider regular expressions which match the empty string: They also match everywhere. For example, /ab*c/ matches "ac" because b* matches the empty string between a and c

There's an empty string (i.e, no characters) before the delimiter (space), just as in the C representation of the empty string, there are no characters before the \0 delimiter. Also note that readline also works the same with the \n delimiter: If the \n follows immediately, it doesn't fail but gives an empty string.

I feel unable to identify 0x499602D2's main arguments in the discussion, so I don't try in order to avoid being unintentionally unfair in the selection. You should be able to see them in the comments (and possibly in the chat room — I have no idea whether that is accessible by everyone). @0x499602D2: If you want, you can also yourself add your main arguments after this paragraph.

The practical question connected to this is: Should a well-designed string extraction function fail if there are no characters before the delimiter (as operator>> for strings does), or succeed and return an empty string (as readline does)?

like image 585
celtschk Avatar asked Mar 25 '14 22:03

celtschk


People also ask

Is empty string a substring of every string?

The empty string is always a substring of any string, even the empty string. There are two useful operations with substrings. You can ask if some string is a substring of another. You can also give a start and end index to extract a substring from a string.

What does an empty string contain?

An empty string is a string instance of zero length, whereas a null string has no value at all. An empty string is represented as "" . It is a character sequence of zero characters. A null string is represented by null .

Does every string contain empty string in Java?

An empty string occurs in every string.

Does the empty language contain the empty string?

Formal theory In formal treatments, the empty string is denoted with ε or sometimes Λ or λ. The empty string should not be confused with the empty language ∅, which is a formal language (i.e. a set of strings) that contains no strings, not even the empty string.


1 Answers

Theorem

There's an empty string ε at any position in a string s.

Proof

1. If |s| = 0 (s has length zero), then s = ε, and the claim holds.

2. If |s| > 0, then s has two edge positions: the one before its first symbol and the other after the last one. Since ε is the identity element of the concatenation operation, that is, εs = = s, the claim holds for both the start and the end positions.

3. If |s| > 1, then the s can be written as the concatenation of two non-empty strings: s = pq, where |p| > 0 and |q| > 0. Using the identity element property of ε, pεq = (pε)q = pq = s, which means that the claim holds for the position in s which divides it into the parts p and q. The position of this division can be any internal position of s, so the claim holds for every internal position too.

Corollary

The identity element property implies that ε = εε = εεε = etc. Repeating the above proof after replacing ε with ε^n, where n is a positive integer, we find that there is an infinite number of empty strings at any position in any string.

Notes

Here the word "position" means "caret position" (text insertion cursor position). The caret can be placed before the first symbol (index: 0), between consecutive symbols, and after the last symbol (index: |s|). The number of caret positions is |s| + 1.

The above proof shows that these "zero-width gaps" between symbols can be imagined as being filled with an arbitrary number of empty strings. (This is as strange as that the empty set is a subset of every set, including itself.)

like image 165
kol Avatar answered Sep 29 '22 01:09

kol