
Do strings contain empty substrings everywhere?

Tags:

c++

string

This question arises from a discussion originating on this answer.

In a nutshell: the author of the answer (0x499602D2) claimed (correctly, as I now know) that when whitespace skipping is disabled (std::noskipws) and the next character is whitespace, every extraction except the extraction of a single character will fail.

I questioned this on the basis that I thought extracting a string should not fail, because the stream contained an empty string delimited by the whitespace character at the beginning.

This developed into a general discussion of whether or not there's an empty string at any position in a string, e.g. between the a and the b of the string "ab" (I say yes, 0x499602D2 says no). 0x499602D2 suggested that I put this in a question, so here it is.

I copy my main arguments for my position from that thread (including the chat part):

Let's first look at the literal for an empty string. In C and C++, the content of a string literal is delimited by quotes at the beginning and end. So what does the empty string look like? You know it: "". The opening quote (delimiter) is directly followed by the closing quote (delimiter). The empty string sits between the two quotes, which are adjacent because the empty string has no characters.

Also look at the C representation: the sequence of characters, followed by the delimiter '\0'. So what is the representation of the empty string? The characters of the empty string (there are none) followed by the delimiter. That means the first character is the delimiter, exactly as in the stream case.

Now consider concatenating three strings, where the first is "a", the second is empty, and the third is "b". What is the concatenation? "ab". So clearly there's an empty string between the a and the b in "ab" (we explicitly put it there!). And of course the same is true before the a and after the b. That is, there's an empty string (or two, or a million) between any two characters of a string.

An empty string has no characters, and between consecutive characters there are no characters. Therefore, between two characters there's an empty string. Also see the other arguments I've given before. In addition, consider regular expressions that match the empty string: they also match everywhere. For example, /ab*c/ matches "ac" because b* matches the empty string between a and c.

There's an empty string (i.e., no characters) before the delimiter (space), just as in the C representation of the empty string there are no characters before the \0 delimiter. Note also that std::getline works the same way with the \n delimiter: if the \n follows immediately, it doesn't fail but yields an empty string.

I feel unable to identify 0x499602D2's main arguments in the discussion, so I won't try to summarize them, to avoid being unintentionally unfair in my selection. You should be able to see them in the comments (and possibly in the chat room; I have no idea whether that is accessible to everyone). @0x499602D2: If you want, you can add your own main arguments after this paragraph.

The practical question connected to this is: should a well-designed string extraction function fail if there are no characters before the delimiter (as operator>> for strings does), or succeed and return an empty string (as std::getline does)?


asked Mar 25 '14 22:03

celtschk

1 Answer

Theorem

There's an empty string ε at any position in a string s.

Proof

1. If |s| = 0 (s has length zero), then s = ε, and the claim holds.

2. If |s| > 0, then s has two edge positions: the one before its first symbol and the other after the last one. Since ε is the identity element of the concatenation operation, that is, εs = sε = s, the claim holds for both the start and the end positions.

3. If |s| > 1, then s can be written as the concatenation of two non-empty strings: s = pq, where |p| > 0 and |q| > 0. Using the identity element property of ε, pεq = (pε)q = pq = s, which means that the claim holds for the position in s that divides it into the parts p and q. This division can be made at any internal position of s, so the claim holds for every internal position too.

Corollary

The identity element property implies that ε = εε = εεε, and so on. Repeating the above proof with ε replaced by εⁿ, where n is a positive integer, shows that there are infinitely many empty strings at any position in any string.

Notes

Here the word "position" means "caret position" (text insertion cursor position). The caret can be placed before the first symbol (index: 0), between consecutive symbols, and after the last symbol (index: |s|). The number of caret positions is |s| + 1.

The above proof shows that these "zero-width gaps" between symbols can be thought of as filled with an arbitrary number of empty strings. (This is as strange as the fact that the empty set is a subset of every set, including itself.)


answered Sep 29 '22 01:09

kol
