Possible Duplicate:
Simple string concatenation
Yesterday, as I'm writing this, someone asked on SO
if i have a string
x='wow'
applying the functionadd
in python :x='wow' x.add(x) 'wowwow'
how can i do that in C++?
With add
(which is non-existent) corrected to __add__
(a standard
method) this is a deep and interesting question, involving both subtle low
level details, high level algorithm complexity considerations, and
even threading!, and yet it’s formulated in a very short and concise way.
I am reposting the original question as my own because I did not get a chance to provide a correct answer before it was deleted, and my efforts at having the original question revived, so that I could help to increase the general understanding of these issues, failed.
I have changed the original title “select python or C++” to …
thereby narrowing the question a little.
The given code snippet
x = 'wow'
x.__add__( x )
has different meanings in Python 2.x and Python 3.x.
In Python 2.x the strings are by default narrow strings, with one byte per encoding unit,
corresponding to C++ char
based strings.
In Python 3.x the strings are wide strings, guaranteed to represent Unicode,
corresponding to the in practice usage of C++
wchar_t
based strings, and likewise with an unspecified 2 or 4 bytes
per encoding unit.
Disregarding efficiency the __add__
method behaves the same in both main
Python versions, corresponding to the C++ +
operator for std::basic_string
(i.e., for std::string
and std::wstring
), e.g. quoting the CPython 3k
documentation:
object.__add__(self, other)
… to evaluate the expressionx + y
, wherex
is an instance of a class that has an__add__()
method,x.__add__(y)
is called.
So as an example, the CPython 2.7 code
x = 'wow'
y = x.__add__( x )
print y
would normally be written as
x = 'wow'
y = x + x
print y
and corresponds to this C++ code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
auto const x = string( "wow" );
auto const y = x + x;
cout << y << endl;
}
A main difference from the many incorrect answers given for the original question, is that the C++ correspondence is an expression, and not an update.
It is perhaps natural to think that the method name __add__
signifies a change
of the string object’s value, an update, but with respect to observable
behavior Python strings are immutable strings. Their values never change, as
far as can be directly observed in Python code. This is the same as in Java and
C#, but very unlike C++’s mutable std::basic_string
strings.
CPython 2.4 added the following optimization, for narrow strings only:
String concatenations in statements of the form
s = s + "abc"
ands += "abc"
are now performed more efficiently in certain circumstances. This optimization won't be present in other Python implementations such as Jython, so you shouldn't rely on it; using thejoin()
method of strings is still recommended when you want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.)
It may not sound like much, but where it’s applicable this optimization reduces a sequence of concatenations from quadratic time O(n2) to linear time O(n), in the length n of the final result.
First of all the optimization replaces concatenations with updates, e.g. as if
x = x + a
x = x + b
x = x + c
or for that matter
x = x + a + b + c
was replaced with
x += a
x += b
x += c
In the general case there will be many references to the string object that x
refers to, and since Python string objects must appear to be immutable, the first
update assignment cannot change that string object. It therefore, generally, has
to create a completely new string object, and assign that (reference) to x
.
At this point x
holds the only reference to that object. Which means that the
object can be updated by the update assignment that appends b
, because there are
no observers. And likewise for the append of c
.
It’s a bit like quantum mechanics: you cannot observe this dirty thing going on, and it’s never done when there is the possibility of anyone observing the machinations, but you can infer that it must have been going on by the statistics that you collect about performance, for linear time is quite different from quadratic time!
How is linear time achieved? Well, with the update the same strategy of buffer
doubling as in C++ std::basic_string
can be done, which means that the
existing buffer contents only need to be copied at each buffer reallocation,
and not for each append operation. Which means that the
total cost of copying is at worst linear in the final string size, in the
same way as the sum (representing the costs of copying at each buffer doubling)
1 + 2 + 4 + 8 + … + N is less than 2*N.
In order to faithfully reproduce the CPython code snippet in C++,
the final result and expression-nature of the operation be should captured,
and also its performance characteristics should be captured!
A direct translation of CPython __add__
to C++ std::basic_string
+
fails
to reliably capture the CPython linear time. The C++ +
string concatenation
may be optimized by the compiler in the same way as the CPython
optimization. Or not – which then means that one has told a beginner that
the C++ equivalent of a Python linear time operation, is something with quadratic
time – hey, this is what you should use…
For the performance characteristics C++ +=
is the basic answer, but, this does
not catch the expression nature of the Python code.
The natural answer is a linear time C++ string builder class that translates
a concatenation expression to a series of +=
updates, so that the Python code
from __future__ import print_function
def foo( s ):
print( s )
a = 'alpha'
b = 'beta'
c = 'charlie'
foo( a + b + c ) # Expr-like linear time string building.
correspond to roughly
#include <string>
#include <sstream>
namespace my {
using std::string;
using std::ostringstream;
template< class Type >
string stringFrom( Type const& v )
{
ostringstream stream;
stream << v;
return stream.str();
}
class StringBuilder
{
private:
string s_;
template< class Type >
static string fastStringFrom( Type const& v )
{
return stringFrom( v );
}
static string const& fastStringFrom( string const& s )
{ return s; }
static char const* fastStringFrom( char const* const s )
{ return s; }
public:
template< class Type >
StringBuilder& operator<<( Type const& v )
{
s_ += fastStringFrom( v );
return *this;
}
string const& str() const { return s_; }
char const* cStr() const { return s_.c_str(); }
operator string const& () const { return str(); }
operator char const* () const { return cStr(); }
};
} // namespace my
#include <iostream>
using namespace std;
typedef my::StringBuilder S;
void foo( string const& s )
{
cout << s << endl;
}
int main()
{
string const a = "alpha";
string const b = "beta";
string const c = "charlie";
foo( S() << a << b << c ); // Expr-like linear time string building.
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With