Performance tuning with std::string
So today I tried to optimize some code that uses std::string from the C++ standard library, and found something interesting.
Let's say you repeatedly assign strings to one variable, and they are sometimes longer and sometimes shorter. To avoid unnecessary memory allocations you can call std::string::resize(size_t n): you create the string once and then resize it so that it is big enough for the longest string:
std::string s0;
s0.resize(512);
std::string s1 = "hello";
std::string s2 = "";
std::string s3 = "big";
std::string s4 = "";
std::string s5 = "world";
s0 = s1;
s0 = s2;
s0 = s3;
s0 = s4;
s0 = s5;
So everything is handled as std::string, and you might think everything's fine. Now compare that with this variant:
std::string s0;
s0.resize(512);
std::string s1 = "hello";
std::string s2 = " ";
std::string s3 = "big";
std::string s4 = " ";
std::string s5 = "world";
s0 = s1.c_str();
s0 = s2.c_str();
s0 = s3.c_str();
s0 = s4.c_str();
s0 = s5.c_str();
Here a standard C null-terminated string is assigned every time. You might think that this is slower, among other things because the length is unknown, so during assignment either strlen() has to be called or every byte has to be checked for 0.
But that's wrong!
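Here is a minimal sketch for checking this on your own compiler. It uses only the standard library, and the two helper function names are made up for illustration. It reports the capacity of a pre-sized string after each kind of assignment. What you see depends on the implementation; the standard does not promise that either assignment keeps the 512 characters of capacity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: capacity of a pre-sized string after it has been
// assigned from another std::string.
std::size_t capacity_after_string_assign(const std::string& src) {
    std::string s0;
    s0.resize(512);      // as in the post: grow the buffer up front
    s0 = src;            // operator=(const std::string&)
    return s0.capacity();
}

// Hypothetical helper: the same, but assigning a null-terminated C string.
std::size_t capacity_after_cstr_assign(const std::string& src) {
    std::string s0;
    s0.resize(512);
    s0 = src.c_str();    // operator=(const char*)
    return s0.capacity();
}
```

Print both results for a short string such as "hello". If the first value is far below 512 while the second is at least 512, the string-to-string assignment dropped the buffer and will have to allocate again for the next long string.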
filed a bug?
I doubt the standard says the capacity has to be dropped, so I guess http://gcc.gnu.org/bugs.html is your friend
Actually, since s0 = s1; is equivalent to s0 = new std::string(s1); (in other words, the copy constructor) and the copy constructor can be generated by the compiler as a bit-by-bit memory copy, my guess would be that the capacity does have to be dropped. Otherwise, it's not really a copy.
I wonder what gets checked by the == operator. If it checks capacity too, then dropping it is "correct", even if inconvenient.
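For what it's worth, operator== on std::string compares only the length and the characters, never the capacity, so a copy is "equal" whether or not the capacity comes along. Note also that s0 = s1 on an already constructed string calls the copy assignment operator, not the copy constructor. A small sketch (the helper name is hypothetical):

```cpp
#include <string>

// Hypothetical helper: two strings holding the same text compare equal
// even when their capacities differ widely.
bool equal_despite_capacity() {
    std::string a = "hello";
    a.reserve(512);            // a.capacity() is now at least 512
    std::string b = "hello";   // typically a much smaller capacity
    return a == b;             // compares length and characters only
}
```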
The standard says that, after assigning s2 to s1, s1.capacity() >= s2.capacity();
So obviously gcc follows the standard, and with gcc 3.x to 4.0.2 (SUSE 10.0) s1.capacity()==s2.capacity().
I also tested today with Microsoft Visual Studio .NET 2003. There the capacity isn't dropped but stays high, so it should be faster (and it also follows the standard).
The right way (TM)
Ok, so I had the pleasure of discussing this directly with the std::string maintainer.
operator=(const std::string&) throws away the allocated buffer because this is a reference-counting implementation: the assignment just shares the source string's buffer, so as long as you don't append afterwards it's really fast and lightweight.
operator=(const char*) obviously can't use reference counting and does a deep copy instead, which preserves the buffer.
std::string::reserve(size_t count) should be used instead of std::string::resize(size_t count). It is even faster, because the newly allocated buffer isn't initialized.
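To make the difference concrete, here is a small sketch. The struct and function names are made up for illustration. resize() changes the length and fills the new characters with '\0', while reserve() only requests storage and leaves the string empty.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type holding what we want to inspect.
struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

SizeAndCapacity after_resize(std::size_t n) {
    std::string s;
    s.resize(n);    // length becomes n, new characters are set to '\0'
    return {s.size(), s.capacity()};
}

SizeAndCapacity after_reserve(std::size_t n) {
    std::string s;
    s.reserve(n);   // length stays 0, only storage is requested
    return {s.size(), s.capacity()};
}
```

With n = 512, both variants end up with a capacity of at least 512, but only the resize() version has a length of 512.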
And less hackish than using operator=(const char*) would be to use clear() and operator+=(const std::string&) instead of direct assignment, since clear() will keep the buffer around.
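A sketch of that pattern follows; the helper name and the use of std::vector are my own assumptions for the example. That clear() keeps the buffer is how the implementation discussed here behaves. The standard itself does not require it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: push a sequence of values through one string,
// using clear() and operator+= instead of operator=.
std::string assign_all_via_clear(const std::vector<std::string>& values) {
    std::string s0;
    s0.reserve(512);        // one allocation up front
    for (const std::string& v : values) {
        s0.clear();         // length becomes 0
        s0 += v;            // append into the existing storage
    }
    return s0;              // the last value assigned (empty if none)
}
```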
I would have thought that std::string::assign() is much faster, but it's on par with string assignment. On my compiler, this is still faster than assigning a const char*. Specifying the string length speeds up the assignment. Using const strings does not speed up the code. std::string::assign() also changes the capacity.
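For reference, "specifying the string length" means the two-argument overload assign(const char*, size_type), which does not have to search for the terminating 0. A small sketch with a hypothetical wrapper function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper: assign exactly `len` characters starting at `text`.
// The caller must make sure that `text` points to at least `len` characters.
void assign_known_length(std::string& dst, const char* text, std::size_t len) {
    dst.assign(text, len);  // no strlen() needed, the length is given
}
```

Calling it with the text "hello world" and a length of 5 leaves "hello" in the destination string.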
For comparison, on gcc 4.0.2 with -O3:
The last line is an assignment to char.
Can you kindly explain how you measured the time taken?
I am trying to check performance on my compiler, and it would really help.
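The timing method is not described above. One common way to take such a measurement, sketched here as an assumption and not as the method used for the numbers in this thread, is to run the assignment many times in a loop and read a monotonic clock before and after. This version needs C++11 for std::chrono, and all names in it are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical result type: elapsed time plus a checksum that depends on
// the loop body, so the loop is harder for the optimizer to discard.
struct TimingResult {
    double seconds;
    std::size_t checksum;
};

// Hypothetical benchmark: time `iterations` assignments of `src`.
TimingResult time_string_assign(const std::string& src, std::size_t iterations) {
    std::string s0;
    s0.reserve(512);
    std::size_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        s0 = src;                 // the operation being measured
        checksum += s0.size();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), checksum};
}
```

Use a large iteration count (millions), repeat the run a few times, and replace the marked line with the variant you want to compare, for example s0 = src.c_str().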