One of the first pieces of efficiency advice most .NET developers learn
is "use StringBuilder
to concatenate strings". A little bit like
"exceptions are expensive" this is a
misunderstood piece of received wisdom. (Fortunately it's not nearly as harmful
as the exception performance myth, but it comes up about as often...)
Before reading the rest of this page, you should be aware of
the basics of the String
type.
For the sake of readability, I'll stick to "string" rather than "String
"
or "string
" from here onwards. Please let me
know if you find this confusing, and I'll change it.
I've included this in my list of general framework articles rather than in the list of C#-specific articles because I suspect every language targetting .NET is likely to use the same way of concatenating strings under the hood.
There is a very real problem that this wisdom correctly addresses - that of building a load of strings (usually of increasing size) which are never used apart from contributing to the creation of other strings. Here's an example:
using System; public class Test { static void Main() { DateTime start = DateTime.Now; string x = ""; for (int i=0; i < 100000; i++) { x += "!"; } DateTime end = DateTime.Now; Console.WriteLine ("Time taken: {0}", end-start); } } |
On my (fairly fast) laptop, that takes nearly 10 seconds. Double the number of iterations,
and it takes over a minute. The results are slightly better on .NET 2.0 beta 2, but not
hugely. The problem is that strings are immutable - just because we're using "+=" here doesn't
mean the runtime actually appends to the end of the existing string. In fact, x += "!";
is absolutely equivalent to x = x+"!";
. The concatenation here is creating an entirely
new string, allocating enough memory for everything, copying all the data from the existing value
of x
and then copying the data from the string being appended ("!"). As the string
grows, the amount of data it has to copy each time grows too, which is why the time taken didn't
just double when I doubled the number of iterations.
This is clearly inefficient. If someone asked you to add something to a shopping list, you wouldn't
write a new copy of the shopping list first, would you? Enter StringBuilder
...
StringBuilder
Solution
Here is an equivalent (in terms of the final value of x
) program, which is much, much
faster:
using System; using System.Text; public class Test { static void Main() { DateTime start = DateTime.Now; StringBuilder builder = new StringBuilder(); for (int i=0; i < 100000; i++) { builder.Append("!"); } string x = builder.ToString(); DateTime end = DateTime.Now; Console.WriteLine ("Time taken: {0}", end-start); } } |
On my laptop, this executes too fast for the rudimentary timing mechanism I'm using to
give me any sensible results. Changing to a million iterations (i.e. ten times what the
first program took nearly ten seconds to do), it takes about 30-40ms. The time taken is
roughly linear in the number of iterations (i.e. double the iterations and it takes twice
as long). It does this by avoiding unnecessary copying - only the data we're actually appending
gets copied. StringBuilder
maintains an internal buffer and appends to that,
only copying its buffer when there isn't room for any more data. (In fact, the internal buffer
is just a string - strings are immutable from a public interface perspective, but not from
within the mscorlib
assembly.) We could make the above code even more efficient
by passing the final size of the string (which we happen to know in this case) to the
constructor of StringBuilder
to make it use a buffer of the right size to
start with - then there'd be no unnecessary copying at all. Unless you're in a situation
where you have that information readily to hand though, it's usually not worth worrying about -
StringBuilder
doubles its buffer size when it runs out of room, so it doesn't
end up copying the data very many times anyway.
StringBuilder
Everywhere, Right?
No, quite simply. The above is an explanation of why the received wisdom of "use
StringBuilder
for concatenation" is right some of the time. However,
many people take it at face value without understanding the reasoning behind it. They start
turning code like this:
string name = firstName + " " + lastName; Person person = new Person (name); |
into code like this:
// Bad code! Do not use! StringBuilder builder = new StringBuilder(); builder.Append (firstName); builder.Append (" "); builder.Append (lastName); string name = builder.ToString(); Person person = new Person (name); |
All that in the name of inefficiency. Now, on a broader point, even if the second version were more efficient than the first, it probably wouldn't be significantly more efficient - unless that code ended up being called vast, vast numbers of times, the amount of time spent in the concatenation is likely to be tiny. Making something less readable (and I think you'll agree the second version takes far longer to understand) for a small performance gain is a really bad idea in the first place.
However, the second version is actually less efficient than the first! Not much less
efficient - and if the second version were more readable, I'd go with it for the reasons above -
but when the entire point of using StringBuilder
is to improve efficiency,
using the above is just plain nuts.
The first version (assuming that firstName
and lastName
are real
variables, and not constants - I'll come onto that later) compiles to a call to
String.Concat
, like this:
string name = String.Concat (firstName, " ", lastName); Person person = new Person (name); |
String.Concat
takes a bunch of strings (or objects) and concatenates them together, plain and simple.
There are various overloads - some take strings, some take objects (which are just converted into strings), some
take arrays of objects or arrays of strings. They all do the same thing though. Now, String.Concat
can work out the lengths of all the strings involved before it concatenates them together (at least if you pass
it strings - if you pass it objects, it needs to create temporary strings and then concatenate those
together). This means that no extra copying is involved - the data is copied once into the new string, which is
of exactly the right length.
Compare this with the StringBuilder
version. It doesn't know at construction time how
big to make the buffer (because we haven't told it - doing so would make the code even less readable). That
means it may have to copy the buffer, and it's likely to end up with a buffer which is actually larger than
it needs to be. Oh, and there's the overhead of an extra object (the StringBuilder
itself). Remind
me why this was meant to be a good idea?
The important difference between this example and the previous one is that we can easily present all the
strings which need to be concatenated together in one call to String.Concat
. That means that no
intermediate strings are needed. StringBuilder
is efficient in the first example because it acts
as a container for the intermediate result without having to copy that result each time - when there's no
intermediate result anyway, it has no advantage.
Things get even crazier when it comes to string constants (literals, const string
members). What
do you suppose string x = "hello" + " " + "there";
is compiled to? It would be reasonable to expect it
to be another call to String.Concat
- but it isn't. It's actually compiled to the exact same code as
string x = "hello there";
. The compiler knows that all the parts are constant, so it does all the
concatenation at compile time, storing the full string in the compiled code. Converting that to use
StringBuilder
is inefficient in both memory and speed, as well as reducing readability.
So, when should you use StringBuilder
, and when should you use the string concatenation operators?
StringBuilder
when you're concatenating in a non-trivial loop - especially
if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example,
reading a file a character at a time, building up a string as you go using the +=
operator is
potentially performance suicide.
String.Concat
explicitly - or String.Join
if you need a delimiter.)
StringBuilder
isn't going to help you. For instance, if
you build up a full name from a first name and a last name, and then add a third piece of information (the
nickname, maybe) to the end, you'll only benefit from using StringBuilder
if you don't need
the (first name + last name) string for other purpose (as we do in the example which creates a Person
object).
Back to the main page.