Why Does String Concatenation Matter In Python?
Solution 1:
Because in general +
ing n
strings will result in n-1
allocations of O(n)
memory for the result. Concatenation via adjacency is done in the parser, and performs 1 allocation. Concatenation via for instance ''.join(iter(s))
will perform O(log(n))
allocations/copies of 2n
total memory.
> a = ['a'] * 100000> def concat(strings):
c = ''
for s in strings:
c += s
return c
> %timeit ''.join(a) # precalculates necessary buffer size
1000 loops, best of 3: 1.07 ms per loop
> %timeit ''.join(iter(a)) # allocates exponentially larger buffers
1000 loops, best of 3: 1.94 ms per loop
> %timeit concat(a) # allocates a new buffer n-1 times
100 loops, best of 3: 7.15 ms per loop
Solution 2:
Strings are immutable objects in Python, so you cannot modify existing strings. That means that every concatenation of a string results in a new string object being created and two (the source objects) being thrown away. Memory allocation is expensive enough to make this matter a lot.
So when you know you need to concatenate multiple strings, store them in a list. And then at the end, just once, join that list using ''.join(list_of_strings)
. That way, a new string will only be created once.
Note that this also applies in other languages. For example Java and C# both have a StringBuilder
type which is essentially the same. As long as you keep appending new string parts, they will just internally add that to a string, and only when you convert the builder into a real string, the concatenation happens—and again just once.
Also note, that this memory allocation overhead already happens when you just append a few strings in a single line. For example a + b + c + d
will create three intermediary strings. You can see that, if you look at the byte code for that expression:
>>> dis.dis('a + b + c + d')
10 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 BINARY_ADD
7 LOAD_NAME 2 (c)
10 BINARY_ADD
11 LOAD_NAME 3 (d)
14 BINARY_ADD
15 RETURN_VALUE
Each BINARY_ADD
concats the two previous values and creates a new object for the result on the stack. Note that for constant string literals, the compiler is smart enough to notice that you are adding constants:
>>> dis.dis('"foo" + "bar" + "baz"')
10 LOAD_CONST 4 ('foobarbaz')
3 RETURN_VALUE
If you do have some variable parts within that though—for example if you want to produce a nicely formatted output—then you are back to creating intermediary string objects. In that case, using str.format
is a good idea, e.g. 'foo {} bar {} baz'.format(a, b)
.
Post a Comment for "Why Does String Concatenation Matter In Python?"