Why Is Concatenating Strings Running Faster Than Joining Them?
Solution 1:
As I understand it "".join(iterable_of_strings) is the preferred way to concatenate strings because it allows for optimizations that avoid having to rewrite the immutable object to memory more times than necessary.
You understand somewhat incorrectly. "".join(iterable_of_strings) is the preferred way to concatenate an iterable of strings, for the reason you explained.
However, you don't have an iterable of strings. You just have three strings. The fastest way to concatenate three strings is to add them together with +, or to use .format() or %. This is because in your case you first have to create the iterable and then join the strings, all to avoid copying some quite small strings.
.join() doesn't become faster until you have so many strings that using the other methods makes for stupid code anyway. When that happens depends on which Python implementation and version you have and on how long the strings are, but we are generally talking about more than ten strings.
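To make the options concrete, here is a minimal sketch of the approaches mentioned above. The variable names and sample values are illustrative assumptions, not taken from the question.

```python
# Several ways to combine a handful of strings; all produce the same result.
first, second, third = "foo", "bar", "baz"  # illustrative values

by_plus = first + second + third
by_format = "{}{}{}".format(first, second, third)
by_percent = "%s%s%s" % (first, second, third)

# join() needs an iterable, so a throwaway list is built just for this call.
by_join = "".join([first, second, third])

# With many pieces that already sit in an iterable, join() is the natural fit.
pieces = [str(n) for n in range(20)]  # hypothetical larger input
joined_many = "".join(pieces)
```

Which of these is fastest for a given input is something to measure rather than assume.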
Although it's true that not all implementations have fast concatenation, I've tested CPython, PyPy and Jython, and in all of them concatenation is faster or as fast for just a couple of strings.
In essence, you should choose between + and .join() based on code clarity until your code actually runs. Then, if you care about speed: profile and benchmark your code. Don't sit and guess.
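As a rough sketch of what "benchmark your code" can look like, the standard-library timeit module can time both variants. The helper name best_time and the function names with_plus and with_join are made up for this example, and the numbers it prints will vary by machine and Python version.

```python
import timeit


def with_plus(a, b, c):
    return a + b + c


def with_join(a, b, c):
    return "".join((a, b, c))


def best_time(func, args, number=100_000, repeat=3):
    """Return the fastest of `repeat` runs, each calling func(*args) `number` times."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    sample = ("foo", "bar", "baz")  # illustrative input
    for func in (with_plus, with_join):
        print("{:12}{:.4f} s".format(func.__name__ + ":", best_time(func, sample)))
```

Taking the minimum of several repeats reduces noise from other processes. Run it on the implementation and with the string lengths you actually care about.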
Some timings: http://slides.colliberty.com/DjangoConEU-2013/#/step-40
With video explanation: http://youtu.be/50OIO9ONmks?t=18m30s
Solution 2:
The time difference you're seeing comes from creating the list to be passed to join. And while you can get a small speedup from using a tuple instead, it's still going to be slower than just concatenating with + when there are only a few short strings.
It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call join directly on the iterable, rather than needing to build a new one for each call.
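The distinction can be sketched as follows. The Record class and its attribute names are hypothetical stand-ins for "an object with strings as attributes"; they are not the code from the question.

```python
class Record:
    """Hypothetical object that stores its strings as separate attributes."""

    def __init__(self, first, middle, last):
        self.first = first
        self.middle = middle
        self.last = last


def full_by_join(rec):
    # A new list has to be built on every call before join() can run.
    return "".join([rec.first, rec.middle, rec.last])


def full_by_plus(rec):
    # No intermediate container is needed.
    return rec.first + rec.middle + rec.last


# When the strings already live in an iterable, join() can consume it directly.
words = ["foo", "bar", "baz"]
combined = "".join(words)

rec = Record("foo", "bar", "baz")
```

Both functions return the same string; the difference is only in the extra list that full_by_join has to create each time.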
Here's some testing I did with the timeit module:
    import timeit

    short_strings = ["foo", "bar", "baz"]
    long_strings = [s*1000 for s in short_strings]

    def concat(a, b, c):
        return a + b + c

    def concat_from_list(lst):
        return lst[0] + lst[1] + lst[2]

    def join(a, b, c):
        return "".join([a, b, c])

    def join_tuple(a, b, c):
        return "".join((a, b, c))

    def join_from_list(lst):
        return "".join(lst)

    def test():
        print("Short strings")
        print("{:20}{}".format("concat:",
              timeit.timeit(lambda: concat(*short_strings))))
        print("{:20}{}".format("concat_from_list:",
              timeit.timeit(lambda: concat_from_list(short_strings))))
        print("{:20}{}".format("join:",
              timeit.timeit(lambda: join(*short_strings))))
        print("{:20}{}".format("join_tuple:",
              timeit.timeit(lambda: join_tuple(*short_strings))))
        print("{:20}{}\n".format("join_from_list:",
              timeit.timeit(lambda: join_from_list(short_strings))))
        print("Long Strings")
        print("{:20}{}".format("concat:",
              timeit.timeit(lambda: concat(*long_strings))))
        print("{:20}{}".format("concat_from_list:",
              timeit.timeit(lambda: concat_from_list(long_strings))))
        print("{:20}{}".format("join:",
              timeit.timeit(lambda: join(*long_strings))))
        print("{:20}{}".format("join_tuple:",
              timeit.timeit(lambda: join_tuple(*long_strings))))
        print("{:20}{}".format("join_from_list:",
              timeit.timeit(lambda: join_from_list(long_strings))))
Output:
    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> ================================ RESTART ================================
    >>>
    >>> test()
    Short strings
    concat:             0.5453461176251436
    concat_from_list:   0.5185697357936024
    join:               0.7099379456477868
    join_tuple:         0.5900842397209949
    join_from_list:     0.4177281794285359

    Long Strings
    concat:             2.002303591571888
    concat_from_list:   1.8898819841869416
    join:               1.5672863477837913
    join_tuple:         1.4343144915087596
    join_from_list:     1.231374639083505
So, in these tests, joining from an already existing list is always fastest. Concatenating with + is faster for individual items if they are short, but for long strings it is always slowest. I suspect the difference shown between concat and concat_from_list comes from the unpacking of the lists in the function call in the test code.
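One way to check that suspicion is to time the same function with and without the argument unpacking. This is only a sketch: the two helper names are invented here, and no particular result is claimed, since timings depend on the machine and the Python version.

```python
import timeit

short_strings = ["foo", "bar", "baz"]


def concat(a, b, c):
    return a + b + c


def time_with_unpacking(number=100_000):
    # The list is unpacked into arguments on every timed call.
    return timeit.timeit(lambda: concat(*short_strings), number=number)


def time_without_unpacking(number=100_000):
    # The arguments are bound once, outside the timed call.
    a, b, c = short_strings
    return timeit.timeit(lambda: concat(a, b, c), number=number)
```

If the second timing is consistently lower, the unpacking accounts for at least part of the gap.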