Passing A Pipe/connection As Context Arg To Multiprocessing Pool.apply_async()
Solution 1:
It looks like this bug (http://bugs.python.org/issue4892), which is also noted in this discussion: Python 2.6 send connection object over Queue / Pipe / etc
The pool forks its child processes up front, with pipes for communicating tasks and results to and from them. It is in communicating your Pipe object over that existing pipe that it blows up - not in the forking itself. (The failure occurs when the child process tries a get() on the queue abstraction.)
It looks like the problem arises because of how the Pipe object is pickled/unpickled for communication.
In the second case you noted, the pipe is passed to a Process instance, which is then forked - hence the difference in behavior.
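To make the difference concrete, here is a minimal sketch of the two cases (the child function and the message strings are just illustrative, not from the question):
import multiprocessing as mp

def child(conn):
    # Read whatever the parent sends over the connection
    print(conn.recv())

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()

    # Works: the Connection is handed to Process before the fork,
    # so the child inherits it directly - nothing is pickled over a queue.
    p = mp.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello via Process")
    p.join()

    # This is the case that blew up on Python 2: the Connection has to be
    # pickled and sent over the pool's own pipe to an already-forked worker
    # (fixed in Python 3, per the issue linked above).
    pool = mp.Pool(processes=1)
    pool.apply_async(child, (child_conn,))
    parent_conn.send("hello via Pool")
    pool.close()
    pool.join()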
I can't imagine that actively communicating with pool processes, beyond pure task distribution, was an intended use case for the multiprocessing Pool, though. State- and protocol-wise, it implies wanting more control over a process than the generic Pool object could ever have.
Solution 2:
You can solve this by using the initializer and initargs arguments when you create the pool and its processes. Admittedly, a global variable has to be involved as well, but if you put the worker code in a separate module it doesn't look all that bad. And it is only global to that process. :-)
A typical case is that you want your worker processes to add items to a multiprocessing queue. Since such a queue has to reside in one particular spot in memory, pickling will not work. Even if it did, it would only copy the information that some process has a queue, which is the opposite of what we want here: we want every worker to share the same queue.
So here is a meta code example:
The module containing the worker code; let's call it "worker_module":
def worker_init(_the_queue):
    global the_queue
    the_queue = _the_queue

def do_work(_a_string):
    # Add something to the queue
    the_queue.put("the string " + _a_string)
And the creation of the pool, followed by having it do something:
# Import our functions
from worker_module import worker_init, do_work

# Good idea: call it MPQueue to not confuse it with the other Queue
from multiprocessing import Queue as MPQueue
from multiprocessing import Pool

the_queue = MPQueue()

# Initialize workers; it is only during initialization that we can pass the_queue
the_pool = Pool(processes=3, initializer=worker_init, initargs=[the_queue])

# Do the work
the_pool.apply(do_work, ["my string"])

# The string is now on the queue
my_string = the_queue.get(True)
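Since the question asked about apply_async(), note that the same pattern works for asynchronous dispatch as well. A small sketch continuing the example above (the string and variable names are just illustrative):
# Dispatch asynchronously instead; the workers still share the_queue
async_result = the_pool.apply_async(do_work, ["my other string"])
async_result.wait()   # or async_result.get(), which re-raises any worker exception
my_other_string = the_queue.get(True)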
Solution 3:
This is a bug that has been fixed in Python 3.
The easiest solution is to pass the queue through the Pool's initializer, as suggested in the other answer.
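For completeness, a minimal sketch of what that fix means in practice, assuming Python 3 with the default fork start method on Unix (the worker function and message are just illustrative):
from multiprocessing import Pipe, Pool

def worker(conn):
    # In Python 3, multiprocessing can pickle the Connection for the worker
    conn.send("hello from the worker")

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    with Pool(processes=1) as pool:
        pool.apply(worker, (child_conn,))
        print(parent_conn.recv())  # -> "hello from the worker"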