
Passing A Pipe/connection As Context Arg To Multiprocessing Pool.apply_async()

I want to use pipes to talk to the process instances in my pool, but I'm getting an error: Let __p be an instance of Pool(): (master_pipe, worker_pipe) = Pipe() self.__p.a

Solution 1:

This looks like the bug reported at http://bugs.python.org/issue4892, which is noted in this discussion: Python 2.6 send connection object over Queue / Pipe / etc

The pool forks its child processes up front, with pipes for sending tasks to them and receiving results from them. It is when your Pipe object is sent over one of those existing pipes that things blow up, not during the forking (the failure occurs when the child process calls get() on the queue abstraction).

It looks like the problem arises because of how the Pipe object is pickled/unpickled for communication.

In the second case you noted, the pipe is passed to a Process instance before it is forked, hence the difference in behavior.
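To illustrate the case that does work, here is a minimal sketch based on the standard multiprocessing pattern: the child's end of the pipe is passed to a Process through `args`, so it is handed over when the process is created rather than sent later over an already-open pipe. The function names `child` and `run_demo` are hypothetical.

```python
from multiprocessing import Pipe, Process


def child(conn):
    # Runs in the child process; conn is the child's end of the pipe.
    conn.send("hello from child")
    conn.close()


def run_demo():
    parent_conn, child_conn = Pipe()
    # The connection travels with the Process at creation time,
    # not over a pipe that already exists.
    p = Process(target=child, args=(child_conn,))
    p.start()
    message = parent_conn.recv()  # read before join() so the child can finish
    p.join()
    return message


if __name__ == "__main__":
    print(run_demo())  # prints: hello from child
```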

I can't imagine that actively communicating with pool processes outside of pure task distribution was an intended use case for multiprocessing Pool, though. State- and protocol-wise, that would imply you want more control over the process, which would require more context than the general Pool object could ever know.
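If you do need an ongoing conversation with a worker, one option (an assumption on my part, not something this answer prescribes) is to manage the Process yourself and define the protocol explicitly. This sketch uses a hypothetical `echo_worker` that replies to each request in upper case and stops when it receives a `None` sentinel; all names are illustrative.

```python
from multiprocessing import Pipe, Process

STOP = None  # hypothetical sentinel meaning "shut down"


def echo_worker(conn):
    # Serve one request at a time until the parent sends STOP.
    while True:
        request = conn.recv()
        if request is STOP:
            break
        conn.send(request.upper())
    conn.close()


def talk_to_worker(messages):
    parent_conn, child_conn = Pipe()
    worker = Process(target=echo_worker, args=(child_conn,))
    worker.start()
    replies = []
    for msg in messages:
        parent_conn.send(msg)               # request
        replies.append(parent_conn.recv())  # matching reply
    parent_conn.send(STOP)                  # tell the worker to exit
    worker.join()
    return replies


if __name__ == "__main__":
    print(talk_to_worker(["ping", "pong"]))  # ['PING', 'PONG']
```

Because you own the process, the request/reply ordering and shutdown rule are explicit, which is exactly the kind of context a generic Pool cannot know.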

Solution 2:

This can be solved by using the initializer and initargs arguments when you create the pool and its processes. Admittedly, a global variable is involved as well. However, if you put the worker code in a separate module, it doesn't look all that bad, and it is only global to that process. :-)

A typical case is that you want your worker processes to add items to a multiprocessing queue. Because the queue depends on something residing at a certain spot in memory, pickling it will not work. Even if it did work, it would just copy data describing the fact that some process has a queue, which is the opposite of what we want here: we want to share the same queue.
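A quick way to see that a multiprocessing queue cannot simply be pickled into a task is to pass one to `apply_async`. This minimal sketch, assuming Python 3, expects the `RuntimeError` raised while the queue is pickled to be re-raised by `result.get()`; `put_item` and `try_passing_queue` are hypothetical names.

```python
from multiprocessing import Pool, Queue


def put_item(q):
    # Never runs: the task fails while its arguments are being pickled.
    q.put("never reached")


def try_passing_queue():
    q = Queue()
    with Pool(processes=1) as pool:
        result = pool.apply_async(put_item, (q,))
        try:
            result.get(timeout=10)
        except RuntimeError as exc:
            # Queue objects refuse to be pickled outside process creation.
            return str(exc)
    return "no error"


if __name__ == "__main__":
    print(try_passing_queue())
```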

So here is a meta-code example:

The module containing the worker code, which we call "worker_module":

def worker_init(_the_queue):
    # Runs once in each worker process; keep the inherited queue as a global
    global the_queue
    the_queue = _the_queue

def do_work(_a_string):
    # Add something to the queue
    the_queue.put("the string " + _a_string)

And here is the creation of the pool, followed by having it do something:

# Import our functions
from worker_module import worker_init, do_work

# Good idea: Call it MPQueue to not confuse it with the other Queue
from multiprocessing import Queue as MPQueue
from multiprocessing import Pool

if __name__ == "__main__":
    the_queue = MPQueue()
    # Initialize workers; it is only during initialization that we can pass the_queue
    the_pool = Pool(processes=3, initializer=worker_init, initargs=[the_queue])
    # Do the work
    the_pool.apply(do_work, ["my string"])
    # The string is now on the queue
    my_string = the_queue.get(True)
    the_pool.close()
    the_pool.join()
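The same initializer technique can carry the pipe from the original question. This is a single-file sketch, not the answer's code: it assumes one shared sending end for all workers, so a Lock (also passed through initargs) keeps their sends from interleaving. The names `init_worker`, `report`, `result_conn`, `send_lock` and `run` are hypothetical.

```python
from multiprocessing import Lock, Pipe, Pool

# Per-process globals, set once by the initializer in each worker.
result_conn = None
send_lock = None


def init_worker(conn, lock):
    global result_conn, send_lock
    result_conn = conn
    send_lock = lock


def report(n):
    # All workers share one sending end, so serialize sends with the lock.
    with send_lock:
        result_conn.send(n * n)


def run(numbers):
    recv_end, send_end = Pipe(duplex=False)  # (receive-only, send-only)
    lock = Lock()
    with Pool(processes=3, initializer=init_worker,
              initargs=(send_end, lock)) as pool:
        pending = pool.map_async(report, numbers)
        # Read while tasks run so a full pipe buffer cannot block the workers.
        received = [recv_end.recv() for _ in numbers]
        pending.get(timeout=10)
    return sorted(received)


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Results arrive in whatever order the workers finish, hence the `sorted()` call.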

Solution 3:

This is a bug which has been fixed in Python 3.
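As a rough check of that claim, the following sketch passes a Pipe end directly as an `apply_async` argument. It assumes Python 3 on a platform where multiprocessing can pickle Connection objects outside of process creation (for example Linux); `reply` and `send_conn_as_task_arg` are hypothetical names.

```python
from multiprocessing import Pipe, Pool


def reply(conn, text):
    # conn arrives as a task argument, pickled after the worker already started.
    conn.send(text.upper())
    conn.close()


def send_conn_as_task_arg():
    parent_conn, child_conn = Pipe()
    with Pool(processes=1) as pool:
        result = pool.apply_async(reply, (child_conn, "hi"))
        result.get(timeout=10)  # re-raises if the task failed
        if not parent_conn.poll(10):
            raise TimeoutError("no reply from worker")
        return parent_conn.recv()


if __name__ == "__main__":
    print(send_conn_as_task_arg())
```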

The easiest solution is to pass the queue through the Pool's initializer, as suggested in the other answer.
