Airflow Can't Pickle _thread._local Objects
Solution 1:
"…if there's a good way to get around this so I can use the Airflow webserver effectively. I need to pass the python callable something that would allow it to interact with the database."
Passing the connection string is a possibility; however, it should not include the credentials (user ID, password), as you don't want credentials stored in plain text. Airflow provides two concepts, Variables and Connections, for exactly this purpose; see this answer for details.
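A stored Connection can then be resolved inside the task's callable at run time, so credentials never appear in the DAG file. A minimal sketch, assuming Airflow 2.x, SQLAlchemy 1.4+, and a Connection with the hypothetical id "my_postgres" pointing at your database:

```python
from airflow.hooks.base import BaseHook
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL

def query_db():
    # Look up the stored Connection at execution time; credentials stay
    # in Airflow's metadata store, not in the DAG source.
    conn = BaseHook.get_connection("my_postgres")  # hypothetical conn id
    engine = create_engine(URL.create(
        "postgresql", username=conn.login, password=conn.password,
        host=conn.host, port=conn.port, database=conn.schema))
    try:
        with engine.connect() as db:
            db.execute(text("SELECT 1"))
    finally:
        engine.dispose()
```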
"it is easier to make the engine once in the DAG and pass it to all of the Operators than make the engine in each one"
Actually, no. It may seem easier at first glance, but on closer examination it is a bad idea.
A database connection is by its very nature ephemeral and exists only for the lifetime of the process that uses it. Airflow tasks are instantiated at execution time (which may be much later, and repeatedly), in a different process, possibly on a different machine. So even if you could pickle the connection, it would be of no use to the task when it runs, as it would most likely have ceased to exist by then anyway.
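The error in the title comes from exactly this: a SQLAlchemy engine carries thread-local state (a threading.local in its connection pool), which the pickle module refuses to serialize. A quick sketch reproducing it, assuming SQLAlchemy is installed:

```python
import pickle
from sqlalchemy import create_engine

# An in-memory SQLite engine keeps its connection in a threading.local,
# so serializing it fails the same way Airflow's DAG pickling does.
engine = create_engine("sqlite://")
try:
    pickle.dumps(engine)
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle '_thread._local' object"
```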
In general, and as a matter of principle, not just in Airflow: connections should always be created, managed, and closed by the same process.
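Put differently: give each operator a callable plus plain arguments, and let every task open and close its own engine. A sketch of that wiring, assuming Airflow 2.4+ (for the schedule argument) and the same hypothetical "my_postgres" Connection:

```python
from datetime import datetime

from airflow import DAG
from airflow.hooks.base import BaseHook
from airflow.operators.python import PythonOperator
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL

def run_query(sql):
    # Create, use, and dispose of the engine inside the task process.
    conn = BaseHook.get_connection("my_postgres")  # hypothetical conn id
    engine = create_engine(URL.create(
        "postgresql", username=conn.login, password=conn.password,
        host=conn.host, port=conn.port, database=conn.schema))
    try:
        with engine.connect() as db:
            db.execute(text(sql))
    finally:
        engine.dispose()

with DAG("db_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # Operators receive the callable and plain arguments -- both picklable --
    # never a live engine or connection object.
    first = PythonOperator(task_id="first_query",
                           python_callable=run_query,
                           op_args=["SELECT 1"])
    second = PythonOperator(task_id="second_query",
                            python_callable=run_query,
                            op_args=["SELECT 2"])
    first >> second
```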