Wrapping Pyspark Pipeline.__init__ And Decorators
Solution 1:
Decorator 101. A decorator is a higher-order function which takes a function as its first (and typically only) argument and returns a function. The @ annotation is just syntactic sugar for a simple function call, so the following
@decorator
def decorated(x):
    ...
can be rewritten for example as:
def decorated_(x):
...
decorated = decorator(decorated_)
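To make the equivalence concrete, here is a minimal, purely illustrative decorator (the names decorator and decorated are the placeholders from above; the logging behaviour is made up):

import functools

def decorator(func):
    # Illustrative decorator: log every call, then delegate to func.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@decorator
def decorated(x):
    return x + 1

decorated(1)
## calling decorated
## 2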
So Pipeline.__init__ is actually a wrapper (built with functools.wraps) which captures the defined __init__ (the func argument of keyword_only) as part of its closure. When it is called, it stores the received kwargs as an attribute of itself, the function object. Basically, what happens here can be simplified to:
def f(**kwargs):
    f._input_kwargs = kwargs  # f is in the current scope

hasattr(f, "_input_kwargs")
## False

f(foo=1, bar="x")

hasattr(f, "_input_kwargs")
## True
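For reference, a simplified sketch of what keyword_only roughly does in older Spark releases (the real implementation varies between versions; newer releases attach _input_kwargs to the instance rather than the function):

from functools import wraps

def keyword_only(func):
    # Simplified sketch: reject positional arguments and stash the
    # received kwargs on the wrapper so the wrapped __init__ can read
    # them back as a function attribute.
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if len(args) > 0:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper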
When you further wrap (decorate) __init__, the external function won't have _input_kwargs attached, hence the error. If you want to make it work, you have to apply the same process used by the original __init__ to your own version, for example with the same decorator:
@keyword_only
def newInit(self, **keywordArgs):
    oldInit(self, stages=keywordArgs["stages"])
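Put together, the whole monkey-patching pattern might look roughly like this (oldInit, newInit and the print call are illustrative, not part of Spark's API):

from pyspark import keyword_only
from pyspark.ml import Pipeline

# Keep a reference to the original, already keyword_only-wrapped __init__.
oldInit = Pipeline.__init__

@keyword_only
def newInit(self, **keywordArgs):
    # Illustrative extra behaviour added around the original constructor.
    print("creating Pipeline with:", keywordArgs)
    oldInit(self, stages=keywordArgs["stages"])

# Replace the constructor on the class (subclassing is still the cleaner option).
Pipeline.__init__ = newInit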
but like I mentioned in the comments, you should rather consider subclassing.
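A subclass only needs to re-apply keyword_only to its own __init__; the class name and the print call below are just illustrative:

from pyspark import keyword_only
from pyspark.ml import Pipeline

class LoggingPipeline(Pipeline):
    @keyword_only
    def __init__(self, stages=None):
        # Illustrative extra behaviour before delegating to Pipeline.__init__.
        print("creating pipeline with stages:", stages)
        super(LoggingPipeline, self).__init__(stages=stages)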