
Wrapping Pyspark Pipeline.__init__ And Decorators

I am trying to wrap the pyspark Pipeline.__init__ constructor and monkey patch in the newly wrapped constructor. However, I am running into an error that seems to be related to the way the original __init__ is decorated.

Solution 1:

Decorator 101. A decorator is a higher-order function which takes a function as its first (and typically only) argument and returns a function. The @ annotation is just syntactic sugar for a simple function call, so the following

@decorator
def decorated(x):
    ...

can be rewritten for example as:

def decorated_(x):
    ...

decorated = decorator(decorated_)
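To make the equivalence concrete, here is a minimal, runnable illustration (the decorator and function names are just placeholders for this example); it also shows that attaching attributes to a function object, as done below, is ordinary Python:

def decorator(func):
    # Attach a marker attribute to the function it receives and return it unchanged.
    func.tagged = True
    return func

@decorator
def decorated(x):
    return x + 1

print(decorated(1))      # 2
print(decorated.tagged)  # True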

So Pipeline.__init__ is actually a functools.wraps wrapper which captures the defined __init__ (the func argument of keyword_only) as a part of its closure. When it is called, it stores the received kwargs as a function attribute on itself. Basically, what happens here can be simplified to:

def f(**kwargs):
    f._input_kwargs = kwargs  # f is in the current scope

hasattr(f, "_input_kwargs")
# False

f(foo=1, bar="x")

hasattr(f, "_input_kwargs")
# True
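For reference, a keyword_only-style decorator can be sketched roughly as below. This is a simplified sketch, not the exact pyspark source; newer Spark versions store _input_kwargs on the instance rather than on the wrapper function, but the idea is the same:

from functools import wraps

def keyword_only(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if len(args) > 0:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # Stash the kwargs on the wrapper itself, where the wrapped
        # __init__ later reads them.
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper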

When you further wrap (decorate) __init__, the external function won't have _input_kwargs attached, hence the error. If you want to make it work, you have to apply the same process as the original __init__ uses to your own version, for example with the same decorator:

@keyword_only
def newInit(self, **keywordArgs):
    oldInit(self, stages=keywordArgs["stages"])
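Put together, the monkey patch could look roughly like this. It is a sketch that adds the wiring the snippet above presupposes (saving oldInit before patching and reassigning Pipeline.__init__); the print call is just illustrative extra behaviour:

from pyspark import keyword_only
from pyspark.ml import Pipeline

oldInit = Pipeline.__init__  # keep a reference to the original (already wrapped) constructor

@keyword_only
def newInit(self, **keywordArgs):
    # Illustrative extra behaviour before delegating to the original constructor.
    print("Creating a Pipeline with stages:", keywordArgs.get("stages"))
    oldInit(self, stages=keywordArgs["stages"])

Pipeline.__init__ = newInit

pipeline = Pipeline(stages=[])  # now goes through newInit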

but as I mentioned in the comments, you should rather consider subclassing.
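For comparison, a subclassing approach could look roughly like this. It is a sketch assuming a Spark version whose keyword_only stores _input_kwargs on the instance (Spark 2.1+); the class name and print call are hypothetical:

from pyspark import keyword_only
from pyspark.ml import Pipeline

class LoggingPipeline(Pipeline):
    @keyword_only
    def __init__(self, stages=None):
        # Illustrative extra behaviour, then delegate to the parent constructor.
        print("Creating a LoggingPipeline with stages:", stages)
        super(LoggingPipeline, self).__init__(stages=stages)

pipeline = LoggingPipeline(stages=[])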
