
Custom FeatureUnion Won't Work?

I'm trying to modify this example to use a Pandas dataframe instead of the test datasets. I am not able to do so, as ItemSelector does not seem to recognise the column name. Please

Solution 1:

Yes, that's because LabelEncoder's fit() and fit_transform() accept only a single array y, whereas FeatureUnion will try to pass both X and y to them.

See this: https://github.com/scikit-learn/scikit-learn/issues/3956
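A minimal sketch of the failure, assuming a small made-up array of categories (the values are hypothetical). It calls LabelEncoder directly with the same two positional arguments that a pipeline step receives, with None standing in for a missing y:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, not from the original question.
categories = np.array(["news", "sport", "news"])

# Called on its own with a single array, LabelEncoder works.
codes = LabelEncoder().fit_transform(categories)
print(codes)  # [0 1 0]

# Pipeline and FeatureUnion call fit_transform(X, y) on each step.
# LabelEncoder.fit_transform() has no second parameter, so this fails.
try:
    LabelEncoder().fit_transform(categories, None)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The exact error message depends on the Python and scikit-learn versions, so it is not reproduced here.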

You can use a simple workaround for this:

Define a custom label encoder like this:

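from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder

class MyLabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.le = LabelEncoder()

    def fit(self, x, y=None):
        # Accept and ignore y; LabelEncoder.fit() takes only one array.
        self.le.fit(x)
        return self

    def transform(self, x, y=None):
        # Return a single column so that FeatureUnion gets 2-d data.
        return self.le.transform(x).reshape(-1, 1)

    def fit_transform(self, x, y=None):
        self.fit(x)
        return self.transform(x)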

And in the pipeline, do this:

....
....
                ('selector', ItemSelector(key='u_category')),
                ('labelenc', MyLabelEncoder()),

Please note the reshape(-1, 1) in the transform() method. That's because FeatureUnion only works with 2-d data: LabelEncoder returns a 1-d array, so it has to be turned into a single column. All the individual transformers inside the FeatureUnion should return only 2-d data.
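To see the pieces working together on a DataFrame, here is a self-contained sketch. The ItemSelector below is an assumed minimal version (it just returns the column named by key), the sample rows are made up, and CountVectorizer is only a stand-in for whatever text transformer the real pipeline uses:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelEncoder


class ItemSelector(BaseEstimator, TransformerMixin):
    """Assumed minimal selector: picks one column out of a DataFrame."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key]


class MyLabelEncoder(BaseEstimator, TransformerMixin):
    """LabelEncoder wrapper that tolerates y and returns a 2-d column."""

    def fit(self, x, y=None):
        self.le_ = LabelEncoder().fit(x)
        return self

    def transform(self, x, y=None):
        return self.le_.transform(x).reshape(-1, 1)


# Hypothetical data using the column names from the question.
posts = pd.DataFrame({
    "u_category": ["news", "sport", "news", "tech"],
    "rawtext": ["markets fall", "team wins cup",
                "election results", "new phone released"],
})

union = FeatureUnion([
    ("category", Pipeline([
        ("selector", ItemSelector(key="u_category")),
        ("labelenc", MyLabelEncoder()),
    ])),
    ("text", Pipeline([
        ("selector", ItemSelector(key="rawtext")),
        ("vect", CountVectorizer()),
    ])),
])

features = union.fit_transform(posts)
print(features.shape)  # (4, 11): 1 label column + 10 vocabulary columns
```

Because CountVectorizer produces a sparse matrix, the combined result is sparse too; call features.toarray() to inspect it.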

Solution 2:

You may need to declare both columns as fields of the features array. Please try adding the two selector keys to the features like this and show me the results:

features = np.recarray(shape=(len(posts),),
                       dtype=[('u_category', object), ('rawtext', object)])
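The array declared this way starts out empty, so the fields still have to be filled. A possible way to do that from a DataFrame, assuming posts has columns with the same names (the sample rows are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data; in the question, posts is the asker's own DataFrame.
posts = pd.DataFrame({
    "u_category": ["news", "sport"],
    "rawtext": ["markets fall", "team wins cup"],
})

features = np.recarray(shape=(len(posts),),
                       dtype=[("u_category", object), ("rawtext", object)])

# Copy each DataFrame column into the record field of the same name.
for name in features.dtype.names:
    features[name] = posts[name].to_numpy()

print(features["u_category"])  # field lookup by key, as ItemSelector does
```

An ItemSelector that indexes with data[key] can then pull each field out by name.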
