Custom FeatureUnion Won't Work?
I'm trying to modify this example to use a pandas DataFrame instead of the test datasets, but I can't get it to work: ItemSelector does not seem to recognise the column name.
Solution 1:
Yes, that's because LabelEncoder's fit and transform methods take only a single array y, whereas FeatureUnion passes both X and y to each of its transformers.
See this: https://github.com/scikit-learn/scikit-learn/issues/3956
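A quick way to see the mismatch, as a minimal sketch: the sample labels are made up, and the second call only mimics how a pipeline step is invoked with two positional arguments.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["news", "sports", "news", "tech"]  # made-up sample values

# Called with a single array, which is what LabelEncoder expects.
encoded = LabelEncoder().fit_transform(labels)

# Called the way Pipeline/FeatureUnion call their steps: with X and y.
try:
    LabelEncoder().fit_transform(labels, None)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The first call succeeds; the second fails before any encoding happens, because the method signature has no slot for the extra argument.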
A simple workaround is to wrap LabelEncoder in a custom transformer whose methods accept both arguments:
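from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder

class MyLabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.le = LabelEncoder()

    def fit(self, x, y=None):
        # y is accepted but ignored; return self, as estimators should
        self.le.fit(x)
        return self

    def transform(self, x, y=None):
        # reshape the 1-d labels into a single 2-d column
        return self.le.transform(x).reshape(-1, 1)

    def fit_transform(self, x, y=None):
        self.fit(x)
        return self.transform(x)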
And in the pipeline, do this:
....
....
('selector', ItemSelector(key='u_category')),
('labelenc', MyLabelEncoder()),
Please note the reshape(-1, 1) in the transform() method. That's because FeatureUnion only works with 2-d data: every transformer inside the FeatureUnion should return a 2-d array.
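To see the pieces working together on a pandas DataFrame, here is a minimal sketch. The ItemSelector below is a simplified stand-in for the one in the scikit-learn example, MyLabelEncoder is re-declared so the block runs on its own, and the u_region column and all sample values are made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelEncoder


# Mixin listed first, the order scikit-learn's developer guide recommends.
class ItemSelector(TransformerMixin, BaseEstimator):
    """Simplified stand-in: pick one column out of a DataFrame by key."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self

    def transform(self, data):
        return data[self.key]


class MyLabelEncoder(TransformerMixin, BaseEstimator):
    """LabelEncoder wrapper that tolerates the extra y argument."""

    def __init__(self):
        self.le = LabelEncoder()

    def fit(self, x, y=None):
        self.le.fit(x)
        return self

    def transform(self, x, y=None):
        # 1-d labels become a single 2-d column
        return self.le.transform(x).reshape(-1, 1)


df = pd.DataFrame({
    "u_category": ["news", "sports", "news", "tech"],
    "u_region": ["eu", "us", "us", "eu"],  # hypothetical second column
})

union = FeatureUnion([
    ("category", Pipeline([
        ("selector", ItemSelector(key="u_category")),
        ("labelenc", MyLabelEncoder()),
    ])),
    ("region", Pipeline([
        ("selector", ItemSelector(key="u_region")),
        ("labelenc", MyLabelEncoder()),
    ])),
])

# Each branch yields an (n_samples, 1) column; the union stacks them side by side.
features = union.fit_transform(df)
print(features.shape)
```

Because each branch returns a column of shape (n_samples, 1), the union can concatenate them horizontally. Without the reshape, each branch would return a flat 1-d array and the stacking would not produce one row per sample.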
Solution 2:
You may need to declare both fields in the features array. Try adding the two selector keys to its dtype like this, and post the results:
features = np.recarray(shape=(len(posts),),
                       dtype=[('u_category', object), ('rawtext', object)])
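A rough sketch of how such a record array could be filled and then read back by key. The contents of posts are hypothetical, since the question does not show its data; here it is assumed to be a list of (category, text) pairs.

```python
import numpy as np

# Hypothetical input: (category, text) pairs standing in for the real posts.
posts = [
    ("news", "Markets rallied today"),
    ("sports", "The home team won"),
]

features = np.recarray(shape=(len(posts),),
                       dtype=[("u_category", object), ("rawtext", object)])

# Fill each named field row by row.
for i, (category, text) in enumerate(posts):
    features["u_category"][i] = category
    features["rawtext"][i] = text

# A key-based selector can now pull out one field at a time.
categories = features["u_category"]
texts = features["rawtext"]
```

Indexing the record array by field name is the same data[key] lookup that a key-based selector performs, so the field names in the dtype must match the keys passed to the selectors.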