I'm trying to use FeatureUnion to extract different features from a data structure, but it fails due to mismatched dimensions: ValueError: blocks[0,:] has incompatible row dimensions
My FeatureUnion is built the following way:
features = FeatureUnion([
    ('f1', Pipeline([
        ('get', GetItemTransformer('f1')),
        ('transform', vectorizer_f1)
    ])),
    ('f2', Pipeline([
        ('get', GetItemTransformer('f2')),
        ('transform', vectorizer_f2)
    ]))
])
GetItemTransformer is used to pull different parts of the data out of the same structure; the idea is described here in the scikit-learn issue tracker.
The structure itself is stored as {'f1': data_f1, 'f2': data_f2}, where data_f1 and data_f2 are lists of different lengths.
Since the y vector differs from the data fields, I assume that's where the error comes from, but how can I scale the vector to fit in both cases?
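For reference, a minimal sketch of such a selector transformer (the actual GetItemTransformer from the issue tracker may differ in details) could look like this:

```python
from sklearn.base import BaseEstimator, TransformerMixin

class GetItemTransformer(BaseEstimator, TransformerMixin):
    """Select a single field from a dict-like data structure."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        # Stateless: nothing to learn from the data.
        return self

    def transform(self, data):
        # Return only the requested field, e.g. data['f1'].
        return data[self.key]
```

Each branch of the FeatureUnion then sees only its own field of the shared structure.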
Here's what worked for me:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ArrayCaster(BaseEstimator, TransformerMixin):
    def fit(self, x, y=None):
        return self

    def transform(self, data):
        print(data.shape)
        print(np.transpose(np.matrix(data)).shape)
        return np.transpose(np.matrix(data))
FeatureUnion([
    ('text', Pipeline([
        ('selector', ItemSelector(key='text')),
        ('vect', CountVectorizer(ngram_range=(1, 1), binary=True, min_df=3)),
        ('tfidf', TfidfTransformer())
    ])),
    ('other data', Pipeline([
        ('selector', ItemSelector(key='has_foriegn_char')),
        ('caster', ArrayCaster())
    ]))
])
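To see what the caster actually does, here is a small illustration with made-up data: a 1-D feature column has shape (n,), but FeatureUnion's internal hstack needs every block to be 2-D, i.e. (n, 1).

```python
import numpy as np

# Hypothetical 1-D feature column, e.g. a boolean flag per sample.
flags = np.array([0, 1, 1, 0])

print(flags.shape)                           # (4,)  -- 1-D, breaks hstack
print(np.transpose(np.matrix(flags)).shape)  # (4, 1) -- 2-D column, works
```

On recent NumPy versions, flags.reshape(-1, 1) achieves the same result without going through np.matrix.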
I don't know if this applies to your question, but we ran into the same error in a slightly different situation and just solved it.
Our f1 entries were each lists of 15 numeric values, and we needed to do tf-idf on f2. This produced the same error about incompatible row dimensions.
After running it through the debugger, we found that the shapes of our matrices were subtly different going into the hstack() call in FeatureUnion: (2659,) and (2659, 706).
If we cast f1 to a 2-D NumPy array, the shape changed to (2659, 15), and the hstack call worked.
The cast was something like this: f1 = np.array(list(f1)).
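For illustration (with made-up data standing in for the f1 field described above), the symptom and the fix can be reproduced like this: an object array of lists reports a 1-D shape, while rebuilding it as a numeric array yields the 2-D shape that hstack expects.

```python
import numpy as np

# An object array whose elements are lists looks 1-D to hstack:
f1 = np.empty(3, dtype=object)
f1[:] = [[1.0] * 15, [2.0] * 15, [3.0] * 15]
print(f1.shape)  # (3,)

# Casting rebuilds it as a proper 2-D numeric array:
f1_2d = np.array(list(f1))
print(f1_2d.shape)  # (3, 15)
```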