I have a one-dimensional array with a large string in each element. I am trying to use CountVectorizer to convert the text data into numerical vectors, but I am getting this error:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
The array is called mealarray, and it holds 5000 samples, each a large string. I am trying to vectorize it as follows:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # (1, 1) is already the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)
The full stack trace:
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
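Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array; that is, mealarray.shape must be of the form (n,). CountVectorizer iterates over its input and, with the default lowercase=True, calls .lower() on each item (the last frame of the stack trace). If mealarray has a shape such as (n, 1), each item is a row, which is itself an ndarray rather than a string, so you get the "no attribute" error.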
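A small numpy-only illustration of the shape point above. The three-element array is a made-up stand-in for mealarray, stored as a column of shape (3, 1) to reproduce the failing case:

```python
import numpy as np

# Made-up stand-in for mealarray: three documents stored as a column, shape (3, 1)
mealarray = np.array([["Grilled chicken with rice"],
                      ["Vegetable soup and bread"],
                      ["Pasta with tomato sauce"]])
print(mealarray.shape)            # (3, 1)

# Iterating over a 2-D array yields its rows, each a 1-D ndarray of shape (1,)
first = next(iter(mealarray))
print(type(first).__name__)       # ndarray
print(hasattr(first, "lower"))    # False: this is the AttributeError in the trace

# ravel() flattens to shape (3,), so iteration yields the strings themselves
flat = mealarray.ravel()
print(flat.shape)                 # (3,)
print(isinstance(flat[0], str))   # True (numpy.str_ subclasses str)
print(flat[0].lower())            # grilled chicken with rice
```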
You could try something like
data = vectorizer.fit_transform(mealarray.ravel())
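A fuller sketch of that fix, assuming mealarray really is a column of shape (n, 1). The three sample documents are invented for illustration, and dtype is written as np.float64, which is the same type as the question's dtype='double'; the other CountVectorizer settings mirror the question:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Invented stand-in for mealarray, stored as a column: shape (3, 1)
mealarray = np.array([["Grilled chicken with rice"],
                      ["Vegetable soup and bread"],
                      ["Pasta with tomato sauce"]])

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),   # the default, kept to mirror the question
    dtype=np.float64,     # same type as dtype='double'
)

# ravel() flattens (3, 1) -> (3,), so each document handed to the analyzer is a string
data = vectorizer.fit_transform(mealarray.ravel())

print(data.shape)                        # (3, 9): 'with' and 'and' are English stop words
print(sorted(vectorizer.vocabulary_))    # the nine remaining terms
```

If mealarray is already one-dimensional, ravel() returns it unchanged in shape, so the call is harmless either way.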
Got the answer to my question. CountVectorizer expects an iterable of strings, and passing it a list of strings instead of my array solved the problem.
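One caveat with the list approach: on an (n, 1) array, mealarray.tolist() returns a list of one-element lists, which fails the same way, so the list has to contain the strings themselves. A minimal sketch of a conversion that handles both shapes; the helper name to_document_list is hypothetical and not part of scikit-learn:

```python
import numpy as np

def to_document_list(arr):
    """Hypothetical helper: return a flat list of Python str from an array of texts.

    Accepts shape (n,) or (n, 1); any other shape is rejected because it is
    unclear which element should count as one document.
    """
    arr = np.asarray(arr)
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]  # drop the singleton column: (n, 1) -> (n,)
    if arr.ndim != 1:
        raise ValueError(f"expected shape (n,) or (n, 1), got {arr.shape}")
    # str() turns numpy.str_ (or other scalars) into plain Python strings
    return [str(doc) for doc in arr]

column = np.array([["Grilled chicken with rice"], ["Vegetable soup and bread"]])
docs = to_document_list(column)
print(docs)  # ['Grilled chicken with rice', 'Vegetable soup and bread']
```

The result can then be passed straight to the vectorizer, e.g. vectorizer.fit_transform(to_document_list(mealarray)).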