I'm doing some basic machine learning and have a sparse matrix resulting from TFIDF as follows:
<983x33599 sparse matrix of type '<type 'numpy.float64'>'
with 232944 stored elements in Compressed Sparse Row format>
Then I have a DataFrame with a title column. I want to combine these into one DataFrame, but when I try to use concat, I get an error saying that I can't combine a DataFrame with a non-DataFrame object.
How do I get around this?
Thanks!
merge() combines data on common columns or indices; join() combines data on a key column or an index; concat() combines DataFrames across rows or columns.
The concat() function does all the heavy lifting of concatenating pandas objects along an axis, with optional set logic (union or intersection) applied to the indexes (if any) on the other axes. Parameters: objs: Series or DataFrame objects; axis: the axis to concatenate along (default 0).
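To make the axis and set-logic behaviour concrete, here is a small sketch. The frames left and right and their column names are made up for illustration only:

```python
import pandas as pd

# Two toy frames that share only column "b" (hypothetical data)
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# axis=0 (default): stack rows; columns are unioned, gaps are filled with NaN
rows_union = pd.concat([left, right], axis=0, ignore_index=True)

# join="inner": keep only the columns present in every input
rows_inner = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

# axis=1: put the frames side by side, aligned on the row index
side_by_side = pd.concat([left, right], axis=1)

print(rows_union)
print(rows_inner)
print(side_by_side)
```

Note that every element of the list passed to concat() has to be a Series or DataFrame, which is why passing a raw scipy sparse matrix fails.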
Use DataFrame.sparse.from_spmatrix() to create a DataFrame with sparse values from a sparse matrix.
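A minimal sketch of that call, assuming pandas 0.25 or later with scipy installed. The small matrix below is only a stand-in for a real TF-IDF result, and the column names t0..t3 are invented:

```python
import pandas as pd
from scipy import sparse

# Stand-in for a TF-IDF result: 3 documents x 4 terms in CSR format
matrix = sparse.csr_matrix([[0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 0.5, 0.5],
                            [1.0, 0.0, 0.0, 0.0]])

# Wrap the matrix in a DataFrame; each column gets a sparse dtype
features = pd.DataFrame.sparse.from_spmatrix(
    matrix, columns=["t0", "t1", "t2", "t3"])

print(features.dtypes)
# Fraction of cells that are actually stored
print(features.sparse.density)
```

Once the matrix is a DataFrame, it can be combined with other DataFrames in the usual ways.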
Consider the following demo:
Source DF:
In [2]: df
Out[2]:
                     text
0        is it good movie
1  wooow is it very goode
2               bad movie
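Solution: let's create a SparseDataFrame out of the TF-IDF sparse matrix:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Note: pd.SparseDataFrame was removed in pandas 1.0 (use
# pd.DataFrame.sparse.from_spmatrix there), and get_feature_names() was
# removed in scikit-learn 1.2 (use get_feature_names_out()).
vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word', stop_words='english')
# Wrap the TF-IDF matrix without densifying it; one column per term
sdf = pd.SparseDataFrame(vect.fit_transform(df['text']),
                         columns=vect.get_feature_names(),
                         default_fill_value=0)
# Add the original text as a regular column
sdf['text'] = df['text']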
Result:
In [13]: sdf
Out[13]:
   bad  good     goode     wooow                    text
0  0.0   1.0  0.000000  0.000000        is it good movie
1  0.0   0.0  0.707107  0.707107  wooow is it very goode
2  1.0   0.0  0.000000  0.000000               bad movie
In [14]: sdf.memory_usage()
Out[14]:
Index    80
bad       8
good      8
goode     8
wooow     8
text     24
dtype: int64
PS: pay attention to the .memory_usage() output: we didn't lose the sparseness. If we had used pd.concat, join, merge, etc., we would have lost it, as all these methods generate a new regular (non-sparse) copy of the merged DataFrames.