Assign values to SparseArray in Pandas?

I have an object of type SparseDataFrame and I want to change some values.

Usually when working with dataframes I use DataFrame.loc, DataFrame.iloc or set_value. When I try to use these methods on a SparseDataFrame object, I always get an error like the following:

TypeError: SparseArray does not support item assignment via setitem

How do I work with a SparseArray correctly?

This question: Set percentage of column to 0 (pandas) suggests first calling df.to_dense(), doing the assignment, and then calling df.to_sparse() to convert it back. I wonder if there is a way to work directly with the SparseDataFrame / SparseArray?
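
For reference, df.to_dense() and df.to_sparse() were removed in pandas 1.0. On current versions the same round trip can be sketched with the .sparse accessor and astype. The frame below is made-up example data, not from the linked question:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse frame; labels and values are illustrative only.
df = pd.DataFrame(
    {"I": [np.nan, np.nan, np.nan], "J": [0.42, np.nan, np.nan]},
    index=["a", "b", "c"],
).astype(pd.SparseDtype(float, np.nan))

# Densify the whole frame (the role df.to_dense() used to play).
dense = df.sparse.to_dense()

# Ordinary assignment works on the dense copy.
dense.loc["b", "I"] = 1.0

# Convert back to sparse (the role df.to_sparse() used to play).
df = dense.astype(pd.SparseDtype(float, np.nan))
```

This densifies every column, so it only suits frames that fit in memory in dense form.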

asked Feb 28 '18 15:02

jkortner

1 Answer

It is frustrating to not be able to insert directly in sparse format with .loc[]. I'm afraid I only have a workaround.

Since the question was originally posted, pandas has deprecated SparseDataFrame (in version 0.25). In its place there is a data type (SparseDtype) that can be applied to individual Series within a DataFrame. In other words, it is no longer "all or nothing". You can:

convert a few columns in your DataFrame to dense format while keeping the others sparse,
insert your data with .loc[] in the dense columns,
and then convert these columns back to sparse.

This is obviously a lot less memory intensive than converting the entire DataFrame to dense.
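
To make the memory argument concrete, here is a minimal sketch that runs the three steps on a single column and compares memory_usage() at each stage. The column names, the row count and the inserted values are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three almost-empty float columns of 100,000 rows.
n = 100_000
data = {name: np.full(n, np.nan) for name in ["I", "J", "K"]}
data["J"][0] = 0.42
df = pd.DataFrame(data).astype(pd.SparseDtype(float, np.nan))
sparse_bytes = df.memory_usage(deep=True).sum()

# Step 1: convert only column "I" to dense; "J" and "K" stay sparse.
df["I"] = df["I"].sparse.to_dense()
mixed_bytes = df.memory_usage(deep=True).sum()

# Step 2: ordinary .loc[] assignment on the dense column.
df.loc[[1, 2], "I"] = [-1.0, 1.0]

# Step 3: convert the column back to sparse.
df["I"] = df["I"].astype(pd.SparseDtype(float, np.nan))

# For comparison only: the cost of densifying the whole frame.
all_dense_bytes = df.sparse.to_dense().memory_usage(deep=True).sum()

print(sparse_bytes, mixed_bytes, all_dense_bytes)
```

The intermediate frame with one dense column should sit between the fully sparse and the fully dense sizes, which is the saving described above.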

Here is a very simple function to illustrate what I mean:

def sp_loc(df, index, columns, val):
    """Insert data into a DataFrame whose columns use pd.SparseDtype.

    Requires pandas >= 0.25 (uses the .sparse accessor).
    Note: df is modified in place and also returned.

    Args
    ----
    df : DataFrame with Series formatted with pd.SparseDtype
    index : str, list, or slice object
        Same as one would use as the first argument of .loc[]
    columns : str or list of column labels
        A slice is not supported here, because df[slice] selects rows
    val : value(s) to insert

    Returns
    -------
    df: DataFrame
        Modified DataFrame

    """

    # Save the original sparse format for reuse later
    spdtypes = df.dtypes[columns]

    # Convert concerned Series to dense format
    df[columns] = df[columns].sparse.to_dense()

    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val

    # Back to the original sparse format
    df[columns] = df[columns].astype(spdtypes)

    return df

Simple usage example:

import pandas as pd

# DEFINE A SPARSE DATAFRAME

df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'])
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
#     |   I |      J
# ----+-----+--------
# a   | nan |   0.42
# b   | nan | nan
# c   | nan | nan

df1.dtypes
#I    Sparse[float64, nan]
#J    Sparse[float64, nan]

df1.sparse.density
# 0.16666666666666666

# INSERTION

df1 = sp_loc(df1, ['a','b'], 'I', [-1, 1])
#     |   I |      J
# ----+-----+--------
#  a  |  -1 |   0.42
#  b  |   1 | nan
#  c  | nan | nan

df1.sparse.density  # a property, not a method
# 0.5

answered Sep 30 '22 01:09

billjoie

Tags: python, pandas, sparse-matrix
