Assign values to SparseArray in Pandas?

I have an object of type SparseDataFrame and I want to change some values.

Usually, when working with DataFrames, I use DataFrame.loc, DataFrame.iloc or set_value. When I try to use these methods on a SparseDataFrame object, I always get the following error:

"SparseArray does not support item assignment via setitem"
TypeError: SparseArray does not support item assignment via setitem
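The error is raised by the SparseArray container itself, not by .loc. A minimal reproduction on a bare SparseArray, assuming a pandas version that exposes pd.arrays.SparseArray (0.24 or later); the array values are made up for illustration:

```python
import pandas as pd

# A SparseArray that only stores the values differing from fill_value (0 here)
arr = pd.arrays.SparseArray([0, 0, 1, 0], fill_value=0)

try:
    # Item assignment is rejected by SparseArray.__setitem__
    arr[0] = 5
except TypeError as err:
    message = str(err)
    print(message)
```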

How do I work with a SparseArray correctly?

This question: Set percentage of column to 0 (pandas) suggests first converting with df.to_dense(), doing the assignment, and then calling df.to_sparse() to convert back. I wonder if there is a way to work directly with the SparseDataFrame / SparseArray?
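For comparison, the whole-frame round trip described in that question can be sketched with the current API. This is a sketch assuming pandas 1.0 or later, where DataFrame.to_sparse no longer exists and astype(pd.SparseDtype(...)) plays its role; the column names and values are made up for illustration:

```python
import pandas as pd

# A small all-sparse DataFrame: every column has dtype Sparse[float64, nan]
nan = float("nan")
df = pd.DataFrame(
    {"I": [nan, nan, nan], "J": [0.42, nan, nan]},
    index=["a", "b", "c"],
).astype(pd.SparseDtype(float))

# 1. Convert the whole frame to dense (stands in for the old df.to_dense())
dense = df.sparse.to_dense()

# 2. Assign with the usual indexers, which work on dense columns
dense.loc["a", "I"] = -1.0

# 3. Convert back (stands in for the removed df.to_sparse())
df = dense.astype(pd.SparseDtype(float))
```

The drawback is that the entire frame is held in dense form during step 2.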

Asked by jkortner on Feb 28 '18 at 15:02


1 Answer

It is frustrating not to be able to insert directly into sparse format with .loc[]. I'm afraid I only have a workaround.

Since the question was originally posted, pandas has deprecated SparseDataFrame (in version 0.25) and removed it (in version 1.0). Instead, sparse data is represented with a data type, SparseDtype, that can be applied to individual Series within a regular DataFrame. In other words, it is no longer "all or nothing". You can:

  • convert a few columns in your DataFrame to dense format while keeping the others sparse,
  • insert your data with .loc[] in the dense columns,
  • and then convert these columns back to sparse.

This uses far less memory than converting the entire DataFrame to dense, since only the columns being modified are densified, and only temporarily.
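To make the memory argument concrete, here is a rough sketch comparing memory_usage for an all-sparse frame, a frame with one densified column, and a fully dense frame. The data (10,000 rows, about 1% non-NaN values) is made up for illustration, and exact byte counts depend on the pandas version:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 10,000 rows x 4 columns, roughly 1% non-NaN
rng = np.random.default_rng(0)
values = np.full((10_000, 4), np.nan)
values[rng.random(values.shape) < 0.01] = 1.0
sparse_df = pd.DataFrame(values, columns=list("ABCD")).astype(pd.SparseDtype(float))

# Densify only column "A"; columns B, C and D stay sparse
partly_dense = sparse_df.copy()
partly_dense["A"] = partly_dense["A"].sparse.to_dense()

# Densify every column
fully_dense = sparse_df.sparse.to_dense()


def total_bytes(frame):
    # Sum of per-column memory; index=False leaves the index out of the total
    return int(frame.memory_usage(index=False).sum())


print(total_bytes(sparse_df), total_bytes(partly_dense), total_bytes(fully_dense))
```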

Here is a very simple function to illustrate what I mean:

import pandas as pd


def sp_loc(df, index, columns, val):
    """ Insert data in a DataFrame with SparseDtype format

    Only applicable for pandas version >= 0.25

    Args
    ----
    df : DataFrame with series formatted with pd.SparseDtype
    index: str, or list, or slice object
        Same as one would use as first argument of .loc[]
    columns: str, list, or slice
        Same as one would normally use as second argument of .loc[]
    val: values to insert

    Returns
    -------
    df: DataFrame
        Modified DataFrame (the input df is also modified in place)

    """

    # Resolve `columns` to an explicit list of labels: df[slice] would
    # select rows rather than columns, and a single label gives a Series
    if isinstance(columns, slice) or pd.api.types.is_list_like(columns):
        cols = list(df.loc[:, columns].columns)
    else:
        cols = [columns]

    # Save the original sparse dtypes (including fill values) for reuse later
    spdtypes = df[cols].dtypes.to_dict()

    # Convert concerned Series to dense format
    df[cols] = df[cols].sparse.to_dense()

    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val

    # Back to the original sparse format
    df[cols] = df[cols].astype(spdtypes)

    return df
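Saving the original dtypes matters because a SparseDtype also records the fill value. A minimal sketch on a single Series, with made-up data, showing that casting back to the saved dtype keeps a fill value of 0.0, while a bare pd.SparseDtype(float) falls back to NaN:

```python
import pandas as pd

# Sparse Series whose fill value is 0.0 rather than NaN
s = pd.Series(pd.arrays.SparseArray([0.0, 0.0, 3.0], fill_value=0.0))
saved = s.dtype  # Sparse[float64, 0.0]

# Densify and modify one value, as sp_loc does
dense = s.sparse.to_dense()
dense.iloc[0] = 5.0

# Cast back with the saved dtype: zeros are still treated as fill values
restored = dense.astype(saved)

# Cast back with a generic SparseDtype: the fill value defaults to NaN,
# so every value (including the zero) ends up stored explicitly
generic = dense.astype(pd.SparseDtype(float))

print(restored.sparse.density, generic.sparse.density)
```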

Simple usage example:

# DEFINE A SPARSE DATAFRAME

df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'], dtype=float)
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
#     |   I |      J
# ----+-----+--------
# a   | nan |   0.42
# b   | nan |    nan
# c   | nan |    nan

df1.dtypes
# I    Sparse[float64, nan]
# J    Sparse[float64, nan]

df1.sparse.density
# 0.16666666666666666

# INSERTION

df1 = sp_loc(df1, ['a', 'b'], 'I', [-1, 1])
#     |    I |      J
# ----+------+--------
# a   | -1.0 |   0.42
# b   |  1.0 |    nan
# c   |  nan |    nan

df1.sparse.density
# 0.5
Answered by billjoie on Sep 30 '22 at 01:09