I have an object of type SparseDataFrame and I want to change some values. Usually when working with DataFrames I use DataFrame.loc, DataFrame.iloc or set_value. When I try to use these methods on a SparseDataFrame object, I always get the following error:
TypeError: SparseArray does not support item assignment via setitem
How do I work with a SparseArray correctly?
This question, "Set percentage of column to 0 (pandas)", suggests calling df.to_dense() first, doing the assignment, and then calling df.to_sparse() to convert it back.
I wonder if there is a way to work directly with the SparseDataFrame / SparseArray?
You can set a cell value of a pandas DataFrame using df.at[row_label, column_label] = 'Cell Value'. The at accessor reads or writes a single value for a row/column pair, addressed by row and column labels, and for scalar access it is generally faster than .loc.
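As a minimal sketch of the label-based scalar access described above (the data, labels, and column names here are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical example data; labels and column names are illustrative only.
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0], "qty": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Set a single cell by row label and column label.
df.at["b", "price"] = 9.5

# Read the same cell back.
value = df.at["b", "price"]
```

Note that this applies to ordinary dense columns; it does not get around the sparse assignment error from the question.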
Pandas DataFrame: assign() function. The assign() function is used to assign new columns to a DataFrame. It returns a new object with all original columns in addition to the new ones. Existing columns that are re-assigned will be overwritten. The column names are passed as keywords.
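A short sketch of that behaviour, assuming a small made-up DataFrame (the column names temp_c and temp_f are hypothetical):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# assign() returns a new DataFrame; the original is left unchanged.
df2 = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)

# Re-assigning an existing column name overwrites it in the returned copy.
df3 = df2.assign(temp_c=lambda d: d["temp_c"] + 1)
```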
You can expand the range for either the row index or the column index to select more data. For example, you can select the first two rows of the first column using dataframe.iloc[0:2, 0:1], or the first two columns of the first row using dataframe.iloc[0:1, 0:2].
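The two slices mentioned above can be sketched on a small made-up DataFrame (the values and column names are hypothetical):

```python
import pandas as pd

# Hypothetical 3x3 example data.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["x", "y", "z"])

# Rows 0 and 1 of the first column (the end of an iloc slice is exclusive).
first_two_rows_first_col = df.iloc[0:2, 0:1]

# Row 0 of the first two columns.
first_row_first_two_cols = df.iloc[0:1, 0:2]
```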
It is frustrating to not be able to insert directly in sparse format with .loc[]. I'm afraid I only have a workaround.
Since the original posting of the question (and as of version 0.25), pandas has deprecated SparseDataFrame. Instead, it introduced a data type (SparseDtype) that can be applied to individual Series within the DataFrame. In other words, it is no longer "all or nothing". You can convert only the affected columns to dense, do the insertion, and then convert those columns back to sparse.
This is obviously a lot less memory intensive than converting the entire DataFrame to dense.
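To make the "no longer all or nothing" point concrete, here is a minimal sketch of a DataFrame that mixes one sparse column with one dense column. The data and the column names mostly_nan and dense_col are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one mostly-empty column and one fully populated column.
df = pd.DataFrame({
    "mostly_nan": [np.nan, np.nan, 1.5, np.nan],
    "dense_col": [1.0, 2.0, 3.0, 4.0],
})

# Convert only one column to a sparse dtype; the other column stays dense.
df["mostly_nan"] = df["mostly_nan"].astype(pd.SparseDtype(float, np.nan))

# Densify that single column when needed, leaving the rest of the frame untouched.
dense_again = df["mostly_nan"].sparse.to_dense()
```

Because the dtype is set per column, only the columns being edited ever need to exist in dense form.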
Here is a very simple function to illustrate what I mean:
def sp_loc(df, index, columns, val):
    """Insert data in a DataFrame with SparseDtype format.

    Only applicable for pandas version >= 0.25.

    Args
    ----
    df : DataFrame with series formatted with pd.SparseDtype
    index : str, list, or slice object
        Same as one would use as the first argument of .loc[]
    columns : str, list, or slice
        Same as one would normally use as the second argument of .loc[]
    val : values to insert

    Returns
    -------
    df : DataFrame
        Modified DataFrame (the input is also modified in place)
    """
    # Save the original sparse format for reuse later
    spdtypes = df.dtypes[columns]
    # Convert only the concerned Series to dense format
    df[columns] = df[columns].sparse.to_dense()
    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val
    # Convert back to the original sparse format
    df[columns] = df[columns].astype(spdtypes)
    return df
Simple usage example:
import pandas as pd

# DEFINE A SPARSE DATAFRAME
df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'])
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
#     |  I  |  J
# ----+-----+------
#  a  | nan | 0.42
#  b  | nan | nan
#  c  | nan | nan

df1.dtypes
# I    Sparse[float64, nan]
# J    Sparse[float64, nan]

df1.sparse.density
# 0.16666666666666666

# INSERTION
df1 = sp_loc(df1, ['a', 'b'], 'I', [-1, 1])
#     |  I   |  J
# ----+------+------
#  a  | -1.0 | 0.42
#  b  |  1.0 | nan
#  c  |  nan | nan

# density is a property, not a method
df1.sparse.density
# 0.5