Given a Pandas DataFrame df, we can sum the columns like this
[x for x in df.sum()]
and produce the sum of sums like this.
sum([x for x in df.sum()])
Can this be done using only dataframe operations, without resorting to Python's sum()?
The sum() function returns the sum of the values along the requested axis. With the default axis=0 (the index axis), it adds up the values in each column and returns a Series containing one total per column.
Use DataFrame.sum() to total a DataFrame along either axis. By default the method uses axis=0, which sums down the rows and returns one total per column. Pass axis=1 to sum across the columns and get one total per row.
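As a small illustration of the axis parameter, here is a sketch on a made-up frame (the column names a and b and their values are assumptions for the example, not part of the question):

```python
import pandas as pd

# Hypothetical example frame, invented for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_sums = df.sum()           # axis=0 (default): one total per column
row_sums = df.sum(axis=1)     # axis=1: one total per row
grand_total = df.sum().sum()  # sum of the column sums, a single scalar

print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
print(grand_total)        # 66
```

Chaining sum() twice stays entirely within pandas, which is what the question asks for.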
Pandas DataFrame.sum() returns the sum of the values for the requested axis. Setting axis=1 sums the values in each row and returns the row totals.
Extracted rows are called slices and contain all the columns. The easiest way to extract a single row is to pass its integer position to the .iloc indexer. The output is a Pandas Series containing the row values, which can look a bit confusing at first.
To sum all rows or only some of them, select the rows you need with .loc and call sum with axis=1 to get the total of each selected row.
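The .loc selection described above might look like the following sketch. The frame, its index labels x, y, z, and the chosen rows are all hypothetical:

```python
import pandas as pd

# Hypothetical frame with labelled rows, invented for illustration
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [10, 20, 30]},
    index=["x", "y", "z"],
)

# Select only rows "x" and "z", then total each selected row
subset_row_sums = df.loc[["x", "z"]].sum(axis=1)

print(subset_row_sums.tolist())  # [11, 33]
```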
A DataFrame is a two-dimensional data structure in the form of a table with rows and columns. It can be created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file, or from a Python list or dictionary.
We can use stack:
df.stack().sum()
Use np.sum:
np.sum(df.to_numpy())
or as @jakub points out:
df.to_numpy().sum()
Timings:
Using...
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10000).reshape(100,-1))
%timeit df.to_numpy().sum()
# 12.1 µs ± 357 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.sum(df.to_numpy())
# 14 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.stack().sum()
# 469 µs ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.sum().sum()
# 381 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
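Before choosing on speed alone, it may be worth checking that the approaches agree on your data. The sketch below does that on the same frame as the timings, and then shows one difference on a small made-up frame containing a NaN (df_nan is a hypothetical name): the pandas sums skip NaN by default, while a plain NumPy sum propagates it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10000).reshape(100, -1))

# All four approaches give the same total on data without missing values
totals = [
    df.to_numpy().sum(),
    np.sum(df.to_numpy()),
    df.stack().sum(),
    df.sum().sum(),
]
print(totals[0])  # 49995000

# Hypothetical frame with a missing value, invented for illustration
df_nan = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, 3.0]})

pandas_total = df_nan.sum().sum()           # skipna=True by default
numpy_total = df_nan.to_numpy().sum()       # NaN propagates
nansum_total = np.nansum(df_nan.to_numpy()) # NumPy sum that ignores NaN

print(pandas_total)  # 6.0
print(numpy_total)   # nan
print(nansum_total)  # 6.0
```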