I have a dataframe in python pandas with several columns taken from a CSV file.
For instance, data is:
Day  P1S1  P1S2  P1S3  P2S1  P2S2  P2S3
1    1     2     2     3     1     2
2    2     2     3     5     4     2
What I need is the sum of all columns whose names start with P1, something like P1* with a wildcard.
Something like the following, which gives an error:
P1Sum = data["P1*"]
Is there any way to do this with pandas?
The sum() method returns the sum of the values over the requested axis. level (int or level name, default None): if the axis is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series.
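Newer pandas releases dropped the level argument from sum(), so the sketch below swaps in groupby(level=...).sum(), which gives the same per-level totals. The index names "probe" and "sensor" and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical two-level row index; names and values are made up.
idx = pd.MultiIndex.from_tuples(
    [("P1", "S1"), ("P1", "S2"), ("P2", "S1")],
    names=["probe", "sensor"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Sum within each value of the "probe" level, collapsing the "sensor" level.
per_probe = df.groupby(level="probe").sum()
print(per_probe)
```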
You can get the column names of a pandas DataFrame with df.columns.values and pass the result to Python's list() function to get them as a list. Once you have the list, you can print it with print().
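A small sketch of that, assuming a made-up DataFrame df whose column names are borrowed from the question:

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
df = pd.DataFrame({"Day": [1, 2], "P1S1": [1, 2], "P2S1": [3, 5]})

# df.columns.values is a NumPy array of names; list() makes it a plain list.
column_names = list(df.columns.values)
print(column_names)
```

Having the names as a list makes it easy to pick out a subset, for example every name that starts with "P1".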
Pandas DataFrame sum() Method: The sum() method adds all values in each column and returns the sum for each column. With axis='columns', it instead adds across the columns and returns the sum of each row.
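The two directions are easy to mix up, so here is a minimal sketch with an invented two-column frame (the names "a" and "b" are placeholders):

```python
import pandas as pd

# Hypothetical data, chosen so the two results are easy to tell apart.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Default axis=0: add down the rows, giving one total per column.
col_totals = df.sum()

# axis="columns" (same as axis=1): add across the columns, one total per row.
row_totals = df.sum(axis="columns")

print(col_totals)
print(row_totals)
```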
Pandas: How to Sum Columns Based on a Condition. You can use the following syntax to sum the values of a column in a pandas DataFrame based on a condition: df.loc[df['col1'] == some_value, 'col2'].sum()
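The tutorial's own data is not reproduced here. As a rough sketch of that syntax, with invented values for col1, col2 and some_value:

```python
import pandas as pd

# Hypothetical data; not taken from the tutorial.
df = pd.DataFrame({"col1": ["A", "A", "B"], "col2": [5, 7, 11]})
some_value = "A"

# The boolean mask selects the rows, 'col2' selects the column to sum.
total = df.loc[df["col1"] == some_value, "col2"].sum()
print(total)
```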
Use DataFrame.sum() to get the totals of a DataFrame along either axis. By default the method uses axis=0, which adds up the rows and returns one total per column; pass axis=1 to add across the columns and get one total per row.
# Using DataFrame.sum() to get the sum of each row
df2 = df.sum(axis=1)
print(df2)
We can find the sum of the column titled “points” with df['points'].sum(). The sum() function also excludes NaN values by default. For example, if we find the sum of the “rebounds” column, its first value, NaN, is simply excluded from the calculation.
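A short sketch of the NaN behaviour. The column names follow the text above, but the numbers are invented, and skipna=False is shown only for contrast:

```python
import numpy as np
import pandas as pd

# Hypothetical values; the first "rebounds" entry is missing.
df = pd.DataFrame({
    "points": [25, 20, 14],
    "rebounds": [np.nan, 7.0, 8.0],
})

points_total = df["points"].sum()

# skipna=True is the default, so the NaN is ignored.
rebounds_total = df["rebounds"].sum()

# With skipna=False the missing value propagates and the result is NaN.
rebounds_strict = df["rebounds"].sum(skipna=False)

print(points_total, rebounds_total, rebounds_strict)
```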
First method:
import pandas as pd
df = pd.read_csv(file_path, sep=' ')
# Select the columns whose names start with 'History' and sum them row by row
df['History'] = df.loc[:, df.columns.str.startswith('History')].sum(axis=1)
I found the answer.
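Using the data DataFrame from the question:
import pandas as pd
# Keep only the columns whose names start with "P1" ("^" anchors the match at the start of the name)
P1Channels = data.filter(regex="^P1")
# Sum those columns row by row
P1Sum = P1Channels.sum(axis=1)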
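For a version that runs on its own, the sketch below rebuilds the question's table by hand instead of reading the CSV. It anchors the pattern with "^" so that only names beginning with P1 match, and also shows an alternative using str.startswith; the variable name P1Sum_alt is made up for the comparison.

```python
import pandas as pd

# The sample table from the question, typed in by hand.
data = pd.DataFrame({
    "Day":  [1, 2],
    "P1S1": [1, 2],
    "P1S2": [2, 2],
    "P1S3": [2, 3],
    "P2S1": [3, 5],
    "P2S2": [1, 4],
    "P2S3": [2, 2],
})

# Regex route: "^P1" matches column names that begin with P1.
P1Sum = data.filter(regex="^P1").sum(axis=1)

# Alternative route: a boolean mask over the column names.
P1Sum_alt = data.loc[:, data.columns.str.startswith("P1")].sum(axis=1)

print(P1Sum.tolist())  # [5, 7]
```

An unanchored regex="P1" would also pick up a column such as "XP1", which is why the anchored form is used here.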