Iteration over columns and rows in Pandas Dataframe

Say I have a dataframe that looks like:

import pandas as pd

d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d)

print(df)

  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2

I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.

The following output is required (should be easier to understand):

  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2

I tried something like:

df_range = len(df)

for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]

But I know I'm missing something, because it does not work.

Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!

I saw the thread "Update a dataframe in pandas while iterating row by row", but it doesn't exactly apply to my problem: I'm not only going row by row, I also need to go column by column.

Notna asked Feb 23 '18 15:02


1 Answer

You can achieve the result you want this way:

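df = df.astype(int)  # the sample values are strings; convert them so they can be multiplied

for col in df:
    if col == 'views':
        continue
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, row_value in df[col].items():
        # .loc avoids the chained assignment df[col][i] = ...
        df.loc[i, col] = row_value * df.loc[i, 'views']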

Notice the following about this solution:

1) This solution operates on each value in the dataframe individually, so it is less efficient than a vectorized (broadcasting) operation: it runs two nested Python loops, an outer one over columns and an inner one over rows.

2) This solution assumes that option1...option N are binary because essentially this solution is multiplying each binary value in option1...option N with the values in views.

3) This solution will work for any number of option columns. The option columns may have any labels you desire.

4) This solution assumes there is a column labeled views.
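
Note 1 mentions broadcasting as the more efficient route. Here is a minimal sketch of what that could look like, assuming the values are first cast to integers (the sample data stores them as strings) and that every column other than views is an option column. The names option_cols and binary_result are introduced here only for illustration.

```python
import pandas as pd

d = {
    'option1': ['1', '0', '1', '1'],
    'option2': ['0', '0', '1', '0'],
    'option3': ['1', '1', '0', '0'],
    'views': ['6', '10', '5', '2'],
}
# Assumption: cast the string values to integers so arithmetic and comparisons work.
df = pd.DataFrame(data=d).astype(int)

# Every column except 'views' (option_cols is an illustrative name).
option_cols = df.columns.drop('views')

# Binary case (note 2): multiply each option column by 'views', aligned row by row.
binary_result = df[option_cols].mul(df['views'], axis=0)

# General case: replace any non-zero cell with the 'views' value from the same row.
df[option_cols] = df[option_cols].mask(df[option_cols] != 0, df['views'], axis=0)

print(df)
```

The mul version relies on the options being 0 or 1, as in note 2. The mask version only tests for "not 0", which is closer to how the question is worded, so it should also cope with option values other than 1.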

Keith Dowd answered Nov 10 '22 01:11