Say I have a dataframe that looks like:
import pandas as pd
d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d)
print(df)
  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2
I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.
The following output is required (should be easier to understand):
  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2
I tried something like:
df_range = len(df)
for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]
But I know I'm missing something, because it does not work.
Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!
I saw the thread "Update a dataframe in pandas while iterating row by row", but it doesn't exactly apply to my problem: I'm not only going row by row, I also need to go column by column.
Iterate Over DataFrame Columns
One simple way to iterate over the columns of a pandas DataFrame is a for loop. Iterating over the DataFrame itself yields the column labels, and each label can be used with the get-item syntax ([]) to select that column. The values attribute (or the to_numpy() method) extracts a column's elements as a NumPy array.
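As a minimal sketch of the label-based loop described above: it assumes the question's sample frame with its values stored as integers (the original post stores them as strings), and the name column_totals is purely illustrative.

```python
import pandas as pd

# Sample frame from the question, with integer values (an assumption;
# the original post stores them as strings).
df = pd.DataFrame({
    'option1': [1, 0, 1, 1],
    'option2': [0, 0, 1, 0],
    'option3': [1, 1, 0, 0],
    'views': [6, 10, 5, 2],
})

# Iterating over a DataFrame yields its column labels.
column_totals = {}
for label in df:
    column = df[label]  # get-item syntax returns the column as a Series
    column_totals[label] = int(column.to_numpy().sum())

print(column_totals)
```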
Vectorization is usually the first and best choice. If you must iterate, converting the DataFrame to a NumPy array or to a dictionary can speed up the loop. Iterating over the key-value pairs of a dictionary was reportedly the fastest approach, with a speedup of around 280x on 20 million records.
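To illustrate what a vectorized version of the question's task might look like, here is a rough sketch using NumPy. It assumes integer data rather than the strings in the original post, and the names option_cols, options, views and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with integer values (an assumption).
df = pd.DataFrame({
    'option1': [1, 0, 1, 1],
    'option2': [0, 0, 1, 0],
    'option3': [1, 1, 0, 0],
    'views': [6, 10, 5, 2],
})

option_cols = [c for c in df.columns if c != 'views']

# Pull the data out as plain NumPy arrays.
options = df[option_cols].to_numpy()      # shape (4, 3)
views = df['views'].to_numpy()[:, None]   # shape (4, 1), broadcasts across columns

# Where an option cell is nonzero, take that row's views value; otherwise keep 0.
result = np.where(options != 0, views, 0)

print(result)
```

No Python-level loop runs over the cells here; the comparison and selection happen on whole arrays at once.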
items(): The DataFrame class provides a member function items() (called iteritems() in older releases; that name was deprecated in pandas 1.5 and removed in pandas 2.0). It returns an iterator over all the columns of a DataFrame as (label, Series) pairs.
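A small sketch of that iterator in use, again assuming integer data; nonzero_counts is a made-up name for this example.

```python
import pandas as pd

# Sample frame from the question, with integer values (an assumption).
df = pd.DataFrame({
    'option1': [1, 0, 1, 1],
    'option2': [0, 0, 1, 0],
    'option3': [1, 1, 0, 0],
    'views': [6, 10, 5, 2],
})

# DataFrame.items() yields (column label, Series) pairs, one per column.
nonzero_counts = {}
for label, series in df.items():
    if label == 'views':
        continue  # skip the column we do not want to inspect
    nonzero_counts[label] = int((series != 0).sum())

print(nonzero_counts)
```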
You can also achieve the result you want this way:
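# Assumes numeric columns; if the values are strings, run df = df.astype(int) first.
for col in df:
    if col == 'views':
        continue
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, row_value in df[col].items():
        # .loc avoids the chained assignment df[col][i] = ...
        df.loc[i, col] = row_value * df.loc[i, 'views']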
Notice the following about this solution:
1) This solution operates on each value in the dataframe individually and so is less efficient than broadcasting, because it's performing two loops (one outer, one inner).
2) This solution assumes that option1...option N are binary because essentially this solution is multiplying each binary value in option1...option N with the values in views.
3) This solution will work for any number of option columns. The option columns may have any labels you desire.
4) This solution assumes there is a column labeled views.
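For comparison with point 1, the broadcasting alternative could look roughly like the sketch below. It starts from the data exactly as posted (all strings) and converts it with astype(int) first; that conversion, and the name option_cols, are assumptions of this sketch rather than part of the answer above.

```python
import pandas as pd

# Data exactly as posted in the question: every value is a string.
d = {'option1': ['1', '0', '1', '1'],
     'option2': ['0', '0', '1', '0'],
     'option3': ['1', '1', '0', '0'],
     'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d).astype(int)  # convert so that arithmetic works

option_cols = df.columns.drop('views')  # every column except "views"

# Multiply each option column by "views", aligned row by row (axis=0).
df[option_cols] = df[option_cols].mul(df['views'], axis=0)

print(df)
```

Like the loop version, this relies on the option columns being binary. If they can hold other nonzero values, something like df[option_cols].mask(df[option_cols] != 0, df['views'], axis=0) would follow the "replace anything that is not 0" wording more literally.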