EDITED AS PER COMMENTS
Background: Here is what the current dataframe looks like. The row labels are informational text in the original Excel file, but I hope this small reproduction of the data will be enough for a solution. The actual file has about 100 columns and 200 rows.
The column headers and row #0 values repeat in the pattern shown below, except that the Sales or Validation text changes at every occurrence of a column with an existing title.
There is one more column before Sales, with text in each row. The mapping of Xs was done for this test. Unfortunately, I found no elegant way of displaying that text as part of the output below.
Sales Unnamed: 2 Unnamed: 3 Validation Unnamed: 5 Unnamed: 6
0 Commented No comment Commented No comment
1 x x
2 x x
3 x x
Expected Output: Replace each x with 0, 1 or 2 depending on which column it is in: 1 under "Commented", 2 under "No comment", and 0 otherwise.
Sales Unnamed: 2 Unnamed: 3 Validation Unnamed: 5 Unnamed: 6
0 Commented No comment Commented No comment
1 0 1
2 2 0
3 1 2
Possible Code: I assume the loop would look something like this:
    while in row 9:
        if column value = "commented":
            replace all "x" with 1
        elif column value = "no comment":
            replace all "x" with 2
        else:
            replace all "x" with 0
But being a Python novice, I am not sure how to convert this into working code. I'd appreciate any support and help.
Here is one way to do it:
    import re

    def replaceX(col):
        # True for cells that are not "x"/"X"; where() keeps those as they are
        cond = ~((col == "x") | (col == "X"))
        # Titled column (its name is not "Unnamed: N"): replace x with 0
        if not re.match(r'Unnamed: \d+', col.name):
            return col.where(cond, 0)
        else:
            # Unnamed column: check the value of the first row
            if col.iloc[0] == "Commented":
                return col.where(cond, 1)
            elif col.iloc[0] == "No comment":
                return col.where(cond, 2)
        # Unnamed column with any other first-row value: leave unchanged
        return col
Or, if the first row doesn't contain "Commented" or "No comment" for the titled columns, you can use a solution without regex:
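    def replaceX(col):
        # True for cells that are not "x"/"X"; where() keeps those as they are
        cond = ~((col == "x") | (col == "X"))
        # Check the value of the first row
        if col.iloc[0] == "Commented":
            return col.where(cond, 1)
        elif col.iloc[0] == "No comment":
            return col.where(cond, 2)
        # Any other first-row value: replace x with 0
        return col.where(cond, 0)

    # Apply the function to every column (axis is not specified, so it defaults to 0).
    # apply returns a new DataFrame; assign it if you want to keep the result.
    df.apply(replaceX)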
Output:
       title  Unnamed: 2  Unnamed: 3
    0          Commented  No comment
    1
    2      0                       2
    3                  1
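For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the regex-free version. The sample data is an assumption made up to mirror the three-column output above (empty strings stand in for blank cells), `replace_x` is just a snake_case restatement of the function, and `isin` is swapped in for the two equality tests.

```python
import pandas as pd


def replace_x(col):
    # True for cells that are not an "x"/"X" marker; where() keeps those cells.
    keep = ~col.isin(["x", "X"])
    # The first row decides which number replaces the markers in this column.
    if col.iloc[0] == "Commented":
        return col.where(keep, 1)
    if col.iloc[0] == "No comment":
        return col.where(keep, 2)
    return col.where(keep, 0)


# Hypothetical sample data (assumption): one titled column and two unnamed ones.
# dtype=object keeps the columns able to hold both text and integers.
df = pd.DataFrame(
    {
        "title": ["", "", "x", ""],
        "Unnamed: 2": ["Commented", "", "", "x"],
        "Unnamed: 3": ["No comment", "", "x", ""],
    },
    dtype=object,
)

# apply() calls replace_x once per column and returns a new DataFrame.
result = df.apply(replace_x)
print(result)
```

In `result`, the x in the titled column becomes 0, the x under "Commented" becomes 1 and the x under "No comment" becomes 2; every other cell, including row 0, is left as it was.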
Documentation:
- Apply: applies a function to every column or row, depending on the axis.
- Where: keeps the values of a Series where a condition is met and replaces the rest with the specified value.
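To see the two methods in isolation, here is a small sketch on made-up data (the Series and DataFrame contents are assumptions, not taken from the question). It shows that `where` keeps the values where the condition is True and replaces the others, and that `apply` with the default axis hands each column to the function as a Series whose `name` is the column header.

```python
import pandas as pd

# where(): values are kept where the condition is True, replaced elsewhere.
s = pd.Series(["x", "a", "X", "b"], dtype=object)
keep = ~s.isin(["x", "X"])
replaced = s.where(keep, 0)
print(replaced.tolist())  # [0, 'a', 0, 'b']

# apply() with the default axis=0: the function receives one column at a time,
# and col.name is that column's header.
df = pd.DataFrame({"Sales": ["x", "y"], "Unnamed: 2": ["x", "z"]}, dtype=object)
names = df.apply(lambda col: col.name)
print(names.tolist())  # ['Sales', 'Unnamed: 2']
```

This is why the regex version can look at `col.name` to tell titled columns from unnamed ones.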