Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace values in multiple untitled columns to 0, 1, 2 depending on column

EDITED AS PER COMMENTS

Background: Here is what the current dataframe looks like. The row labels are information texts in original excel file. But I hope this small reproduction of data will be enough for a solution? Actual file has about 100 columns and 200 rows.

Column headers and Row #0 values are repeated with pattern shown below -- except the Sales or Validation text changes at every occurrence of column with an existing title.

One more column before sales with text in each row. Mapping of Xs done for this test. Unfortunately, found no elegant way of displaying text as part of output below.

 Sales Unnamed: 2  Unnamed: 3  Validation Unnamed: 5 Unnamed: 6
0       Commented  No comment             Commented  No comment                                   
1     x                                             x                        
2                            x          x                                                
3                x                                             x             

Expected Output: Replacing the X with 0s, 1s and 2s depending on which column they are in (Commented / No Comment)

 Sales Unnamed: 2  Unnamed: 3  Validation Unnamed: 5 Unnamed: 6
0       Commented  No comment             Commented  No comment                                   
1     0                                            1                        
2                            2          0                                                
3                1                                             2  

Possible Code: I assume the loop would look something like this:

while in row 9:
    if column value = "commented":

        replace all "x" with 1

    elif row 9 when column valkue = "no comment":

        replace all "x" with 2

    else:

        replace all "x" with 0

But being a python novice, I am not sure how to convert this to a working code. I'd appreciate all support and help.

like image 390
mvx Avatar asked Nov 07 '22 14:11

mvx


1 Answers

Here is one way to do it:

  1. Define a function to replace the x:
import re

def replaceX(col):
    cond = ~((col == "x") | (col == "X"))
    # Check if the name of the column is undefined
    if not re.match(r'Unnamed: \d+', col.name):
        return col.where(cond, 0)
    else:
        # Check what is the value of the first row
        if col.iloc[0] == "Commented":
            return col.where(cond, 1)
        elif col.iloc[0] == "No comment":
            return col.where(cond, 2)
    return col

Or if your first row don't contain "Commented" or "No comment" for titled columns you can have a solution without regex:

def replaceX(col):
    cond = ~((col == "x") | (col == "X"))
    # Check what is the value of the first row
    if col.iloc[0] == "Commented":
        return col.where(cond, 1)
    elif col.iloc[0] == "No comment":
        return col.where(cond, 2)
    return col.where(cond, 0)
  1. Apply this function on the DataFrame:
# Apply the function on every column (axis not specified so equal 0)
df.apply(lambda col: replaceX(col))

Output:

  title Unnamed: 2  Unnamed: 3
0        Commented  No comment
1                             
2     0                      2
3                1            

Documentation:

  • Apply: apply a function on every columns/rows depending on the axis
  • Where: check where a condition is met on a series, if it is not met, replace with value specified.
like image 193
SmileyProd Avatar answered Nov 27 '22 17:11

SmileyProd