Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace column names in a pandas data frame that partially match a string

Background

I would like to identify column names in a dataframe that partially match a string, and replace them with the original names plus some new elements added to them. The new elements are integers defined by a list. Here is a similar question, but I'm afraid the suggested solution will not be flexible enough in my particular case. And here is another post with a few excellent answers that come close to the problem I'm facing.

Some research

I know I can combine two lists of strings, map them pairwise into a dictionary , and rename the columns using the dictionary as input in the function df.rename. But this seems a bit too complicated, and not very flexible taking into consideration that the number of existing columns will vary. As will the numbers of columns to be renamed.

The following snippet will produce an input example:

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
              columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

datelist = pd.date_range(pd.datetime.today().strftime('%Y-%m-%d'),
                     periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print(df)

Input

enter image description here

I want to identify the column names that start with obs_, and add elements (integers) from the list newElements = [5, 10, 15, 20] following an = sign. The column named Price remains the same. Other columns appearing after the obs_ columns should also stay the same.

The following snippet will demonstrate the desired output:

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
              columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print(df2)

Output

enter image description here

My attempt

# Define the partial string I'm lookin for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()

# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:   
    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)

print('My attempt: ', df)

enter image description here

Having already made a complete list of the new column names I could just as well have used df.columns = newnames, but hopefully one of you have got a suggestion using df.rename in a more pythonic way.

Thank you for any suggestions!

Here's the whole code for an easy copy-paste:

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

datelist = pd.date_range(pd.datetime.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print('Input: ', df)

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print('Desired output: ', df2)

# My attempts
# Define the partial string I'm lookin for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()


# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:

    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)

print('My attempt: ', df)

EDIT: Aftermath

So many good answers after only one day is just amazing! This made it really hard to decide which answer to accept. I don't know if the following will add much value to the post as a whole, but I went ahead and wrapped all suggestions into functions and tested them with %timeit.

Here are the results: enter image description here

The suggestion fram HH1 was the first to be posted, and is also one of the fastest in terms of execution time. I'll include the code later if anyone is interested.

EDIT 2

The suggestion from suvy rendered these results when I tried it:enter image description here

The snippet worked fine until the last line. After running the line df = df.rename(columns=dict(zip(names,renames))) the data frame looked like this:

enter image description here

like image 424
vestland Avatar asked Jun 15 '17 12:06

vestland


1 Answers

does this work ?

df.columns = [col + ' = ' + str(newElements.pop(0)) if col.startswith(stringMatch) else col for col in df.columns]
like image 149
HH1 Avatar answered Nov 14 '22 22:11

HH1