I would like to identify column names in a dataframe that partially match a string, and replace them with the original names plus some new elements added to them. The new elements are integers defined by a list. Here is a similar question, but I'm afraid the suggested solution will not be flexible enough in my particular case. And here is another post with a few excellent answers that come close to the problem I'm facing.
I know I can combine two lists of strings, map them pairwise into a dictionary , and rename the columns using the dictionary as input in the function df.rename
. But this seems a bit too complicated, and not very flexible taking into consideration that the number of existing columns will vary. As will the numbers of columns to be renamed.
The following snippet will produce an input example:
# Libraries
import numpy as np
import pandas as pd
import itertools
# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
columns = ['Price','obs_1','obs_2','obs_3','obs_4'])
datelist = pd.date_range(pd.datetime.today().strftime('%Y-%m-%d'),
periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print(df)
I want to identify the column names that start with obs_
, and add elements (integers) from the list newElements = [5, 10, 15, 20]
following an =
sign. The column named Price
remains the same. Other columns appearing after the obs_
columns should also stay the same.
The following snippet will demonstrate the desired output:
# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])
df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print(df2)
# Define the partial string I'm lookin for
stringMatch = 'Obs_'
# Put existing column names in a list
oldnames = list(df)
# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]
# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()
# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']
# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:
#print(repr(oldElement) + repr(str_newElements[i]))
newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
i = i + 1
# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)
print('My attempt: ', df)
Having already made a complete list of the new column names
I could just as well have used df.columns = newnames
,
but hopefully one of you have got a suggestion using
df.rename
in a more pythonic way.
Thank you for any suggestions!
Here's the whole code for an easy copy-paste:
# Libraries
import numpy as np
import pandas as pd
import itertools
# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
columns = ['Price','obs_1','obs_2','obs_3','obs_4'])
datelist = pd.date_range(pd.datetime.today().strftime('%Y-%m-%d'),
periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print('Input: ', df)
# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])
df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print('Desired output: ', df2)
# My attempts
# Define the partial string I'm lookin for
stringMatch = 'Obs_'
# Put existing column names in a list
oldnames = list(df)
# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]
# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()
# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']
# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:
#print(repr(oldElement) + repr(str_newElements[i]))
newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
i = i + 1
# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)
print('My attempt: ', df)
So many good answers after only one day is just amazing! This made it really hard to decide which answer to accept. I don't know if the following will add much value to the post as a whole, but I went ahead and wrapped all suggestions into functions and tested them with %timeit.
Here are the results:
The suggestion fram HH1 was the first to be posted, and is also one of the fastest in terms of execution time. I'll include the code later if anyone is interested.
The suggestion from suvy rendered these results when I tried it:
The snippet worked fine until the last line. After running the line df = df.rename(columns=dict(zip(names,renames)))
the data frame looked like this:
does this work ?
df.columns = [col + ' = ' + str(newElements.pop(0)) if col.startswith(stringMatch) else col for col in df.columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With