I am using the following code to normalize a pandas DataFrame:
```python
df_norm = (df - df.mean()) / (df.max() - df.min())
```
This works fine when all columns are numeric. However, I now have some string columns in df, and the normalization above raises an error. Is there a way to apply this normalization only to the numeric columns of a DataFrame, keeping the string columns unchanged?
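For reference, the expression in the question applies mean normalization to each column independently. For a numeric column x with n values:

```latex
x_i' = \frac{x_i - \bar{x}}{\max_j x_j - \min_j x_j},
\qquad
\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j
```

As a worked example, for a column with values (1, 2, 3) the mean is 2 and max - min = 3 - 1 = 2, so the normalized values are (-0.5, 0.0, 0.5). Neither the mean nor the subtraction is defined for strings, which is why the numeric columns have to be picked out before applying the formula.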
Columns can also be excluded by label with loc. Note that loc is label-based, not position-based, and it does not remove anything from the original DataFrame; it returns the selected subset. To select all columns except one, use df.loc[:, df.columns != <column name>]. More generally, specific rows or columns can be selected with square brackets, with loc (by label), or with iloc (by position). To remove characters from a string column, use replace with a regular expression; for example, the regex [ab] matches any character that is a or b.
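A minimal sketch of the selection and cleanup described above, using a small example frame (the column names are only illustrative). The boolean mask keeps every column except `b`; `drop(columns=...)` is shown as an equivalent that also accepts several names. The last part uses `Series.str.replace` with `regex=True` to strip the characters matched by `[ab]`.

```python
import pandas as pd

# Small illustrative frame: 'b' is the string column we want to leave out.
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})

# Boolean mask over the column labels: True for every column except 'b'.
without_b = df.loc[:, df.columns != 'b']

# drop(columns=...) returns the same result and accepts several names at once.
also_without_b = df.drop(columns=['b'])

print(list(without_b.columns))           # ['a', 'c']
print(without_b.equals(also_without_b))  # True

# Remove characters from a string column: the regex [ab] matches 'a' or 'b'.
cleaned = df['b'].str.replace('[ab]', '', regex=True)
print(cleaned.tolist())  # ['', '', 'c']
```

Neither call modifies `df` itself; both return new objects.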
You can use select_dtypes to pick out the numeric columns and compute the normalization only on them:
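```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})
print(df)
#    a  b  c
# 0  1  a  4
# 1  2  b  5
# 2  3  c  6

# Keep only the numeric (int/float) columns.
df_num = df.select_dtypes(include='number')
print(df_num)
#    a  c
# 0  1  4
# 1  2  5
# 2  3  6
```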
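select_dtypes also accepts exclude, which returns the complementary set of columns. A quick check on the same example frame that the two calls split the columns into numeric and non-numeric groups:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})

# 'number' matches integer and float dtypes.
numeric_cols = df.select_dtypes(include='number').columns
# exclude='number' keeps everything else, here the string column 'b'.
other_cols = df.select_dtypes(exclude='number').columns

print(list(numeric_cols))  # ['a', 'c']
print(list(other_cols))    # ['b']
```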
Then assign the normalized columns back to the original df:
```python
# Normalize only the numeric columns, then overwrite them in df.
df_norm = (df_num - df_num.mean()) / (df_num.max() - df_num.min())
df[df_norm.columns] = df_norm
print(df)
#      a  b    c
# 0 -0.5  a -0.5
# 1  0.0  b  0.0
# 2  0.5  c  0.5
```
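The assignment above overwrites the numeric columns of df in place. If the original values should be kept, the same steps can be wrapped in a function that works on a copy. This is a sketch; `normalize_numeric` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd


def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its numeric columns mean-normalized.

    Hypothetical helper: non-numeric columns are returned unchanged.
    A constant numeric column has max == min, so its values become NaN (0 / 0).
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    num = out.select_dtypes(include='number')
    out[num.columns] = (num - num.mean()) / (num.max() - num.min())
    return out


df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})
df_norm = normalize_numeric(df)
print(df_norm)
print(df['a'].tolist())  # original is unchanged: [1, 2, 3]
```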