 

How to convert string of range (bins), into numerical values that can then be used with Seaborn visualisations

I'm working with Python 3.7 in Jupyter Notebooks. I'm currently exploring some survey data in the form of a Pandas DataFrame imported from a .CSV file. I would like to explore it further with some Seaborn visualisations; however, the numerical data was gathered as bins (ranges) and stored as string values.

Is there a way I could go about converting these columns (Age and Approximate Household Income) into numerical values, which could then be used with Seaborn? I've attempted searches but my wording seems to only be returning methods on creating age bins for columns with numerical values. I'm really looking for how I'd convert string values into numerical age bin values.

Also, does anybody have some tips on how I could improve my search method? What would have been the ideal wording to search for a solution to something like this?

Here is a sample from the dataframe, using df.head(5).to_dict(), with values changed for anonymity purposes.

{'Age': {0: '45-54', 1: '35-44', 2: '45-54', 3: '45-54', 4: '55-64'},
 'Ethnicity': {0: 'White', 1: 'White', 2: 'White', 3: 'White', 4: 'White'},
 'Approximate Household Income': {0: '$175,000 - $199,999',
  1: '$75,000 - $99,999',
  2: '$25,000 - $49,999',
  3: '$50,000 - $74,999',
  4: nan},
 'Highest Level of Education Completed': {0: 'Four Year College Degree',
  1: 'Four Year College Degree',
  2: 'Jr College/Associates Degree',
  3: 'Jr College/Associates Degree',
  4: 'Four Year College Degree'},
 '2020 Candidate Choice': {0: 'Joe Biden',
  1: 'Joe Biden',
  2: 'Donald Trump',
  3: 'Joe Biden',
  4: 'Donald Trump'},
 '2016 Candidate Choice': {0: 'Hillary Clinton',
  1: 'Third Party',
  2: 'Donald Trump',
  3: 'Hillary Clinton',
  4: 'Third Party'},
 'Party Registration 2020': {0: 'Independent',
  1: 'No Party',
  2: 'No Party',
  3: 'Independent',
  4: 'Independent'},
 'Registered State for Voting': {0: 'Colorado',
  1: 'Virginia',
  2: 'California',
  3: 'North Carolina',
  4: 'Oregon'}}
DanMack asked Oct 16 '25 02:10



1 Answer

You can use some of the pandas Series.str methods.

Smaller example dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Age": {0: "45-54", 1: "35-44", 2: "45-54", 3: "45-54", 4: "55-64"},
        "Ethnicity": {0: "White", 1: "White", 2: "White", 3: "White", 4: "White"},
        "Approximate Household Income": {
            0: "$175,000 - $199,999",
            1: "$75,000 - $99,999",
            2: "$25,000 - $49,999",
            3: "$50,000 - $74,999",
            4: np.nan,
        },
    }
)
#      Age Ethnicity Approximate Household Income
# 0  45-54     White          $175,000 - $199,999
# 1  35-44     White            $75,000 - $99,999
# 2  45-54     White            $25,000 - $49,999
# 3  45-54     White            $50,000 - $74,999
# 4  55-64     White                          NaN

We can iterate through a list of columns and chain these methods to parse the ranges, all within the pandas.DataFrame.

Methods we will use, in order:

  • Series.str.replace - replace commas with nothing
  • Series.str.extract - extract the lower and upper numbers from the Series, using a regex with two capture groups
  • Series.astype - convert the extracted numbers to floats
  • DataFrame.rename - rename the new columns
  • DataFrame.join - add the extracted numbers back on to the original DataFrame

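
To see what the extract step matches, the same pattern can be tried on single strings with Python's built-in re module. This is only a sketch: the helper name parse_range and the open-ended label "65+" are made up for illustration and do not appear in the question's data.

```python
import re

# Same pattern as the one passed to Series.str.extract in the answer.
# ^[$]*      optional leading dollar sign(s)
# (\d+)      group 1: digits of the lower bound
# [-\s$]*    any mix of hyphen, whitespace and dollar signs between the bounds
# (\d+)$     group 2: digits of the upper bound, up to the end of the string
PATTERN = re.compile(r"^[$]*(\d+)[-\s$]*(\d+)$")


def parse_range(text):
    """Return (lower, upper) as floats, or None if text is not a simple range.

    Hypothetical helper, written only to illustrate the regex.
    """
    # Strip thousands separators first, as Series.str.replace does in the answer.
    match = PATTERN.search(text.replace(",", ""))
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# "65+" is an assumed open-ended label, included to show a non-matching case.
samples = ["45-54", "$175,000 - $199,999", "65+"]
parsed = {s: parse_range(s) for s in samples}
```

Labels that are not of the form "number, separator, number" do not match, so in pandas they would come out as NaN in both new columns and may need separate handling. The loop that follows applies the same pattern to whole columns.
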
for col in ["Age", "Approximate Household Income"]:
    df = df.join(
        df[col]
        .str.replace(",", "", regex=False)  # "175,000" -> "175000"
        .str.extract(pat=r"^[$]*(\d+)[-\s$]*(\d+)$")  # two groups: lower, upper
        .astype("float")  # floats, so missing values can stay NaN
        .rename({0: f"{col}_lower", 1: f"{col}_upper"}, axis="columns")
    )
#      Age Ethnicity Approximate Household Income  Age_lower  Age_upper  \
# 0  45-54     White          $175,000 - $199,999       45.0       54.0   
# 1  35-44     White            $75,000 - $99,999       35.0       44.0   
# 2  45-54     White            $25,000 - $49,999       45.0       54.0   
# 3  45-54     White            $50,000 - $74,999       45.0       54.0   
# 4  55-64     White                          NaN       55.0       64.0   
# 
#    Approximate Household Income_lower  Approximate Household Income_upper  
# 0                            175000.0                            199999.0  
# 1                             75000.0                             99999.0  
# 2                             25000.0                             49999.0  
# 3                             50000.0                             74999.0  
# 4                                 NaN                                 NaN  
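
Since the question asks for numerical values to plot, one possible next step is to collapse each pair of bounds into a single midpoint. The sketch below assumes that a midpoint is an acceptable stand-in for each bin; the column names Age_mid and Income_mid are made up here, and the bounds are copied from the output above so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Lower and upper bounds as produced by the loop above
# (values copied from the example output).
bounds = pd.DataFrame(
    {
        "Age_lower": [45.0, 35.0, 45.0, 45.0, 55.0],
        "Age_upper": [54.0, 44.0, 54.0, 54.0, 64.0],
        "Approximate Household Income_lower": [175000.0, 75000.0, 25000.0, 50000.0, np.nan],
        "Approximate Household Income_upper": [199999.0, 99999.0, 49999.0, 74999.0, np.nan],
    }
)

# Hypothetical column names: the midpoint of each bin, taken row by row.
# A row whose bounds are both NaN keeps a NaN midpoint.
bounds["Age_mid"] = bounds[["Age_lower", "Age_upper"]].mean(axis="columns")
bounds["Income_mid"] = bounds[
    ["Approximate Household Income_lower", "Approximate Household Income_upper"]
].mean(axis="columns")
```

Columns like these could then be passed to a Seaborn function, for example seaborn.scatterplot(data=bounds, x="Age_mid", y="Income_mid"). Whether a midpoint summarises a bin fairly depends on the survey, so treat this as one option rather than the only one.
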
Alex answered Oct 18 '25 16:10



