Picking certain values in Series as headings

Question

I have a DataFrame that has a column that looks like:

Japan
valA
valB
Ghana
valC
valD
...

I want to extract the country names from this list and turn them into another column like so:

Japan    valA
Japan    valB
Ghana    valC
Ghana    valD

I am sure there's an answer for this already on SO, but I haven't been able to find the correct keywords to bring it up.

Right now, I am doing the following, but I then have to drop rows that initially contained the country names:

def get_country(row):
    if ...:  # placeholder: decide if it's a country name
        return row[0]

df['country'] = df.apply(get_country, axis=1).ffill()

This seems like a fairly common use case when cleaning data. Is there a standard or better way of doing this?

cs95 · Accepted Answer

I can get you started using map, where, and ffill.

def is_country(x): 
    # TODO - fill in the logic for this stub.
    return x in {'Japan', 'Ghana'}

df

       A
0  Japan
1   valA
2   valB
3  Ghana
4   valC
5   valD


df.assign(B=df['A'].where(df['A'].map(is_country)).ffill()).query('A != B')

      A      B
1  valA  Japan
2  valB  Japan
4  valC  Ghana
5  valD  Ghana
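
To see why the one-liner works, the chain can be unpacked into intermediate steps. This is only a sketch, assuming the same single-column frame as above and a hard-coded set of country names; the variable names (mask, headings, result) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Japan', 'valA', 'valB', 'Ghana', 'valC', 'valD']})

def is_country(x):
    # Stand-in check, same as the stub above.
    return x in {'Japan', 'Ghana'}

# Step 1: boolean mask, True on rows that hold a country name.
# astype(bool) just makes sure ~mask below is a logical NOT.
mask = df['A'].map(is_country).astype(bool)

# Step 2: keep the country names and turn everything else into NaN.
headings = df['A'].where(mask)

# Step 3: forward-fill so each row carries the most recent country above it.
headings = headings.ffill()

# Step 4: attach the new column and drop the original heading rows.
result = df.assign(B=headings)[~mask]
print(result)
```

Filtering with ~mask instead of query('A != B') should select the same rows here, since the heading rows are exactly the ones where A equals the filled-in B.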

You can use a package like pycountry (or something similar) to validate country names.

import pycountry
countries = {x.name for x in pycountry.countries}  # Initialise a set.

def is_country(x): 
    return x in countries

With this definition, though, you can simplify your code to:

df.assign(B=df['A'].where(df['A'].isin(countries)).ffill()).query('A != B')

This gets rid of the is_country function entirely.
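
If this pattern comes up repeatedly, the isin version can be wrapped in a small function. The following is a minimal sketch, not part of the original answer: the name headings_to_column and its parameters are made up for illustration, and a hard-coded set stands in for the pycountry set so the example runs without that package.

```python
import pandas as pd

def headings_to_column(df, col, headings, new_col='country'):
    """Hypothetical helper: move heading rows of `col` into `new_col`."""
    is_heading = df[col].isin(headings)
    # Keep the heading values and forward-fill them onto the rows they introduce.
    filled = df[col].where(is_heading).ffill()
    # Drop the heading rows themselves and renumber the index.
    return df.assign(**{new_col: filled})[~is_heading].reset_index(drop=True)

df = pd.DataFrame({'A': ['Japan', 'valA', 'valB', 'Ghana', 'valC', 'valD']})
out = headings_to_column(df, 'A', {'Japan', 'Ghana'})
print(out)
```

Any rows that appear before the first heading would get NaN in the new column, because ffill has nothing above them to copy from.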

Vaishali · Answer

Using str.extract:

import numpy as np

new_df = df['col'].str.extract(r'(val.*)?(.*)').replace('', np.nan).rename(columns={1: 'Country', 0: 'Value'})
new_df['Country'] = new_df['Country'].ffill()
new_df.dropna(inplace=True)


    Value   Country
1   valA    Japan
2   valB    Japan
4   valC    Ghana
5   valD    Ghana
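
The extract approach relies on every non-heading value starting with "val", which is an assumption taken from the sample data. A step-by-step sketch of the same idea is shown below; it uses named groups (an addition, not in the original answer) so the columns come out already labelled and no rename is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['Japan', 'valA', 'valB', 'Ghana', 'valC', 'valD']})

# Named groups become column names in the extracted frame.
# Rows starting with "val" fill Value; anything else lands in Country.
parts = df['col'].str.extract(r'(?P<Value>val.*)?(?P<Country>.*)')

# On "val..." rows the Country group matches an empty string;
# treat it as missing, then forward-fill the last seen country.
parts['Country'] = parts['Country'].replace('', np.nan).ffill()

# Heading rows have no Value, so dropping those NaN rows removes them.
new_df = parts.dropna(subset=['Value'])
print(new_df)
```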


Tags: python, pandas
Laurent S
