
Python pandas DataFrame warning suggests using .loc instead?

Hi, I would like to clean my data by removing rows with missing information and converting all letters to lowercase. For the lowercase conversion, though, I get this warning:

E:\Program Files Extra\Python27\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)
C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:18: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  frame3["name"] = frame3["name"].str.lower()

C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:19: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  frame3["ethnicity"] = frame3["ethnicity"].str.lower()

import pandas as pd
from pandas import DataFrame

# Get csv file into data frame
data = pd.read_csv("C:\Users\KubiK\Desktop\OddNames_sampleData.csv")
frame = DataFrame(data)
frame.columns = ["name", "ethnicity"]
name = frame.name
ethnicity = frame.ethnicity

# Remove missing ethnicity data cases
index_missEthnic = frame.ethnicity.isnull()
index_missName = frame.name.isnull()
frame2 = frame[index_missEthnic != True]
frame3 = frame2[index_missName != True]

# Make all letters into lowercase
frame3["name"] = frame3["name"].str.lower()
frame3["ethnicity"] = frame3["ethnicity"].str.lower()

# Test outputs
print frame3

This warning doesn't seem to be fatal (at least for my small sample data), but how should I deal with this?

Sample data

Name    Ethnicity
Thos C. Martin                              Russian
Charlotte Wing                              English
Frederick A T Byrne                         Canadian
J George Christe                            French
Mary R O'brien                              English
Marie A Savoie-dit Dugas                    English
J-b'te Letourneau                           Scotish
Jane Mc-earthar                             French
Amabil?? Bonneau                            English
Emma Lef??c                                 French
C., Akeefe                                  African
D, James Matheson                           English
Marie An: Thomas                            English
Susan Rrumb;u                               English
                                            English
Kaio Chan   

KubiK888 asked Dec 12 '25 16:12


1 Answer

I'm not sure why you need so many boolean masks. Also note that .isnull() does not catch empty strings, and filtering out empty strings before applying .lower() doesn't seem necessary either. But if there is a need, this works for me:

frame = pd.DataFrame({'name':['Abc Def', 'EFG GH', ''], 'ethnicity':['Ethnicity1','', 'Ethnicity2']})
print frame

    ethnicity     name
0  Ethnicity1  Abc Def
1               EFG GH
2  Ethnicity2         

name_null = frame.name.str.len() == 0
frame.loc[~name_null, 'name'] = frame.loc[~name_null, 'name'].str.lower()
print frame

    ethnicity     name
0  Ethnicity1  abc def
1               efg gh
2  Ethnicity2         
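
If you do want to drop the incomplete rows first, as in the question, one way to sidestep the SettingWithCopyWarning is to build a single boolean mask, select with .loc, and take an explicit .copy() before assigning. The sketch below is only an illustration under assumptions: the small frame is a hypothetical stand-in for the question's CSV (a few rows borrowed from the sample data plus one whitespace-only name), and the names keep and clean are made up here. It is written with print() so it also runs on Python 3.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question: one missing name,
# one missing ethnicity, and one name that is only whitespace.
frame = pd.DataFrame({
    "name": ["Thos C. Martin", "Charlotte Wing", None, "Kaio Chan", "  "],
    "ethnicity": ["Russian", "English", "English", None, "English"],
})

# One combined mask instead of chaining two filtered frames.
keep = frame["name"].notnull() & frame["ethnicity"].notnull()

# .isnull()/.notnull() do not catch empty or whitespace-only strings,
# so those are excluded explicitly.
keep &= frame["name"].str.strip().ne("")

# .copy() makes `clean` an independent frame, so the assignments below
# are not made on a slice of `frame`.
clean = frame.loc[keep].copy()

for col in ["name", "ethnicity"]:
    clean[col] = clean[col].str.lower()

print(clean)
```

Only the rows with index 0 and 1 survive the mask, and the original frame is left unchanged.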

Primer answered Dec 15 '25 12:12