I am working with some US govt data which has a lengthy list of cities and zip codes. After some work, the data is in the following format.
dat1 = data.frame(keyword=c("Bremen", "Brent", "Centreville, AL", "Chelsea, AL", "Bailytown, Alabama", "Calera, Alabama",
"54023", "54024"), tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2), rep("AlabamaCityState",2), rep("AlabamaZipCode",2)))
dat1
However, there are certain keywords which aren't properly working. So in the below example, there are two 'zip codes' which are labeled as 'AlabamaCity' and 'AlabamaCityState'. For some reason, the original data set from the government has several zipcodes which aren't properly grouped with the other zip codes.
dat2 = data.frame(keyword=c("Bremen", "Brent", "50143", "Chelsea, AL", "Bailytown, Alabama", "52348",
"54023", "54024"), tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2), rep("AlabamaCityState",2), rep("AlabamaZipCode",2)))
dat2
I wanted to know how I could iterate through the entire list of keywords and delete all the rows with numeric values (they're acctually saved as character values) which don't have a 'AlabamaZipCode' tag. So the previous data should end up looking like.
dat3 = data.frame(keyword=c("Bremen", "Brent", "Chelsea, AL", "Bailytown, Alabama", "54023", "54024"),
tag=c(rep("AlabamCity",2), rep("AlabamaCityST",1), rep("AlabamaCityState",1), rep("AlabamaZipCode",2)))
dat3
The challange seems to be that there are certain numeric values which I want to keep and others which I want to delete. Can anyone help.
To delete rows and columns from DataFrames, Pandas uses the “drop” function. To delete a column, or multiple columns, use the name of the column(s), and specify the “axis” as 1. Alternatively, as in the example below, the 'columns' parameter has been added in Pandas which cuts out the need for 'axis'.
To remove rows with an in R we can use the na. omit() and <code>drop_na()</code> (tidyr) functions. For example, na. omit(YourDataframe) will drop all rows with an.
drop() method you can drop/remove/delete rows from DataFrame. axis param is used to specify what axis you would like to remove. By default axis = 0 meaning to remove rows. Use axis=1 or columns param to remove columns.
I think two grepl expressions should do the trick:
> dat2[ !( grepl("City", dat2$tag) & grepl("^\\d", dat2$keyword) ) , ]
keyword tag
1 Bremen AlabamCity
2 Brent AlabamCity
4 Chelsea, AL AlabamaCityST
5 Bailytown, Alabama AlabamaCityState
7 54023 AlabamaZipCode
8 54024 AlabamaZipCode
You are eliminating the rows where there are digits in keyword
and "City" in tag
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With