Deleting specific rows from a data frame

Tags: dataframe, r

I am working with some US govt data which has a lengthy list of cities and zip codes. After some work, the data is in the following format.

dat1 = data.frame(keyword=c("Bremen", "Brent", "Centreville, AL", "Chelsea, AL", "Bailytown, Alabama", "Calera, Alabama",
              "54023", "54024"), tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2), rep("AlabamaCityState",2), rep("AlabamaZipCode",2)))
dat1

However, certain keywords aren't labeled properly. In the example below, there are two zip codes ("50143" and "52348") which are tagged as 'AlabamaCityST' and 'AlabamaCityState'. For some reason, the original data set from the government has several zip codes which aren't grouped with the other zip codes.

dat2 = data.frame(keyword=c("Bremen", "Brent", "50143", "Chelsea, AL", "Bailytown, Alabama", "52348",
              "54023", "54024"), tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2), rep("AlabamaCityState",2), rep("AlabamaZipCode",2)))
dat2

I want to know how I can go through the entire list of keywords and delete all the rows with numeric values (they're actually stored as character values) which don't have an 'AlabamaZipCode' tag. The previous data should end up looking like this:

dat3 = data.frame(keyword=c("Bremen", "Brent", "Chelsea, AL", "Bailytown, Alabama", "54023", "54024"), 
          tag=c(rep("AlabamCity",2), rep("AlabamaCityST",1), rep("AlabamaCityState",1), rep("AlabamaZipCode",2)))
dat3

The challenge is that there are some numeric values which I want to keep and others which I want to delete. Can anyone help?

asked Jul 06 '11 19:07 by ATMathew



1 Answer

I think two grepl expressions should do the trick:

> dat2[ !( grepl("City", dat2$tag) &  grepl("^\\d", dat2$keyword) ) , ]
             keyword              tag
1             Bremen       AlabamCity
2              Brent       AlabamCity
4        Chelsea, AL    AlabamaCityST
5 Bailytown, Alabama AlabamaCityState
7              54023   AlabamaZipCode
8              54024   AlabamaZipCode

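This eliminates the rows where keyword starts with a digit and tag contains "City". Rows tagged 'AlabamaZipCode' never match the "City" pattern, so the properly labeled zip codes are kept.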
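If you would rather state the rule exactly as the question words it (drop a row when the keyword is numeric and the tag is not 'AlabamaZipCode'), something like the following should also work. This is only a sketch: it assumes every zip code keyword consists solely of digits, and the names is_numeric_keyword, is_zip_tag and cleaned are made up for illustration.

```r
# Rebuild the question's example data (values copied from dat2 in the question).
dat2 <- data.frame(
  keyword = c("Bremen", "Brent", "50143", "Chelsea, AL",
              "Bailytown, Alabama", "52348", "54023", "54024"),
  tag = c(rep("AlabamCity", 2), rep("AlabamaCityST", 2),
          rep("AlabamaCityState", 2), rep("AlabamaZipCode", 2)),
  stringsAsFactors = FALSE
)

# TRUE where the keyword consists only of digits
# (assumption: zip codes contain nothing but digits).
is_numeric_keyword <- grepl("^[0-9]+$", dat2$keyword)

# TRUE where the row carries the zip code tag.
is_zip_tag <- dat2$tag == "AlabamaZipCode"

# Keep everything except all-digit keywords that lack the zip code tag.
cleaned <- dat2[!(is_numeric_keyword & !is_zip_tag), ]
rownames(cleaned) <- NULL  # optional: renumber the rows after dropping some
cleaned
```

Compared with matching "City" in the tag, this version does not depend on how the non-zip tags happen to be spelled, so it would still apply if other tag names turn up in the full data set.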

answered Sep 19 '22 15:09 by IRTFM