I am reading a data frame from an online CSV file, but the person who created the file accidentally entered some numbers into a column that should contain only city names. Sample of the cities.data table:
City        Population  Foo   Bar
Seattle     10          foo1  bar1
98125       20          foo2  bar2
Kent 98042  30          foo3  bar3
98042 Kent  30          foo4  bar4
Desired output after removing rows with only numbers in the city column:
City        Population  Foo   Bar
Seattle     10          foo1  bar1
Kent 98042  30          foo3  bar3
98042 Kent  30          foo4  bar4
I want to remove the rows with ONLY numbers in the city column. Kent 98042 and 98042 Kent are both okay since they contain the city name, but 98125 is not a city, so that row should be removed.
I can't use is.numeric because the numbers are read as strings from the CSV file. I tried using a regex:
cities.data <- cities.data[which(grepl("[0-9]+", cities.data) == FALSE)]
But this deletes rows with any numbers rather than just the one containing only numbers, e.g.
City        Population  Foo   Bar
Seattle     10          foo1  bar1
"Kent 98042" was deleted even though I wanted to keep that row.
Suggestions? Please and thanks!
To remove a dot followed by digits at the end of a string, use the gsub function. It searches each element of the vector for that pattern at the end of the string and replaces the match with an empty string, written as two double quotes with nothing between them.
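A minimal sketch of that gsub call, assuming the input is a plain character vector. The vector x and its values are hypothetical and only serve as illustration:

```r
# Hypothetical example vector; the values are illustrative only
x <- c("abc.1", "def.22", "ghi", "v1.5.3")

# "\\." matches a literal dot, "[0-9]+" one or more digits,
# and "$" anchors the match at the end of the string
cleaned <- gsub("\\.[0-9]+$", "", x)
print(cleaned)
```

Because the pattern is anchored with $, only the final dot-and-digits suffix is removed; strings without such a suffix are returned unchanged.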
The ncol() function in R returns the total number of columns in an object such as a data frame or matrix.
The easiest way to drop columns is the subset() function. For example, select = -c(x, z) tells R to drop the variables x and z; the '-' sign indicates dropping variables. Variable names must NOT be quoted when they are used this way in subset().
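A short sketch of dropping columns with subset(), assuming a small hypothetical data frame whose columns are named x, y and z:

```r
# Hypothetical data frame; column names and values are illustrative only
df <- data.frame(
  x = 1:3,
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)

# The minus sign drops x and z; the names are given unquoted
kept <- subset(df, select = -c(x, z))
print(names(kept))
```

The result is still a data frame, here with the single remaining column y.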
To remove a character from a column of an R data frame, use the gsub function to replace it with an empty string. For example, if a data frame df has a character column x whose values each contain the string ID, it can be removed with the command gsub("ID","",as.
df <- read.table(text = "
City Population Foo Bar
Seattle 10 foo1 bar1
98125 20 foo2 bar2
Kent98042 30 foo3 bar2
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
# Keep only the rows whose City cannot be parsed as a number
df %>% filter(is.na(as.numeric(City)))
# City Population Foo Bar
# 1 Seattle 10 foo1 bar1
# 2 Kent98042 30 foo3 bar2
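The idea is that as.numeric applied to a character value returns NA unless the string is a number, so is.na(as.numeric(City)) is TRUE exactly for the rows that should be kept. R will emit an "NAs introduced by coercion" warning, which is expected here.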
If you want to use base R you can use this: df[is.na(as.numeric(df$City)),]
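Since the question tried a regex, here is a sketch of that route in base R, offered as one possible alternative rather than the only fix. The original attempt matched any digit because the pattern was not anchored, and it was applied to the whole data frame instead of the City column. The sample data below is rebuilt with data.frame() (an assumption, so that city values containing spaces survive), and the names only_digits and cleaned are illustrative:

```r
# Sample data mirroring the question, including City values that contain spaces
cities.data <- data.frame(
  City = c("Seattle", "98125", "Kent 98042", "98042 Kent"),
  Population = c(10, 20, 30, 30),
  Foo = c("foo1", "foo2", "foo3", "foo4"),
  Bar = c("bar1", "bar2", "bar3", "bar4"),
  stringsAsFactors = FALSE
)

# TRUE where City consists solely of digits, optionally padded by whitespace;
# "^" and "$" anchor the pattern to the whole string
only_digits <- grepl("^[[:space:]]*[0-9]+[[:space:]]*$", cities.data$City)

# Keep the other rows; the trailing comma selects rows rather than columns
cleaned <- cities.data[!only_digits, ]
print(cleaned)
```

Unlike the as.numeric approach, this produces no coercion warning, and it treats only pure digit strings as numbers, so values such as "1e5" or "-3.5" would be kept.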