R remove duplicate elements in character vector, not duplicate rows

Question

I am hitting a brick wall with this problem.

I have a data frame (dates) containing document IDs (Doc) and, for each document, a character vector of dates (Dates):

  Doc     Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")

I am trying to remove the duplicate elements in the Dates to get this result:

  Doc     Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")

I have tried:

> unique(dates$Dates)

but it removes whole rows whose Dates duplicate an earlier row:

  Doc     Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")

How can I remove only the duplicate elements within each Dates vector, without removing rows whose Dates duplicate another row's?

Update, with the real data:

# Extract the date strings from the text in column 2:
library(gsubfn)  # provides strapply()

df1$dates <- as.character(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})|([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))"))

# Drop the first two columns from the data frame
df2 <- df1[-c(1, 2)]

# List data
> df2
872                     7/23/2007
873 c(" 11/4/2007", " 11/4/2007")
874   c(" 4/2/2008", " 8/2/2007")
880                    11/14/2006

> class(df2)
[1] "data.frame"

> class(df2$dates)
[1] "character"

> dput(df2)
structure(list(dates = c("NULL", "NULL", " 7/23/2007", "c(\" 11/4/2007\", \" 11/4/2007\")", 
"c(\" 4/2/2008\", \" 8/2/2007\")", "NULL", "NULL", "NULL", "NULL", 
"NULL", " 11/14/2006")), .Names = "dates", class = "data.frame", row.names = 870:880)

So my question is: how do I get rid of the duplicate date in row 873?
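In the updated data, as.character() has already flattened each vector into one string such as 'c(" 11/4/2007", " 11/4/2007")', so calling unique() on each element would leave it untouched. One possible workaround, sketched below, pulls the quoted dates back out with regmatches(), de-duplicates them, and rebuilds the string in the same format. dedupe_deparsed() is a hypothetical helper written only for this illustration, and it assumes the dates themselves never contain a double quote.

```r
# Hypothetical helper (not from any package): de-duplicates the dates inside
# strings that as.character() produced from a list, e.g. 'c(" 11/4/2007", " 11/4/2007")'
dedupe_deparsed <- function(x) {
  vapply(x, function(s) {
    if (!startsWith(s, "c(")) return(s)  # "NULL" and single dates pass through unchanged
    # Pull out each quoted item, then strip the surrounding quotes
    items <- regmatches(s, gregexpr('"[^"]*"', s))[[1]]
    items <- unique(gsub('"', "", items, fixed = TRUE))
    # as.character() on a one-element list re-creates the original storage format:
    # a plain string for one date, a 'c("...", "...")' string for several
    as.character(list(items))
  }, character(1), USE.NAMES = FALSE)
}

# The dates column from the dput() output above
df2 <- data.frame(
  dates = c("NULL", "NULL", " 7/23/2007", 'c(" 11/4/2007", " 11/4/2007")',
            'c(" 4/2/2008", " 8/2/2007")', "NULL", "NULL", "NULL", "NULL",
            "NULL", " 11/14/2006"),
  row.names = 870:880, stringsAsFactors = FALSE
)

df2$dates <- dedupe_deparsed(df2$dates)
df2["873", "dates"]  # " 11/4/2007"
```

Rows with distinct dates, such as 874, come back unchanged, and no rows are dropped.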

Ferdinand.kraft · Accepted Answer

Try this:

within(dates, Dates <- lapply(Dates, unique))
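A minimal sketch of why this works, assuming Dates is a list column whose elements are character vectors (the data frame below is rebuilt by hand from the table in the question). unique() applied to the column itself compares whole vectors across rows, whereas lapply() calls unique() on each row's vector separately, so no rows are lost.

```r
# Rebuild the example by hand: Dates is a list column, one character vector per row
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$Dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# unique() on the column compares whole vectors between rows,
# so row 4 (identical to row 1) would be dropped: 3 elements remain
length(unique(dates$Dates))

# The accepted answer: lapply() calls unique() inside each row's vector
dates <- within(dates, Dates <- lapply(Dates, unique))
dates$Dates[[2]]  # "07/01/2000" "09/08/2003"
nrow(dates)       # still 4
```

This relies on Dates really being a list; if as.character() has already turned each vector into a single string, unique() on each element has nothing to remove.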

user2547308 · Answer

I solved my problem of removing duplicate values from each character vector by wrapping the strapply() call in lapply(..., unique):

library(gsubfn)  # provides strapply()
df1$date <- as.character(lapply(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})|(\\s\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))"), unique))
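This works because of the order of operations: unique() has to run on each row's vector of matches before as.character() turns that vector into a single string. The sketch below illustrates this with a hand-written list standing in for the strapply() result (an assumption, so the example runs without gsubfn), using values from the dput() output in the question.

```r
# Stand-in for the list strapply() returns: one character vector per row,
# NULL where nothing matched (values taken from the question's dput() output)
matches <- list(NULL, " 7/23/2007",
                c(" 11/4/2007", " 11/4/2007"),
                c(" 4/2/2008", " 8/2/2007"))

# Flattening first freezes the duplicate into the string 'c(" 11/4/2007", " 11/4/2007")'
before <- as.character(matches)

# De-duplicating each vector first leaves only " 11/4/2007" for that row
after <- as.character(lapply(matches, unique))
print(after)
```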

Thanks for all your help.

Tags:

r

duplicates

user2547308
