Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter to show only duplicate entries in Excel 2010 [closed]

I have two files: Old and New. I need to remove any entry that Old contains from New. So I've exported both files to a tab delimited text file, and then merged the two text files so that now I many duplicate entries. My hopes was that I could come up with some formula show only the duplicate values, that way I could just delete them from there. Now how where can I go from the point I am at now?

Example old:

[41] [john] [smith]
[732] [bill] [murry]
[234] [adam] [sandler]
[9029] [tim] [allen]

Example new:

[10422] [bill] [gates]
[11456] [steve] [ballmer]
[732] [bill] [murry]
[24510] [jimmy] [jones]
[41] [john] [smith]
[234] [adam] [sandler]
[78241] [funny] [bunny]
[9029] [tim] [allen]

Example merged:

[10422] [bill] [gates]
[11456] [steve] [ballmer]
[732] [bill] [murry]
[732] [bill] [murry]
[24510] [jimmy] [jones]
[41] [john] [smith]
[41] [john] [smith]
[234] [adam] [sandler]
[234] [adam] [sandler]
[78241] [funny] [bunny]
[9029] [tim] [allen]
[9029] [tim] [allen]

Example of what I want to show:

[732] [bill] [murry]
[732] [bill] [murry]
[41] [john] [smith]
[41] [john] [smith]
[234] [adam] [sandler]
[234] [adam] [sandler]
[9029] [tim] [allen]
[9029] [tim] [allen]

The name of the column I'll be checking/comparing is "voter_id", which is the number field you see in the examples. I'm hoping by just showing these entries only, I'll be able to highlight them all, and delete. Then remove the filter/formula, and be left with something like:

[10422] [bill] [gates]
[11456] [steve] [ballmer]
[24510] [jimmy] [jones]
[78241] [funny] [bunny]

I apologize for making a separate question for this, but I've found a way to ask the same thing a different way, in a manner that would make much more sense for myself. It's essentially the same results, but in two completely different methods.

like image 546
Shea Avatar asked Jan 31 '12 03:01

Shea


2 Answers

Try conditional formatting On the home tab - Conditional formatting - highlight cells rules - duplicate values then filter by color and remove duplicates.

like image 78
Raystafarian Avatar answered Oct 09 '22 06:10

Raystafarian


There are many ways you can do this. If you're not familiar with formulas in Excel, the easiest solution I believe for you right now is to use a pivot table:

  1. Select your table (including the headings)
  2. Insert -> PivotTable -> OK
  3. If you can't see the field list, right click on your pivot table and select "Show Field List".
  4. Now in your pivot table, drag and drop all your columns to "Row Labels".
  5. Drag and drop your voter_id column to "Values".
  6. There should now be a column in your pivot table that says "Sum of voter_id". Right click on it and select Summarize Data by -> Count.
  7. Now you should able to see all your records and the number of occurrences. You can select all cells in your pivot table, copy and paste by values back to your original table. Then you can use a simple filter to only show records with more than one occurrences.

If you're comfortable with formulas, then you can simply add a new column and in that column, use COUNTIF to find out the number of times the voter_id exists in your table. After that, use a simple filter to filter out those with count = 1.

Another easy way you can do it is, instead of merging your two files together, simply use VLOOKUP to identify whether each of your record in one file exists in another.

like image 41
Pupper Avatar answered Oct 09 '22 06:10

Pupper