I have many (nearly 100) big CSV files with a sellID in the first column. I know that some sellIDs are repeated two or more times across two or more files. Is it possible to find all these duplicate sellIDs with grep (i.e., create a sellID → file_name map)? Or is there another open-source application for this purpose? My OS is CentOS.
Here's a very simple, somewhat crude awk script that accomplishes something close to what you're describing:
#!/usr/bin/awk -f
{
    if ($1 in seenbefore) {
        printf("%s\t%s\n", $1, seenbefore[$1]);
        printf("%s\t%s\n", $1, FILENAME);
    }
    seenbefore[$1] = FILENAME;
}
As you can hopefully surmise, all we're doing is building an associative array of each value found in the first column/field (set FS in the BEGIN special block to change the input field separator, for a trivially naive form of CSV support). Whenever we encounter a duplicate we print the duplicated value, the file we previously saw it in, and the current filename. In any event we then add/update the array entry with the current file's name.
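To make the FS tweak mentioned above concrete, here is the same logic as a one-liner with the separator set to a comma (a naive CSV assumption: no quoted fields containing commas; the `*.csv` glob is also an assumption about your filenames):

```shell
# Same idea as the script above, but splitting fields on commas.
# On a duplicate, print the ID with the previous file, then with the
# current file; always remember the most recent file for each ID.
awk 'BEGIN { FS = "," }
     $1 in seenbefore { print $1 "\t" seenbefore[$1]; print $1 "\t" FILENAME }
     { seenbefore[$1] = FILENAME }' *.csv
```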
With more code you could store and print the line numbers of each occurrence, append filename/line-number tuples to a list, move all the output to an END block where you summarize it in a more concise format, and so on.
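A minimal sketch of that END-block variant, still in awk (assuming comma-separated input with the ID in column 1; the location strings are plain concatenated text, since awk lacks real tuples):

```shell
# Collect every file:line location per ID, then in the END block print
# only the IDs that appeared more than once, with all their locations.
awk -F, '{ locs[$1] = locs[$1] (locs[$1] ? ", " : "") FILENAME ":" FNR
           count[$1]++ }
     END { for (id in count)
               if (count[id] > 1)
                   printf("%s\t%s\n", id, locs[id]) }' *.csv
```

FNR (the per-file line number) is what makes the file:line pairs cheap to build here; the trade-off versus the first script is that nothing is reported until all input has been read.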
For any of that I'd personally shift to Python, where the data types are richer (actual lists and tuples rather than having to concatenate them into strings or build an array of arrays) and where I'd have access to much more power (an actual CSV parser that can handle the various flavors of quoted CSV and alternative delimiters, and where producing sorted results is trivially easy).
However, this should, hopefully, get you on the right track.
Related question: https://serverfault.com/questions/66301/removing-duplicate-lines-from-file-with-grep
You could cat all the files into a single one and then look for dupes as suggested in the link above.
BTW, it is not clear whether you want to keep only the dupes or remove them.
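One way to realize that suggestion as a pipeline (assuming comma-separated files with the sellID in column 1, all matching `*.csv`):

```shell
# Concatenate the first column of every file, sort it, and let uniq -d
# report each value that occurs more than once.
cut -d, -f1 *.csv | sort | uniq -d
```

Note that this tells you *which* sellIDs are duplicated but drops the sellID-to-filename mapping the question asks for; the awk approaches above keep it.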