I am working with PLINK to analyse genome-wide data.
Does anyone know how to remove duplicated SNPs?
In PLINK 1.9, use --list-duplicate-vars suppress-first
, which writes a plink.dupvar file listing duplicate variants; the suppress-first modifier omits the first member of each duplicate group from the list, so passing the file to --exclude removes the extras and leaves one copy intact. I've known this to slip up, though.
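A minimal sketch of that two-step flow, assuming binary filesets (the file names here are placeholders):

```shell
# Step 1: write plink.dupvar listing variants at duplicated positions;
# suppress-first omits the first member of each group, so the file
# contains only the IDs you want to drop.
plink --bfile yourdata --list-duplicate-vars suppress-first

# Step 2: exclude the listed duplicates.
plink --bfile yourdata --exclude plink.dupvar --make-bed --out yourdata_nodup
```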
Instead of using --exclude
as Davy suggested, you can also use --extract
, keeping a list of SNPs rather than getting rid of one. There's an easy method on any Unix-based system (assuming your data are in PED/MAP format and split by chromosome):
for i in {1..22}; do
  # Keep the first SNP ID seen at each chr:position
  # (.map columns: chr, id, cM, bp).
  awk '!seen[$1":"$4]++ { print $2 }' yourfile_chr${i}.map > keepers_chr${i}.txt
done
This will create a keepers_chr${i}.txt
file per chromosome with SNP IDs for SNPs at unique locations. Then run PLINK, feeding it your original file(s), and use --extract keepers_chr${i}.txt
, with --make-bed --out unique_file
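Putting that together, the PLINK step might look like this (one run per chromosome; file names are placeholders):

```shell
# Keep only the SNPs listed in each keepers file and write binary filesets.
for i in {1..22}; do
  plink --file yourfile_chr${i} \
        --extract keepers_chr${i}.txt \
        --make-bed --out unique_file_chr${i}
done
```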
There is no command to do it automatically that I am aware of, but the way I have done it in the past is: get a list of SNPs that are duplicated, rename the duplicates to e.g. rs1001.dup using --update-name
, then create a list of the renamed IDs (they will all have .dup
at the end of their names), and run --exclude duplicateSNPs.txt --make-bed --out yourfilename.dups.removed
Getting a list of SNPs that are duplicated shouldn't be too hard if you are familiar with R. Sorry to give you a "well, just learn X!" answer.
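You don't even need R for the list itself; a one-liner over the .bim/.map does it. A small self-contained sketch with a toy tab-separated .bim-style file (real data would be your own file):

```shell
# Toy .bim-style file (tab-separated: chr, id, cM, bp, A1, A2).
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tA\tG\n1\trs2\t0\t200\tA\tG\n' > toy.bim

# Print SNP IDs (column 2) that occur more than once.
cut -f2 toy.bim | sort | uniq -d > duplicateSNPs.txt
cat duplicateSNPs.txt
```

`uniq -d` prints only lines that appear at least twice, so here duplicateSNPs.txt contains just rs2.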
A couple of others ideas that might be of help/interest:
You can also remove VCF duplicates using bcftools with bcftools norm -D (long form --remove-duplicates; newer bcftools versions use -d/--rm-dup instead).
bcftools documentation can be found at https://samtools.github.io/bcftools/bcftools.html
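For example, with the newer flag spelling (input/output names are placeholders):

```shell
# Remove duplicate records of any type ("all") and write compressed VCF.
bcftools norm -d all input.vcf.gz -Oz -o deduped.vcf.gz
```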
In the spirit of also just using Unix to remove duplicates, I've previously used the following (input is a compressed VCF file): gunzip -c input.vcf.gz | grep -v "^#" | cut -f3 | sort | uniq -d > plink.dupvar
plink.dupvar matches the default name produced by PLINK's --list-duplicate-vars; feed the file to --exclude to perform the duplicate-removal step.