
How much storage would be required to store a human genome?

I'm looking for the amount of storage in bytes (MB, GB, TB, etc.) required to store a single human genome. I read a few Wikipedia articles about DNA, chromosomes, base pairs, and genes, and I have a rough guess, but before disclosing it I'd like to see how others would approach the problem.

An alternative question would be how many atoms are there in human DNA, but that would be off topic for this site.

I understand that this will be an approximation, so I'm looking for the minimum amount of storage that could hold the DNA of any human.

Milan Babuškov asked Jan 21 '12 16:01





1 Answer

If you trust such things, here is what Wikipedia claims (from http://en.wikipedia.org/wiki/Human_genome#Information_content):

The 2.9 billion base pairs of the haploid human genome correspond to a maximum of about 725 megabytes of data, since every base pair can be coded by 2 bits. Since individual genomes vary by less than 1% from each other, they can be losslessly compressed to roughly 4 megabytes.
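The 725 megabyte figure in the quote can be checked with a little arithmetic. The snippet below is only a rough sketch: it assumes exactly four symbols (A, C, G, T) with no ambiguity codes, uses decimal megabytes, and the `pack` helper with its bit assignments is a made-up illustration rather than any standard file format.

```python
# Sketch: raw size of a genome at 2 bits per base (A, C, G, T only).
BITS_PER_BASE = 2  # 4 possible bases -> log2(4) = 2 bits


def raw_genome_bytes(base_pairs: int) -> int:
    """Bytes needed to store `base_pairs` bases at 2 bits each, rounded up."""
    return (base_pairs * BITS_PER_BASE + 7) // 8


# Hypothetical encoding table; which base gets which code is arbitrary.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(sequence: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


haploid_base_pairs = 2_900_000_000  # figure taken from the quote above
print(raw_genome_bytes(haploid_base_pairs) / 1e6, "MB")  # 725.0 MB
print(len(pack("GATTACA")))  # 2 (seven bases fit in two bytes)
```

The 4 megabyte figure in the quote relies on storing only the differences from a reference genome, which this sketch does not attempt.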

Oliver Charlesworth answered Sep 19 '22 23:09
