How do AV engines search files for known signatures so efficiently?

Tags: search, antivirus

The set of signature strings keeps growing as new virus variants are released, which prompts my question: how do AV engines search files for known signatures so efficiently? When I download a new file, my AV scanner quickly decides whether it is a threat based on its signatures, but how can it do this so fast? I'm sure there are hundreds of thousands of signatures by now.

asked May 04 '13 18:05

Charles Saag

1 Answer

UPDATE: As tripleee pointed out, the Aho-Corasick algorithm seems very relevant to virus scanners. Here is some stuff to read:

http://www.dais.unive.it/~calpar/AA07-08/aho-corasick.pdf

http://www.researchgate.net/publication/4276168_Generalized_Aho-Corasick_Algorithm_for_Signature_Based_Anti-Virus_Applications/file/d912f50bd440de76b0.pdf

http://jason.spashett.com/av/index.htm

Aho-Corasick-like algorithm for use in anti-malware code
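
To give a feel for why Aho-Corasick fits this problem, here is a minimal sketch of the textbook algorithm in Python. It is not taken from any real AV engine; the function names `build_automaton` and `find_signatures` and the sample patterns are my own, chosen only for illustration. The point is that the input is scanned once, no matter how many signatures are loaded.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton over a list of byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state (the trie edges)
    fail = [0]    # failure link: longest proper suffix that is also a trie path
    out = [[]]    # patterns recognised when we arrive at each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)

    # Phase 2: breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())   # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_signatures(data, automaton):
    """Return (offset, pattern) for every pattern occurrence in data."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton([b"he", b"she", b"his", b"hers"])
print(find_signatures(b"ushers", automaton))
# -> [(1, b'she'), (2, b'he'), (2, b'hers')]
```

Each input byte advances the automaton by one step (plus some cheap failure-link hops), so the scan time depends on the file size and the number of matches, not on how many signatures are in the set.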

Below is my old answer. It's still relevant for easily detecting malware, such as worms, that simply makes copies of itself:

I'll just write some of my thoughts on how AVs might work. I don't know for sure. If someone thinks the information is incorrect, please notify me.

There are many ways in which AVs detect possible threats. One way is signature-based detection.

A signature is just a compact fingerprint of a file (which is itself just a sequence of bytes). In computer science terms, it can be a hash. A single hash could take 4, 8, or 16 bytes. Assuming 4 bytes per hash (for example, CRC32), about 67 million signatures could be stored in 256 MB.
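
The arithmetic above can be checked quickly. This is only a sketch: `crc32_signature` is a name I made up, and it uses Python's standard `zlib.crc32` as the example 4-byte hash.

```python
import zlib

def crc32_signature(data: bytes) -> int:
    """Return a 4-byte (32-bit) CRC32 fingerprint of the given bytes."""
    return zlib.crc32(data) & 0xFFFFFFFF

BYTES_PER_SIGNATURE = 4
MEMORY_BUDGET = 256 * 1024 * 1024   # 256 MB, counted as 2**28 bytes

capacity = MEMORY_BUDGET // BYTES_PER_SIGNATURE
print(capacity)   # -> 67108864, i.e. about 67 million signatures

print(hex(crc32_signature(b"123456789")))   # -> 0xcbf43926 (standard CRC32 check value)
```

Keep in mind that a 32-bit hash has only about 4.3 billion possible values, so with tens of millions of entries accidental collisions with harmless files become likely; that is one reason to prefer the 8- or 16-byte sizes.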

All these hashes can be stored in a signature database. This database could be implemented as a balanced tree, so that insertion, deletion, and search run in O(log n) time, which is pretty fast even for large n (where n is the number of entries). Alternatively, if a lot of memory is available, a hash table can be used, which gives O(1) insertion, deletion, and search on average. This can be faster as n grows, provided a good hashing technique is used.
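
Here is a rough sketch of the two lookup styles. Python has no built-in balanced tree, so I substitute a sorted list with binary search (`bisect`) for the O(log n) case; lookups behave the same way, although inserting into a plain list is O(n), unlike a real balanced tree. The signature values and the helper name `in_sorted` are invented for the example.

```python
import bisect

def in_sorted(sorted_sigs, sig):
    """O(log n) membership test on a sorted list via binary search."""
    i = bisect.bisect_left(sorted_sigs, sig)
    return i < len(sorted_sigs) and sorted_sigs[i] == sig

# Hypothetical 4-byte signatures, purely for illustration.
signatures = [0xDEADBEEF, 0x0000002A, 0xCBF43926, 0x12345678]

sorted_sigs = sorted(signatures)   # stand-in for a balanced tree
hashed_sigs = set(signatures)      # hash table: O(1) average lookup

print(in_sorted(sorted_sigs, 0xCBF43926))   # -> True
print(0xCBF43926 in hashed_sigs)            # -> True
print(in_sorted(sorted_sigs, 0x00000001))   # -> False
```

The sorted list is more compact in memory, while the set trades extra memory for constant-time lookups on average.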

So roughly, an antivirus calculates the hash of the file, or of just its critical sections (where malicious injections are possible), and searches its signature database for that hash. As explained above, the search is very fast, which makes it possible to scan a large number of files in a short time. If the hash is found, the file is categorized as malicious.
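
The whole-file case might look something like the sketch below. I use SHA-256 from `hashlib` here rather than CRC32 simply because it is readily available; the function names and the idea of passing the database in as a Python set are my own assumptions, not how any particular AV product works.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_malicious(path, signature_db):
    """Check a file's digest against a set of known-bad digests."""
    return file_digest(path) in signature_db
```

The cost per file is one pass over its bytes to compute the hash plus a single set lookup, so the size of the signature database barely affects the scan time.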

Similarly, the database can be updated quickly, since insertion and deletion are fast too.

You could read these pages to get some more insight.

Which is faster, Hash lookup or Binary search?

https://security.stackexchange.com/questions/379/what-are-rainbow-tables-and-how-are-they-used

answered Oct 24 '22 14:10

max
