
fwrite chokes on "<?xml version"

When the string <?xml version is written to a file via fwrite, the subsequent writing operations become slower.

This code:

#include <cstdio>
#include <ctime>
#include <iostream>

int main()
{
    const long index(15000000); 

    clock_t start_time(clock());
    FILE*  file_stream1 = fopen("test1.txt","wb");
    fwrite("<?xml version",1,13,file_stream1);
    for(auto i = 1;i < index ;++i)
        fwrite("only 6",1,6,file_stream1);
    fclose(file_stream1);

    std::cout << "\nOperation 1 took : " 
        << static_cast<double>(clock() - start_time)/CLOCKS_PER_SEC 
        << " seconds.";


    start_time = clock();
    FILE*  file_stream2 = fopen("test2.txt","wb");
    fwrite("<?xml versioX",1,13,file_stream2);
    for(auto i = 1;i < index ;++i)
        fwrite("only 6",1,6,file_stream2);
    fclose(file_stream2);

    std::cout << "\nOperation 2 took : " 
        << static_cast<double>(clock() - start_time)/CLOCKS_PER_SEC 
        << " seconds.";


    start_time = clock();
    FILE*  file_stream3 = fopen("test3.txt","w");
    const char test_str3[] = "<?xml versioX";
    for(auto i = 1;i < index ;++i)
        fwrite(test_str3,1,13,file_stream3);
    fclose(file_stream3);

    std::cout << "\nOperation 3 took : " 
        << static_cast<double>(clock() - start_time)/CLOCKS_PER_SEC 
        << " seconds.\n";

    return 0;
}

gives me this result:

Operation 1 took : 3.185 seconds.
Operation 2 took : 2.025 seconds.
Operation 3 took : 2.992 seconds.

That is, when the string "<?xml version" (operation 1) is replaced with "<?xml versioX" (operation 2), the run is significantly faster (2.025 s versus 3.185 s). The third operation takes about as long as the first (2.992 s), even though it writes roughly twice as many characters (13 per call instead of 6).

Can anyone reproduce this?

Windows 7, 32bit, MSVC 2010
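
For anyone trying to reproduce this, the near-identical timing blocks in the code above can be folded into one helper, so that the only thing that varies between runs is the first string written. This is a minimal sketch, not the code that produced the numbers above: `time_writes` is a hypothetical name, and it uses `std::chrono::steady_clock` in place of `clock()` so that it measures wall-clock time on any platform. That needs C++11 `<chrono>`, so a newer compiler than MSVC 2010 is assumed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of the original post): writes `header` once,
// then `payload` `count` times, and returns the elapsed wall-clock seconds.
// Returns a negative value if the file cannot be opened.
double time_writes(const char* path, const char* header,
                   const char* payload, long count)
{
    const auto start = std::chrono::steady_clock::now();

    FILE* file = std::fopen(path, "wb");
    if (!file)
        return -1.0;

    // The first write is the only thing that differs between runs.
    std::fwrite(header, 1, std::strlen(header), file);

    const std::size_t payload_len = std::strlen(payload);
    for (long i = 0; i < count; ++i)
        std::fwrite(payload, 1, payload_len, file);

    // Closing inside the timed region includes the final buffer flush.
    std::fclose(file);

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `time_writes("test1.txt", "<?xml version", "only 6", 15000000)` and then `time_writes("test2.txt", "<?xml versioX", "only 6", 15000000)` should show whether the gap depends on the header alone. Running the pair several times, and in both orders, helps rule out disk caching effects.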

EDIT 1

Following R..'s suggestion, I disabled Microsoft Security Essentials, and that restores normal behavior.

anno asked May 07 '11 23:05


1 Answer

On Windows, most (all?) anti-virus software works by hooking into file read and/or write operations, running the data being read or written against virus patterns and classifying it as safe or malicious. I suspect your anti-virus software, once it sees an XML header, loads its XML-malware patterns and from that point on keeps checking whether the XML you're writing to disk is part of a known virus.

Of course this behavior is utterly nonsensical and is part of what gives AV programs such a bad reputation with competent users, who see their performance plummet as soon as they turn on AV. The same goal could be accomplished in other ways that don't ruin performance. Here are some ideas they should be using:

  • Only scan files once at transitions between writing and reading, not after every write. Even if you did write a virus to disk, it doesn't become a threat until it subsequently gets read by some process.
  • Once a file is scanned, remember that it's safe and don't scan it again until it's modified.
  • Only scan files that are executable programs or that are detected as being used as script/program-like data by another program.

Unfortunately I don't know of any workaround until AV software makers wise up, other than turning your AV off... which is generally a bad idea on Windows.

R.. GitHub STOP HELPING ICE answered Sep 29 '22 13:09