Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where might I begin on this optimization problem?

I have a simple program which reads a bunch of things from the filesystem, filters the results , and prints them. This simple program implements a domain specific language to make selection easier. This DSL "compiles" down into an execution plan that looks like this (Input was C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf):

Index  Success  Failure  Description
    0        S        1  File Matches C:\Windows\System32\*
    1        S        2  File MD5 Matches ABCDEFG
    2        S        F  File is file. (Not directory)

The filter is applied to the given file, and if it succeeds, the index pointer jumps to the index indicated in the success field, and if it fails, the index pointer jumps to the number indicated in the failure field. "S" means that the file passes the filter, F means that the file should be rejected.

Of course, a filter based upon a simple file attribute (!FILE_ATTRIBUTE_DIRECTORY) check is much faster than a check based upon the MD5 of the file, which requires opening and performing the actual hash of the file. Each filter "opcode" has a time class associated with it; MD5 gets a high timing number, ISFILE gets a low timing number.

I would like to reorder this execution plan so that opcodes that take a long time are executed as rarely as possible. For the above plan, that would mean it would have to be:

Index  Success  Failure  Description
    0        S        1  File is file. (Not directory)
    1        S        2  File Matches C:\Windows\System32\*
    2        S        F  File MD5 Matches ABCDEFG

According to the "Dragon Book", picking the best order of execution for three address code is an NP-Complete problem (At least according to page 511 of the second edition of that text), but in that case they are talking about register allocation and other issues of the machine. In my case, the actual "intermediate code" is much simpler. I'm wondering of a scheme exists that would allow me to reorder the source IL into the optimal execution plan.

Here is another example:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }

Parsed to:

Index  Success  Failure  Description
    0        1        2  File Matches C:\Windows\Inf\*
    1        S        2  File is a Portable Executable
    2        3        F  File is file. (Not directory)
    3        F        S  File Matches C:\Windows\System32\Drivers\*

which is optimally:

Index  Success  Failure  Description
    0        1        2  File is file. (Not directory)
    1        2        S  File Matches C:\Windows\System32\Drivers\*
    2        3        F  File Matches C:\Windows\Inf\*
    3        S        F  File is a Portable Executable
like image 346
Billy ONeal Avatar asked Jan 01 '26 22:01

Billy ONeal


1 Answers

It sounds like it might be easier to pick an optimal order before compiling down to your opcodes. If you have a parse tree, and it is as "flat" as possible, then you can assign a score to each node and then sort each node's children by the lowest total score first.

For example:

{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
         1             2          3                 4

      OR
     /  \
  AND    AND
 /  \   /   \
1    2 3     4

You could sort the AND nodes (1, 2) and (3, 4) by the lowest score and then assign that score to each node. Then sort the children of the OR node by the lowest score of their children.

Since AND and OR are commutative, this sorting operation won't change the meaning of your overall expression.

like image 180
Greg Hewgill Avatar answered Jan 06 '26 11:01

Greg Hewgill



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!