What's the recommended implementation for hashing OLE Variants?

Tags:

OLE Variants, as used by older versions of Visual Basic and pervasively in COM Automation, can store lots of different types: basic types like integers and floats, more complicated types like strings and arrays, and all the way up to IDispatch implementations and pointers in the form of ByRef variants.

Variants are also weakly typed: they convert the value to another type without warning depending on which operator you apply and what the current types are of the values passed to the operator. For example, comparing two variants, one containing the integer 1 and another containing the string "1", for equality will return True.

So assuming that I'm working with variants at the underlying data level (e.g. VARIANT in C++ or TVarData in Delphi - i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules?

Rules:

Variants that hash unequally should compare as unequal, both in sorting and direct equality
Variants that compare as equal for both sorting and direct equality should hash as equal

It's OK if I have to use different sorting and direct comparison rules in order to make the hashing fit.

The way I'm currently working is I'm normalizing the variants to strings (if they fit), and treating them as strings, otherwise I'm working with the variant data as if it was an opaque blob, and hashing and comparing its raw bytes. That has some limitations, of course: numbers 1..10 sort as [1, 10, 2, ... 9] etc. This is mildly annoying, but it is consistent and it is very little work. However, I do wonder if there is an accepted practice for this problem.

804

asked Mar 19 '10 18:03

Barry Kelly

1 Answers

There's a built in tension in your question between the use of a hash function and the stated requirements, which are to validated against the input of the hash. I'd suggest we keep in mind a few properties of hashes in general: information is lost during the hashing process, and hash collisions are to be expected. It is possible to construct a perfect hash without collisions, but it would be problematic (or impossible?) to construct a perfect hash function if the domain of the function is any possible OLE Variant. On the other hand, if we're not talking about a perfect hash, then your first rule is violated.

I don't know the larger context of what you're trying to accomplish, but I must push back on one of your assumptions: is a hash function really what you want? Your requirements could be met in a fairly straightforward way if you develop a system that encodes, not hashes, all of the possible OLE Variant attributes so that they can be recalled later and compared against other Variant images.

Your baseline implementation of converting the Variant to a string representation is moving in this direction. As you are no doubt aware, a Variant can contain pointers, double pointers, and arrays, so you'll have to develop consistent string representation of these data types. I question whether this approach could really be classified as a hash. Aren't you just persisting data attributes?

184

answered Sep 27 '22 18:09

Paul Keister

Related questions
                            
                                Why is ctor with default argument std::initializer_list unavailable (VS2019)?
                            
                                can you switch over a std::any.type()?
                            
                                The return code of an application is an int16_t?
                            
                                May the compiler optimize out stores though a pointer-to-volatile? [duplicate]
                            
                                can we use alloca() or variable length array extentions in c++20 coroutines?
                            
                                C++ Modules Forward Declaring Entity from Another Module
                            
                                How unused default member initializer can change program behavior in C++?
                            
                                Different behavior in C regex vs C++ regex using extended POSIX grammar
                            
                                Is a class template's name in scope for a qualified out-of-line destructor's definition?
                            
                                Is reduction of `constexpr` object lifetime legal in C++?
                            
                                strict total order of std::less<Pointer>
                            
                                Is there a reason why make_reverse_iterator(make_reverse_iterator(it)) results in a different type?
                            
                                How can I specialize std::common_type<A,B> so that it's naturally commutative?
                            
                                Whats the difference between reference to an array and array as a parameters in functions?
                            
                                ORM (object relational manager) solution with multiple programming language support
                            
                                Unexposing a snapshot in XP (Volume Shadow Copy)
                            
                                Learning Graphical Layout Algorithms
                            
                                Why do I get a "member function not present" error when evaluating expressions on the VC++ debugger?
                            
                                Is time.h clock() broken on my hardware?
                            
                                Unit testing for compiler errors

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What's the recommended implementation for hashing OLE Variants?

Tags:

c++

com

winapi

delphi

variant

Barry Kelly

People also ask

1 Answers

Paul Keister

Recent Activity

Donate For Us