Has anyone in the NLP field heard of the term Zone Hashing? From what I hear, zone hashing is the process of iterating through a document and extracting sentences. An accumulation of sentences is then hashed, and the process continues for the next n sentences...
I haven't found any references to this on Google, so I'm wondering if it goes by a different name. It should be related to measuring text similarity/nearness.
Perhaps it refers to locality sensitive hashing?
As far as I know, "zone hashing" is not a well-established concept in NLP as a discipline. It is just a simple idea used in some NLP-related algorithms. The only system I know of that uses it is the Sphinx search server, where "zone hashing" simply means "hashing of objects called zones", and a "zone" is described as follows:
Zones can be formally defined as follows. Everything between an opening and a matching closing tag is called a span, and the aggregate of all spans corresponding sharing the same tag name is called a zone. For instance, everything between the occurrences of <H1> and </H1> in the document field belongs to H1 zone.
Zone indexing, enabled by index_zones directive, is an optional extension of the HTML stripper. So it will also require that the stripper is enabled (with html_strip = 1). The value of the index_zones should be a comma-separated list of those tag names and wildcards (ending with a star) that should be indexed as zones.
Zones can nest and overlap arbitrarily. The only requirement is that every opening tag has a matching tag. You can also have an arbitrary number of both zones (as in unique zone names, such as H1) and spans (all the occurrences of those H1 tags) in a document. Once indexed, zones can then be used for matching with the ZONE operator, see Section 5.3, “Extended query syntax”.
And hashing of these structures is used in the traditional sense to speed up search and lookup. I am not aware of any "deeper" meaning.
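To make the span/zone distinction concrete, here is a minimal Python sketch. It is not how Sphinx implements zones internally; the names ZoneCollector and index_zones are hypothetical, and an ordinary dict stands in for the hash table that maps a zone name to its spans.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ZoneCollector(HTMLParser):
    """Collects the text of each span and groups spans by tag name (zone).

    Illustrative only: a hypothetical helper, not part of Sphinx.
    """

    def __init__(self, zone_tags):
        super().__init__()
        self.zone_tags = {tag.lower() for tag in zone_tags}
        self.open_spans = []            # currently open spans: (tag, text parts)
        self.zones = defaultdict(list)  # hash table: zone name -> list of span texts

    def handle_starttag(self, tag, attrs):
        if tag in self.zone_tags:
            self.open_spans.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every span that is currently open, since zones can nest.
        for _, parts in self.open_spans:
            parts.append(data)

    def handle_endtag(self, tag):
        # Close the most recently opened span with this tag name.
        for i in range(len(self.open_spans) - 1, -1, -1):
            if self.open_spans[i][0] == tag:
                _, parts = self.open_spans.pop(i)
                self.zones[tag].append("".join(parts).strip())
                break


def index_zones(html, zone_tags):
    """Return a dict mapping each zone name to the text of its spans."""
    collector = ZoneCollector(zone_tags)
    collector.feed(html)
    collector.close()
    return dict(collector.zones)


doc = "<h1>Intro</h1><p>Some text</p><h1>Details</h1><h2>More</h2>"
zones = index_zones(doc, ["h1", "h2"])
print(zones["h1"])  # all spans of the h1 zone, found with one hash lookup
```

Looking up a zone by name is then a constant-time dictionary access, which is the "traditional" use of hashing mentioned above.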
Perhaps it refers to locality sensitive hashing?
Locality-sensitive hashing is a probabilistic method for multidimensional data. I do not see any deeper connection to zone hashing than the fact that both use hash functions.
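For contrast, here is a rough Python sketch of MinHash, one common building block of locality-sensitive hashing for text. It only computes signatures and estimates similarity from them; a full LSH index would additionally group signatures into buckets. The function names, the 3-word shingles, and the use of salted SHA-1 as the hash family are assumptions made for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(item, seed):
    # One member of the hash family: SHA-1 salted with the seed, truncated to 64 bits.
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """For each hash function, keep the minimum hash value over all items."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, which approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


original = "the quick brown fox jumps over the lazy dog near the river bank"
near_copy = "the quick brown fox jumps over the lazy dog near the river shore"
unrelated = "sphinx indexes documents and answers full text search queries fast"

sig_original = minhash_signature(shingles(original))
sig_near = minhash_signature(shingles(near_copy))
sig_unrelated = minhash_signature(shingles(unrelated))

print(estimate_similarity(sig_original, sig_near))
print(estimate_similarity(sig_original, sig_unrelated))
```

Unlike the exact lookup that zone hashing provides, the match here is deliberately approximate: similar texts are likely, but not guaranteed, to share signature values.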