
What is a "term-vector algorithm"?

Tags: algorithm

Google states that a "term-vector algorithm" can be used to determine popular keywords. I have studied http://en.wikipedia.org/wiki/Vector_space_model, but I can't understand the term "term-vector algorithm".

Please explain it briefly and in very simple language, as if the reader were a child.

I believe "vector" refers to the mathematical definition: a quantity having direction as well as magnitude. How can keywords have a quantity that moves in a direction?

http://en.wikipedia.org/wiki/Vector_space_model states "Each dimension corresponds to a separate term." I thought dimension relates to cardinality. Is that correct?

[Image: excerpt from the book Hadoop In Practice, by Alex Holmes, page 12.]

davidjhp asked Jul 24 '13 23:07



1 Answer

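It means that each word in the vocabulary forms a separate dimension, and each document becomes a vector whose components count how often each word occurs: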

Example: (shamelessly taken from here)

For a model containing only three words you would get:

dict = { dog, cat, lion }

Document 1
“cat cat” → (0,2,0) 

Document 2
“cat cat cat” → (0,3,0)

Document 3
“lion cat” → (0,1,1)

Document 4 
“cat lion” → (0,1,1)
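
The mapping above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm Google or the book uses: the names `VOCAB`, `term_vector`, and `cosine` are made up for this example, and splitting on whitespace is an assumed (very naive) way to find words.

```python
import math
from collections import Counter

# Vocabulary from the example above; each word's position is its dimension.
VOCAB = ["dog", "cat", "lion"]


def term_vector(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in text (hypothetical helper)."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return tuple(counts[word] for word in vocab)


def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


docs = ["cat cat", "cat cat cat", "lion cat", "cat lion"]
vectors = [term_vector(d) for d in docs]

for doc, vec in zip(docs, vectors):
    print(f"{doc!r} -> {vec}")

# Documents 1 and 2 point the same way ("all cat") but differ in length.
print(cosine(vectors[0], vectors[1]))
```

Two things are visible here. Documents 3 and 4 get the same vector, because only the counts matter and word order is ignored. Documents 1 and 2 have the same direction but different magnitudes, which is the sense in which a document has "direction": the direction says which words it is about, the magnitude says how many of them it contains.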
matcheek answered Nov 12 '22 09:11
