Class RIR::Document
In: lib/rir/document.rb
Parent: Object

A Document is a bag of words and is constructed from a string.


count_words   entropy   format_words   new   ngrams  


doc_content  [R] 
words  [R] 
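To illustrate the two read-only attributes, here is a minimal sketch of a document that keeps its raw string and its words. The class name `SketchDocument` is hypothetical. Splitting on whitespace is also an assumption: the real RIR::Document additionally cleans its words (see format_words).

```ruby
# Hypothetical stand-in for RIR::Document; not the library's actual code.
class SketchDocument
  # Read-only accessors, mirroring the [R] attributes above.
  attr_reader :doc_content, :words

  def initialize(content)
    @doc_content = content
    # Assumption: words are the whitespace-separated tokens of the content.
    @words = content.split
  end
end

doc = SketchDocument.new("the free  encyclopedia")
doc.words # => ["the", "free", "encyclopedia"]
```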

Public Class methods

new

Creates a new Document from a string.

Public Instance methods

count_words

Returns a Hash mapping each word in the current Document to its number of occurrences.

  count_words #=> { "guitar"=>1, "bass"=>3, "album"=>20, ... }
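A word-count Hash like the one above can be built in a single pass. This sketch is a standalone helper that takes an Array of words (a hypothetical signature, since the real method reads the Document's own words) and is not the library's implementation.

```ruby
# Hypothetical helper: tally word occurrences into a Hash (word => count).
def count_words(words)
  # Hash.new(0) makes every missing key start at zero.
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

count_words(%w[bass guitar bass]) # => {"bass"=>2, "guitar"=>1}
```

On Ruby 2.7 and later, `words.tally` produces the same counts.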

entropy

Computes the entropy of a given string s within the document.

If s consists of several words (tokens separated by whitespace), it is treated as an n-gram.

  entropy("guitar") #=> 0.00389919463243839
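This page does not state which formula `entropy` uses. One plausible reading, assumed here and not confirmed by the source, is the entropy term -p * ln(p) of the n-gram, where p is its relative frequency among all n-grams of the same length in the document. The helper below takes the word list explicitly (a hypothetical signature) and uses the natural logarithm (also an assumption).

```ruby
# Sketch only: the formula (-p * ln p over same-length n-grams) is an
# assumption, not taken from RIR::Document.
def entropy(words, s)
  n = s.split.size
  return 0.0 if n.zero? || words.size < n

  # All n-grams with the same number of words as s, joined by single spaces.
  grams = words.each_cons(n).map { |gram| gram.join(" ") }
  prob = grams.count(s.split.join(" ")).fdiv(grams.size)
  return 0.0 if prob.zero? # an absent n-gram contributes nothing

  -prob * Math.log(prob)
end

entropy(%w[a b a b], "a") # => about 0.3466 (0.5 * ln 2)
```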

ngrams

Returns an Array containing the word n-grams of the current Document.

  ngrams(2) #=> ["the free", "free encyclopedia", "encyclopedia var", "var skin", ...]
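The example above is consistent with a sliding window of n consecutive words joined by a space. Below is a minimal sketch of that technique as a standalone helper. The `words` argument is a hypothetical addition, because the real method works on the Document's own words.

```ruby
# Hypothetical helper: slide a window of n words and join each window.
def ngrams(words, n)
  # each_cons yields every run of n consecutive elements, in order.
  words.each_cons(n).map { |gram| gram.join(" ") }
end

ngrams(%w[the free encyclopedia var], 2)
# => ["the free", "free encyclopedia", "encyclopedia var"]
```

If the document has fewer than n words, the result is empty. `each_cons` raises an ArgumentError when n is not positive.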

Protected Instance methods

format_words

Removes any non-word characters from the words (see the \W special escape in regular expressions).

Protected method, only meant to be called during initialization.
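The cleaning step described above can be sketched with `String#gsub` and the `\W` escape. This is an illustrative standalone helper, not the protected method itself. Dropping tokens that become empty is an assumption that the page does not confirm.

```ruby
# Hypothetical helper: strip non-word characters (\W) from each word.
def format_words(words)
  words.map { |word| word.gsub(/\W/, "") }
       .reject(&:empty?) # assumption: drop tokens that were pure punctuation
end

format_words(["guitar,", "(bass)", "--"]) # => ["guitar", "bass"]
```

In Ruby, `\W` matches anything outside `[a-zA-Z0-9_]`, so accented letters are stripped as well.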