diff --git a/doc/classes/RIR.html b/doc/classes/RIR.html index 77f7788..e909f57 100644 --- a/doc/classes/RIR.html +++ b/doc/classes/RIR.html @@ -53,9 +53,9 @@ - + - lib/rir/string.rb + lib/rir/corpus.rb @@ -63,9 +63,19 @@
- + - lib/rir/document.rb + lib/rir/query.rb + + + + +
+ + + + + lib/rir/string.rb @@ -86,10 +96,73 @@

-General module for many purposes related to Information Retrieval. +This file is a part of an Information Retrieval oriented Ruby library +

+

+Copyright (C) 2010-2011 Romain Deveaud +

+

+This program is free software: you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the Free +Software Foundation, either version 3 of the License, or (at your option) +any later version. +

+

+This program is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +more details. +

+

+You should have received a copy of the GNU General Public License along +with this program. If not, see <www.gnu.org/licenses/>.



@@ -106,9 +179,9 @@ General module for many purposes related to Information Retrieval.

Classes and Modules

- Class RIR::Document
-Class RIR::WebDocument
-Class RIR::WikipediaPage
+ Module RIR::Indri
+Class RIR::Corpus
+Class RIR::Query
diff --git a/doc/classes/RIR/Document.html b/doc/classes/RIR/Document.html index e6eb41c..8643cb5 100644 --- a/doc/classes/RIR/Document.html +++ b/doc/classes/RIR/Document.html @@ -99,15 +99,15 @@ from a string.
- count_words   + count_words   - entropy   + entropy   - format_words   + format_words   - new   + new   - ngrams   + ngrams  
@@ -154,13 +154,13 @@ from a string.

Public Class methods

- + new(content) @@ -177,13 +177,13 @@ from a string.

Public Instance methods

- + ngrams(n) @@ -267,13 +267,13 @@ Returns an Array containing the n-grams (words) from the current
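The ngrams(n) entry describes a method that returns an Array of the word n-grams of the current document. Its body is not part of this diff. The following is a minimal sketch of the idea. It assumes whitespace tokenisation and space-joined n-grams (both are assumptions), and word_ngrams is a hypothetical helper, not the RIR API:

```ruby
# word_ngrams is a hypothetical helper, not part of RIR.
# Assumes words are whitespace-separated; Document#ngrams may tokenise differently.
def word_ngrams(text, n)
  raise ArgumentError, "n must be positive" unless n.positive?

  # each_cons(n) yields every window of n consecutive words
  text.split.each_cons(n).map { |gram| gram.join(' ') }
end

p word_ngrams("the quick brown fox", 2)
# prints ["the quick", "quick brown", "brown fox"]
```

each_cons(n) yields nothing when the text has fewer than n words, so such input returns an empty Array.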

Protected Instance methods

+ @@ -144,13 +144,13 @@ href="Document.html">Document with a url.

Public Class methods

- + @@ -109,6 +123,74 @@ href="WebDocument.html">WebDocument. +

Public Class methods

+ diff --git a/doc/classes/String.html b/doc/classes/String.html index b0d3449..076d643 100644 --- a/doc/classes/String.html +++ b/doc/classes/String.html @@ -99,7 +99,7 @@ useful function.
- extract_xmltags_values   + extract_xmltags_values   is_stopword?   @@ -109,6 +109,10 @@ useful function. strip_javascripts!   + strip_punctuation   + + strip_punctuation!   + strip_stylesheets   strip_stylesheets!   @@ -146,13 +150,13 @@ useful function.

Public Instance methods


+Returns a copy of self with punctuation removed.

+
+  s = "hello, world. how are you?!"
+  s.strip_punctuation               # => "hello world how are you"
+

+Removes punctuation from self, in place.

+
+  s = "hello, world. how are you?!"
+  s.strip_punctuation!
+  s                                 # => "hello world how are you"
+
diff --git a/doc/classes/String.src/M000001.html b/doc/classes/String.src/M000001.html index f96e8b1..603a7ac 100644 --- a/doc/classes/String.src/M000001.html +++ b/doc/classes/String.src/M000001.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 77
+  
# File lib/rir/string.rb, line 76
   def is_stopword?
     Stoplist.include?(self.downcase)
   end
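is_stopword? downcases the receiver and tests whether the result is in Stoplist, which string.rb describes as Lemur's default stopwords. The sketch below reproduces that check against a three-word placeholder list (not Lemur's) held in a Set. This diff does not show whether RIR's Stoplist is an Array or a Set. A Set is one way to get hashed lookup instead of a linear scan:

```ruby
require 'set'

# Placeholder stoplist for illustration; RIR's Stoplist holds Lemur's defaults.
STOPLIST = Set.new(%w[the of and]).freeze

class String
  # Same check as RIR's String#is_stopword?, against the placeholder list.
  def is_stopword?
    STOPLIST.include?(downcase)
  end
end

puts "The".is_stopword?   # prints true
puts "fox".is_stopword?   # prints false
```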
diff --git a/doc/classes/String.src/M000002.html b/doc/classes/String.src/M000002.html index 1d3aa25..c21c139 100644 --- a/doc/classes/String.src/M000002.html +++ b/doc/classes/String.src/M000002.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 83
+  
# File lib/rir/string.rb, line 82
   def remove_special_characters
     self.split.collect { |w| w.gsub(/\W/,' ').split.collect { |w| w.gsub(/\W/,' ').strip.sub(/\A.\z/, '')}.join(' ').strip.sub(/\A.\z/, '')}.join(' ')
   end
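remove_special_characters splits the receiver on whitespace, turns every non-word character into a space, drops one-character tokens and joins what is left. The dropped tokens leave empty strings behind. As a result, "Hello, world! I'm a test" comes back as "Hello world   test", with three spaces before "test". Below is a hypothetical, flatter alternative (not RIR code). It keeps the same words in the same order but joins them with single spaces:

```ruby
# Hypothetical alternative to String#remove_special_characters (not RIR code).
# Keeps maximal runs of word characters (\w) of two or more characters and
# joins them with single spaces.
def remove_special_characters_sketch(text)
  text.split(/\W+/)                 # runs of word characters; may start with ""
      .reject { |t| t.length < 2 }  # drops "" and one-character tokens
      .join(' ')
end

puts remove_special_characters_sketch("Hello, world! I'm a test")
# prints Hello world test
```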
diff --git a/doc/classes/String.src/M000003.html b/doc/classes/String.src/M000003.html index 21c6728..01c1839 100644 --- a/doc/classes/String.src/M000003.html +++ b/doc/classes/String.src/M000003.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 92
+  
# File lib/rir/string.rb, line 91
   def strip_xml_tags!
     replace strip_with_pattern /<\/?[^>]*>/
   end
diff --git a/doc/classes/String.src/M000004.html b/doc/classes/String.src/M000004.html index a913161..2d020b7 100644 --- a/doc/classes/String.src/M000004.html +++ b/doc/classes/String.src/M000004.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 101
+  
# File lib/rir/string.rb, line 100
   def strip_xml_tags
     dup.strip_xml_tags!
   end
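strip_xml_tags! and strip_xml_tags follow Ruby's bang/non-bang convention. The bang form overwrites the receiver with String#replace, and the plain form runs the bang form on a dup. strip_with_pattern is not shown in this diff. The sketch below assumes it returns a copy with every match deleted, so treat that helper as hypothetical:

```ruby
class String
  # ASSUMPTION: RIR's strip_with_pattern is not shown in this diff; here it
  # returns a copy of self with every match of pattern deleted.
  def strip_with_pattern(pattern)
    gsub(pattern, '')
  end

  # In-place form: String#replace overwrites the receiver and returns it.
  def strip_xml_tags!
    replace strip_with_pattern(/<\/?[^>]*>/)
  end

  # Copying form: strips a duplicate and leaves the receiver unchanged.
  def strip_xml_tags
    dup.strip_xml_tags!
  end
end

html = "<p>Hello <b>world</b></p>"
puts html.strip_xml_tags   # prints Hello world
puts html                  # prints <p>Hello <b>world</b></p>
```

The sketch wraps the regexp argument in parentheses. The unparenthesised call in the source can produce Ruby's "ambiguous first argument" warning under ruby -w, and the parentheses avoid it.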
diff --git a/doc/classes/String.src/M000005.html b/doc/classes/String.src/M000005.html index 188323f..1f77395 100644 --- a/doc/classes/String.src/M000005.html +++ b/doc/classes/String.src/M000005.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 115
+  
# File lib/rir/string.rb, line 114
   def strip_javascripts!
     replace strip_with_pattern /<script type="text\/javascript">(.+?)<\/script>/m 
   end
diff --git a/doc/classes/String.src/M000006.html b/doc/classes/String.src/M000006.html index ad91df4..8a73177 100644 --- a/doc/classes/String.src/M000006.html +++ b/doc/classes/String.src/M000006.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 128
+  
# File lib/rir/string.rb, line 127
   def strip_javascripts
     dup.strip_javascripts!
   end
diff --git a/doc/classes/String.src/M000007.html b/doc/classes/String.src/M000007.html index 448264e..49c5a94 100644 --- a/doc/classes/String.src/M000007.html +++ b/doc/classes/String.src/M000007.html @@ -7,7 +7,7 @@ -
# File lib/rir/string.rb, line 132
+  
# File lib/rir/string.rb, line 131
   def strip_stylesheets!
   # TODO: revamp; unclear what this is for.
     replace strip_with_pattern /<style type="text\/css">(.+?)<\/style>/m 
diff --git a/doc/classes/String.src/M000008.html b/doc/classes/String.src/M000008.html
index 8a44d27..a10b5bd 100644
--- a/doc/classes/String.src/M000008.html
+++ b/doc/classes/String.src/M000008.html
@@ -7,7 +7,7 @@
   
 
 
-  
# File lib/rir/string.rb, line 137
+  
# File lib/rir/string.rb, line 136
   def strip_stylesheets
     dup.strip_stylesheets!
   end
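The patterns in strip_javascripts! and strip_stylesheets! match only the exact opening tags <script type="text/javascript"> and <style type="text/css">, and they match case-sensitively. A bare <script>, an uppercase tag or an extra attribute therefore slips through. Below is a looser alternative, written as a hypothetical standalone helper rather than as RIR's method:

```ruby
# Hypothetical alternative to RIR's strip_javascripts!/strip_stylesheets!:
# deletes whole <script> and <style> elements whatever their attributes or case.
# /i makes the match case-insensitive; /m lets .*? run across newlines.
SCRIPT_OR_STYLE = %r{<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>}im

def strip_scripts_and_styles(html)
  html.gsub(SCRIPT_OR_STYLE, '')
end

page = <<~HTML
  <head><SCRIPT>var a = 1;</SCRIPT>
  <style media="all">p { color: red }</style></head>
  <body><p>Kept</p></body>
HTML

puts strip_scripts_and_styles(page)
```

Regular expressions remain a heuristic for HTML. An HTML parser is the more robust choice when the markup is malformed.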
diff --git a/doc/classes/String.src/M000009.html b/doc/classes/String.src/M000009.html index 2203bd0..37f6f1f 100644 --- a/doc/classes/String.src/M000009.html +++ b/doc/classes/String.src/M000009.html @@ -2,14 +2,14 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - extract_xmltags_values (String) + strip_punctuation! (String)
# File lib/rir/string.rb, line 145
-  def extract_xmltags_values(tag_name)
-    self.scan(/<#{tag_name}.*?>(.+?)<\/#{tag_name}>/).flatten
+  def strip_punctuation!
+    replace strip_with_pattern /[^a-zA-Z0-9\-\s]/
   end
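strip_punctuation! deletes every character that is not an ASCII letter, a digit, a hyphen or whitespace. The documented example "hello, world. how are you?!" becomes "hello world how are you" only if the matches are deleted rather than replaced with spaces. The sketch therefore assumes strip_with_pattern behaves like gsub(pattern, ''). The body of the non-bang strip_punctuation is not shown either. It is assumed here to follow the dup pattern of strip_xml_tags:

```ruby
# Pattern copied from the strip_punctuation! source above: anything that is not
# an ASCII letter, a digit, a hyphen or whitespace counts as punctuation.
PUNCTUATION_PATTERN = /[^a-zA-Z0-9\-\s]/

class String
  # ASSUMPTION: deleting matches stands in for RIR's strip_with_pattern.
  def strip_punctuation!
    replace gsub(PUNCTUATION_PATTERN, '')
  end

  # Assumed to follow the dup pattern used by strip_xml_tags.
  def strip_punctuation
    dup.strip_punctuation!
  end
end

puts "hello, world. how are you?!".strip_punctuation  # prints hello world how are you
puts "state-of-the-art caf\u00e9".strip_punctuation   # prints state-of-the-art caf
```

Because the character class is ASCII-only, accented letters are removed along with punctuation. Hyphenated words survive intact.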
diff --git a/doc/created.rid b/doc/created.rid index 0b10800..5d2582c 100644 --- a/doc/created.rid +++ b/doc/created.rid @@ -1 +1 @@ -Fri, 05 Nov 2010 15:06:41 +0100 +Tue, 23 Nov 2010 18:20:46 +0100 diff --git a/doc/files/lib/rir/document_rb.html b/doc/files/lib/rir/document_rb.html index 5dc4860..767c904 100644 --- a/doc/files/lib/rir/document_rb.html +++ b/doc/files/lib/rir/document_rb.html @@ -53,7 +53,7 @@ Last Update: - 2010-11-05 15:06:24 +0100 + 2010-11-23 18:14:13 +0100
@@ -97,6 +97,12 @@ href="http://www.gnu.org/licenses/">www.gnu.org/licenses/>. net/http   + rexml/document   + + net/http   + + kconv   +
diff --git a/doc/files/lib/rir/string_rb.html b/doc/files/lib/rir/string_rb.html index 5b47834..73f0e29 100644 --- a/doc/files/lib/rir/string_rb.html +++ b/doc/files/lib/rir/string_rb.html @@ -53,7 +53,7 @@ Last Update: - 2010-11-05 15:06:35 +0100 + 2010-11-23 18:20:41 +0100
diff --git a/doc/files/lib/rir_rb.html b/doc/files/lib/rir_rb.html index 3a8552d..d43a4b6 100644 --- a/doc/files/lib/rir_rb.html +++ b/doc/files/lib/rir_rb.html @@ -53,7 +53,7 @@ Last Update: - 2010-11-05 14:39:35 +0100 + 2010-11-19 11:27:16 +0100
@@ -72,6 +72,12 @@ rir/string   + rir/query   + + rir/corpus   + + rir/regexp   +
diff --git a/doc/fr_class_index.html b/doc/fr_class_index.html index c330122..9a24111 100644 --- a/doc/fr_class_index.html +++ b/doc/fr_class_index.html @@ -19,11 +19,15 @@ RIR
- RIR::Document
+ RIR::Corpus
- RIR::WebDocument
+ RIR::Indri
- RIR::WikipediaPage
+ RIR::Indri::IndriQuery
+ + RIR::Indri::Parameters
+ + RIR::Query
String
diff --git a/doc/fr_file_index.html b/doc/fr_file_index.html index 045567f..8871047 100644 --- a/doc/fr_file_index.html +++ b/doc/fr_file_index.html @@ -17,11 +17,11 @@

Files

diff --git a/doc/fr_method_index.html b/doc/fr_method_index.html index 0379b48..c909673 100644 --- a/doc/fr_method_index.html +++ b/doc/fr_method_index.html @@ -17,23 +17,17 @@

Methods

diff --git a/doc/index.html b/doc/index.html index 3038b39..dcf5a4f 100644 --- a/doc/index.html +++ b/doc/index.html @@ -16,6 +16,6 @@ - +
diff --git a/lib/rir/corpus.rb b/lib/rir/corpus.rb index f443ec4..8428932 100644 --- a/lib/rir/corpus.rb +++ b/lib/rir/corpus.rb @@ -17,7 +17,6 @@ # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. -# General module for many purposes related to Information Retrieval. module RIR class Corpus
diff --git a/lib/rir/query.rb b/lib/rir/query.rb index 581901e..d18e297 100644 --- a/lib/rir/query.rb +++ b/lib/rir/query.rb @@ -17,7 +17,6 @@ # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. -# General module for many purposes related to Information Retrieval. module RIR class Query
diff --git a/lib/rir/string.rb b/lib/rir/string.rb index 7a95f5c..6a9b843 100644 --- a/lib/rir/string.rb +++ b/lib/rir/string.rb @@ -17,7 +17,6 @@ # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. -# General module for many purposes related to Information Retrieval. module RIR # These are the default stopwords provided by Lemur.