Search engines

 

Searchdaimon ES SearchBlox mnoGoSearch IBM OmniFind Yahoo! Edition Google Mini MS Search Server Thunderstone Constellio Amazon CloudSearch Google Site Search
Version Auto updating V 6.4 Build 2 3.3.12 8.4.2 4.6.4 2010 Express App 6.01, script 8.0 1.2.1
Search page Open Open Open Open Open Open Open Open Open
Indexed documents[1] 48 801 (100%) 34 900 (72%) 43 859 (90%) 36 464 (75%) 47 135 (97%) 38 369 (79%) 45 141 (92%) 43 685 (89%) 48 737 (99.8%) 73 900 (151%)
Index size 1.1G 3.8G 1.2G 2.9G 2.1G[2] 3.3G 9.6G
Collection refiltering X X X
Misleading result count[3] X X X
Administration Web GUI Web GUI Command line only Web GUI Web GUI Web GUI Web GUI Web GUI Web GUI, some command line Web GUI
Platform Virtual appliance, hardware appliance Windows and Linux Windows and Linux Windows and Linux Hardware appliance Windows Virtual appliance, hardware appliance Linux Amazon’s own cloud infrastructure Google’s own cloud infrastructure
Cost Free open source version with community support only. Version with full support from $1 999 to $15 000 depending on number of users and hardware options Free for the first 10 000 documents. Then $5 000 per server per year for a more advanced version with unlimited documents Linux for free, Windows version from $99 to $19 850 depending on underlying database technology Free From $2 990 to $9 990 depending on number of documents Free From $990 depending on number of documents and hardware options Free Different search servers at $86.40, $345.60 and $489.60 per month, depending on data size and query load. You may need several in parallel if you have a lot of data or many users. In addition there are data transfer, query count and document updating fees $100 to $2 000+ per year depending on number of queries and on-demand index quota
Max documents No hard limit 10 000 for free version. No hard limit for paid version No hard limit 500 000 From 50 000 to 300 000 depending on license No hard limit Depending on license No hard limit Has a limit, but no numbers have been published Unknown
Underlying search technology Proprietary[4] Lucene/Solr SQL server Lucene/Solr Proprietary[4] SQL server Proprietary Lucene/Solr, SQL server Proprietary, based on Amazon A9 Proprietary, based on Google.com
Review Searchdaimon ES review SearchBlox review mnoGoSearch review IBM OmniFind Yahoo! Edition review Google Mini review Microsoft Search Server Express 2010 review Thunderstone review Constellio review Google Site Search review

htdig

http://www.htdig.org/
We plan to add htdig to Open Test Search soon. Htdig is an open source search engine mostly used for websites and intranets. It is a bit outdated, with the latest release from 2004.

It is easy to install. On CentOS you only need to run “yum install htdig htdig-web”. Unfortunately, you have to download and build third-party programs to convert common document formats such as .doc, .pdf and .xls.

Notes

[1] Indexed documents

There is a total of 48 811 documents in the two test collections. Some search engines ignore documents that they don’t have a data converter for. Ignoring such documents means you can’t search for the file names of images and other non-text content. Other engines index the file name and/or metadata.

There are also some documents with special file names that can safely be ignored. Typically the file name starts with “~” or has “#” in it (the # character has a special meaning when used in a URL).
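As a rough illustration of the two points above, the following Python sketch flags the special file names and computes the percentage shown next to each indexed document count. The function names and the example file names are made up for illustration; only the total of 48 811 and the document counts come from this page.

```python
from pathlib import PurePath

TOTAL_DOCUMENTS = 48_811  # total number of documents stated in this note


def is_ignorable(path: str) -> bool:
    """Return True if the file name starts with "~" or contains "#"."""
    name = PurePath(path).name
    return name.startswith("~") or "#" in name


def coverage_percent(indexed: int, total: int = TOTAL_DOCUMENTS) -> float:
    """Share of the test collections reported as indexed, in percent."""
    return 100.0 * indexed / total


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for example in ["reports/~$budget.xls", "notes#1.txt", "memo.doc"]:
        print(example, is_ignorable(example))
    # 34 900 is one of the indexed document counts from the table.
    print(round(coverage_percent(34_900)))
```

A value above 100%, as in the last column of the table, simply means the engine reports more indexed items than there are documents in the collections.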

[2] Estimated size

The search server doesn’t reveal disk usage in its GUI. This number is based on the size of the C:\Program Files\Microsoft Office Servers\14.0\Data\MSSQL10.SHAREPOINT folder.

[3] Misleading result count

Some search engines don’t show the correct number of documents found. Instead they try to estimate how many there may be. For example, the Google Mini says it has found 134 000 documents containing enron, but there are only ~50 000 documents in the data set.

[4] Proprietary search technology

Neither Searchdaimon nor Google states what technology they use under the hood, but it is assumed to be some kind of inverted index, probably written in C or C++.
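For readers unfamiliar with the term, here is a minimal sketch of a generic inverted index in Python. It is not how Searchdaimon or Google implement their engines, which is unknown; the function names and the tiny sample collection are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the ids of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical three-document collection, for illustration only.
    docs = {
        1: "Enron energy trading",
        2: "Energy prices rise",
        3: "Enron email archive",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "enron")))
    print(sorted(search(idx, "enron energy")))
```

The index stores, for every term, the documents it occurs in, so a query only has to intersect a few sets instead of scanning every document. Real engines add ranking, compression and on-disk storage on top of this idea.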