Feature | Searchdaimon ES | SearchBlox | mnoGoSearch | IBM OmniFind Yahoo! Edition | Google Mini | MS Search Server | Thunderstone | Constellio | Amazon CloudSearch | Google Site Search |
---|---|---|---|---|---|---|---|---|---|---|
Version | Auto updating | V 6.4 Build 2 | 3.3.12 | 8.4.2 | 4.6.4 | 2010 Express | App 6.01, script 8.0 | 1.2.1 | – | – |
Search page | Open | Open | Open | Open | Open | Open | Open | Open | Open | |
Indexed documents[1] | 48 801 (100%) | 34 900 (72%) | 43 859 (90%) | 36 464 (75%) | 47 135 (97%) | 38 369 (79%) | 45 141 (92%) | 43 685 (89%) | 48 737 (99.8%) | 73 900 (151%) |
Index size | 1.1G | 3.8G | 1.2G | 2.9G | – | 2.1G[2] | 3.3G | 9.6G | – | – |
Collection refiltering | X | X | X | |||||||
Misleading result count[3] | X | X | X | |||||||
Administration | Web gui | Web gui | Command line only | Web gui | Web gui | Web gui | Web gui | Web gui | Web gui, some command line | Web gui |
Platform | Virtual appliance, hardware appliance | Windows and Linux | Windows and Linux | Windows and Linux | Hardware appliance | Windows | Virtual appliance, hardware appliance | Linux | Amazon’s own cloud infrastructure | Google’s own cloud infrastructure |
Cost | Free open source version with community support only. Version with full support from $1 999 to $15 000 depending on number of users and hardware options | Free for the first 10 000 documents. Then $5 000 per server per year for a more advanced version with unlimited documents | Linux for free, Windows version from $99 to $19 850 depending on underlying database technology | Free | From $2 990 to $9 990 depending on number of documents | Free | From $990 depending on number of documents and hardware options | Free | Different search servers at $86.40, $345.60 and $489.60 per month, depending on data size and query load. You may need several in parallel if you have much data or many users. In addition there are data transfer, query count and document updating fees | $100 to $2 000+ per year depending on number of queries and on-demand index quota |
Max documents | No hard limit | 10 000 for free version. No hard limit for paid version | No hard limit | 500 000 | From 50 000 to 300 000 depending on license | No hard limit | Depending on license | No hard limit | Has a limit, but no numbers have been published | Unknown |
Underlying search technology | Proprietary[4] | Lucene/Solr | SQL server | Lucene/Solr | Proprietary[4] | SQL server | Proprietary | Lucene/Solr, SQL server | Proprietary, based on Amazon A9 | Proprietary, based on Google.com |
Review | Searchdaimon ES review | SearchBlox review | mnoGoSearch review | IBM OmniFind Yahoo! Edition review | Google Mini review | Microsoft Search Server Express 2010 review | Thunderstone review | Constellio review | | Google Site Search review |
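Amazon CloudSearch is priced per month while most other products in the table are priced per year or as a one-time fee, so a rough conversion can help when comparing. The following is a minimal sketch that annualizes the three monthly server prices from the Cost row. The function name `yearly_cost` and the `instances` parameter are hypothetical, and the result excludes the data transfer, query count and document updating fees, which the table does not quantify.

```python
from decimal import Decimal

# Monthly prices per search server, taken from the Cost row of the table.
MONTHLY_PRICES = [Decimal("86.40"), Decimal("345.60"), Decimal("489.60")]


def yearly_cost(monthly_price: Decimal, instances: int = 1) -> Decimal:
    """Base yearly cost for `instances` parallel servers.

    Usage-based fees (data transfer, queries, document updates) are NOT
    included, because the table gives no numbers for them.
    """
    return monthly_price * 12 * instances


if __name__ == "__main__":
    for price in MONTHLY_PRICES:
        print(f"${price} per month is ${yearly_cost(price)} per year")
```

Decimal is used instead of float so that the dollar amounts stay exact.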
htdig
http://www.htdig.org/
We plan to add htdig to Open Test Search soon. htdig is an open source search engine mostly used for websites and intranets. It is somewhat outdated, with the latest release from 2004.
It is easy to install. On CentOS you only need to run “yum install htdig htdig-web”. Unfortunately, you have to download and build third-party programs to convert common document formats such as .doc, .pdf and .xls.
Notes
[1] Indexed documents
There are a total of 48 811 documents in the two test collections. Some search engines ignore documents that they don’t have a data converter for. Ignoring such documents means you can’t search for the file names of images and other non-text content. Other engines index the file name and/or metadata.
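The percentages in the "Indexed documents" row appear to be the indexed count divided by this total of 48 811. A minimal sketch of that calculation follows; the dictionary and function names are illustrative only, and the counts are copied from the table.

```python
TOTAL_DOCUMENTS = 48_811  # total in the two test collections

# Indexed document counts as listed in the comparison table.
INDEXED = {
    "Searchdaimon ES": 48_801,
    "SearchBlox": 34_900,
    "mnoGoSearch": 43_859,
    "IBM OmniFind Yahoo! Edition": 36_464,
    "Google Mini": 47_135,
    "MS Search Server": 38_369,
    "Thunderstone": 45_141,
    "Constellio": 43_685,
    "Amazon CloudSearch": 48_737,
    "Google Site Search": 73_900,
}


def coverage_percent(indexed: int, total: int = TOTAL_DOCUMENTS) -> float:
    """Share of the test collections that an engine reports as indexed."""
    return 100 * indexed / total


if __name__ == "__main__":
    for name, count in INDEXED.items():
        print(f"{name}: {coverage_percent(count):.1f}%")
```

A value above 100%, as for Google Site Search, means the engine reports more indexed documents than the collections contain.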
There are also some documents with special file names that can be safely ignored, typically file names that start with “~” or contain “#” (the # character has a special meaning when used in a URL).
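As a rough illustration of the rule above, the check below flags file names that start with “~” or contain “#”. It is only a sketch under that assumption: the exact rules each search engine applies are not documented here, and the function name `is_ignorable` is hypothetical.

```python
from pathlib import PurePosixPath


def is_ignorable(path: str) -> bool:
    """Return True if the file name starts with '~' or contains '#'.

    Only the last path component (the file name) is inspected.
    """
    name = PurePosixPath(path).name
    return name.startswith("~") or "#" in name


if __name__ == "__main__":
    for candidate in ["docs/~$budget.doc", "docs/notes#2.txt", "docs/report.pdf"]:
        print(candidate, is_ignorable(candidate))
```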
[2] Estimated size
The search server doesn’t reveal disk usage in its GUI. This number is based on the size of the C:\Program Files\Microsoft Office Servers\14.0\Data\MSSQL10.SHAREPOINT folder.
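One way to obtain such an estimate is to sum the sizes of all files under the data folder. The sketch below does this with the standard library; it is not necessarily how the number in the table was measured, and the function name `folder_size_bytes` is illustrative.

```python
import os


def folder_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below `root`, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Files may be locked or removed while we walk; skip them.
                pass
    return total


if __name__ == "__main__":
    data_dir = (
        r"C:\Program Files\Microsoft Office Servers\14.0\Data\MSSQL10.SHAREPOINT"
    )
    print(f"{folder_size_bytes(data_dir) / 1024**3:.1f}G")
```

Because the folder belongs to the database server, the result is an upper-level estimate of the index size rather than an exact figure.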
[3] Misleading result count
Some search engines don’t show the correct number of documents found. Instead they try to estimate how many there may be. For example, the Google Mini says it has found 134 000 documents containing “enron”, but there are only ~50 000 documents in the data set.
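A simple sanity check makes the problem concrete: a reported hit count can never legitimately exceed the number of documents in the collection. The sketch below applies that check to the Google Mini example, using the total of 48 811 documents from note [1]; the function name is hypothetical.

```python
def is_plausible_count(reported_hits: int, collection_size: int) -> bool:
    """A hit count is plausible only if it fits within the collection."""
    return 0 <= reported_hits <= collection_size


if __name__ == "__main__":
    collection_size = 48_811   # total documents in the test collections
    reported_hits = 134_000    # what the Google Mini reports for "enron"
    print(is_plausible_count(reported_hits, collection_size))
```

The check only catches overestimates; an estimate that is wrong but smaller than the collection would pass.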
[4] Proprietary search technology
Neither Searchdaimon nor Google states what technology they use under the hood, but it is assumed to be some kind of inverted index, probably written in C or C++.
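For readers unfamiliar with the term, an inverted index maps each term to the set of documents that contain it. The toy sketch below shows the general idea in Python. It is a generic illustration only and says nothing about how Searchdaimon or Google actually implement their engines; all names in it are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "Enron energy trading", 2: "Meeting notes", 3: "Enron meeting"}
    index = build_inverted_index(docs)
    print(sorted(search(index, "enron")))
    print(sorted(search(index, "enron meeting")))
```

With this structure the exact number of matching documents is simply the size of the result set, which is cheap for a small collection but can be costly to compute at scale.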