Amazon CloudSearch demo

I have added Amazon CloudSearch to Open Test Search so we can compare it to other search technologies.

Check it out: http://www.opentestsearch.com/cgi-bin/search.cgi?query=enron

CloudSearch was able to index 48,737 (99.8%) of the documents in the collection. The rest were rejected because, according to Amazon, they contain illegal Unicode characters. I created the end-user interface you see in Perl.
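Amazon's rejection message does not say exactly which characters were at fault. A common cause of this kind of rejection is characters that are not allowed in XML 1.0, such as stray control characters. Assuming that is what happened here, a minimal Python sketch of cleaning the text before uploading could look like this. The function name `strip_invalid_xml_chars` is hypothetical and not part of any AWS SDK.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, LF, CR, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# The negated class matches everything else: other C0 control characters,
# lone surrogates, and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character outside the XML 1.0 Char range removed.

    Hypothetical helper: assumes the rejected documents failed because of
    such characters, which Amazon's message does not confirm.
    """
    return _INVALID_XML_CHARS.sub("", text)


cleaned = strip_invalid_xml_chars("Enron\x00 scandal\x0b memo")
print(cleaned)  # Enron scandal memo
```

Dropping the characters is the simplest choice. Replacing them with a space would keep word boundaries intact if a control character sits between two words.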

The instance type has automatically been scaled up to a "search.m1.large" search instance, which costs $345.60 per month ($0.48 per hour x 24 hours per day x 30 days).
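The monthly figure above is just the hourly instance price multiplied out. The following is a small Python sketch of that calculation, using `Decimal` so the dollar amount is not subject to floating-point rounding. Like the text, it assumes a 30-day month and covers only the instance hours. The function name is illustrative.

```python
from decimal import Decimal


def monthly_cost(hourly_rate: Decimal, hours_per_day: int = 24, days: int = 30) -> Decimal:
    """Instance-hour cost for one month, assuming a 30-day month by default."""
    return hourly_rate * hours_per_day * days


# search.m1.large at $0.48 per hour, as quoted above
print(monthly_cost(Decimal("0.48")))  # 345.60
```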

MS Search Server skips documents

It looks like Microsoft Search Server Express doesn't search the whole document collection when a query has many results. For example, for the query enron almost all the other engines show the Enron scandal document on their first page, but Microsoft Search Server does not.

Looking at the result type filter, it isn't even possible to select anything other than Word as the result type.

But searching for enron scandal directly shows results from web pages, Word files, text files and PowerPoint.

I don't know if this is a bug or by design, but I can imagine that failing to return all documents for common words will cause problems for people with large document collections.

Google Mini filters out relevant results

The Mini shows the message "In order to show you the most relevant results, we have omitted some entries very similar to the 4 already displayed. If you like, you can repeat the search with the omitted results included." for every query, even when the results are not similar at all.

For example, for my enron query the Mini shows this:

Surely the rest of the 60,796 documents it claims to have found can't all be nearly identical to the first 4?

To make matters worse, when clicking the link to see the omitted results, the most relevant document, Enron_scandal.html, is not shown at all.
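To compare the filtered and unfiltered result lists outside the web interface, one can build the two query URLs directly. This Python sketch is based on my understanding of the Google Search Appliance search protocol, which the Mini shares. There, the "repeat the search with the omitted results included" link adds `filter=0` to turn off the duplicate filtering. The host name `mini.example.com` is hypothetical. `default_collection` and `default_frontend` are the usual default collection and front-end names, but yours may differ.

```python
from urllib.parse import urlencode


def mini_search_url(host: str, query: str, include_omitted: bool = False,
                    collection: str = "default_collection",
                    frontend: str = "default_frontend") -> str:
    """Build a search URL for a Google Mini (hypothetical host).

    Assumes the appliance search protocol, where filter=0 disables
    the filtering of results that look similar to each other.
    """
    params = {
        "q": query,             # the search terms
        "site": collection,     # collection to search
        "client": frontend,     # front end that formats the results
        "output": "xml_no_dtd", # raw XML results instead of HTML
    }
    if include_omitted:
        params["filter"] = "0"  # include the "very similar" omitted entries
    return f"http://{host}/search?{urlencode(params)}"


print(mini_search_url("mini.example.com", "enron"))
print(mini_search_url("mini.example.com", "enron", include_omitted=True))
```

Fetching both URLs and diffing the result lists would show which documents, such as Enron_scandal.html, the filter hides.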

Search engines in the works

Currently I have htdig, OpenSearchServer, Xapian Omega, Constellio and Flax on the list of search engines to consider. Oracle Secure Enterprise Search also looks promising, but there may be some license restrictions that prevent me from using the downloadable demo to create a public demo.

If anyone knows about any other enterprise search engines that can be set up for free, don't hesitate to drop me a line at runarb [at] gmail [dot] com.