Information Retrieval
- 21
- Carrot2
- A framework for clustering search results. It includes clustering components and a stand-alone meta-search component.
- 22
- Dowser
- Clusters results from major search engines, associates words that appear in previous searches, and keeps a local cache of every result you click on in a searchable database, helping you keep track of what you find on the web. [Windows, UNIX, Linux and Mac OS X]
- 23
- Apache Lucene
- A search engine library whose features include fast indexing; ranked searching; boolean, phrase, and span queries; date-range searching; and extension APIs. News, sub-project details, documentation, and downloads. [Open Source]
- 24
- Compass
- An open-source project built on top of Lucene that aims to simplify the integration of search into any Java application. Overview, download, documentation and support.
- 25
- Lucene Tutorial.com
- A site for programmers working with Apache Lucene, including tutorials, sample code, and troubleshooting guides.
- 26
- Nutch
- Open-source web-search software, built on Lucene Java. Project news, documentation, and download.
- 27
- Solr
- An open-source search server based on the Lucene Java search library. News, documentation, resources, and download.
- 28
- Supermind Consulting
- This firm provides consulting services for Lucene, Solr, and Nutch, focusing on vertical search and crawling.
- 31
- Adaptive On-Line Page Importance Computation
- A clear explanation of the convergence of various algorithms. The paper also describes an adaptive, on-line algorithm for computing page importance that can be used for focused crawling as well as for search engine ranking.
- 32
- SALSA: The Stochastic Approach for Link-Structure Analysis
- A focused search algorithm (SALSA) based on Markov chains. It starts with a query on a broad topic, discards useless links, and then weights the remaining links. A stochastic crawl is then used to discover the authorities on the topic. [PS format]
- 33
- The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
- This paper describes a joint probabilistic model of the content and interconnectivity of document collections such as sets of web pages or research paper archives. [PDF]
- 34
- Web-Trec 8 and PageRank
- On the use of PageRank with the TREC-8 Web Track "large" and "small" datasets. [PDF]
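The on-line page importance computation listed above (OPIC) is usually described in terms of "cash" and "history": every page starts with an equal share of cash; visiting a page adds its cash to its history and splits that cash equally among its out-links; and a page's importance is estimated at any moment as (history + cash) / (total history + 1). The following is a minimal sketch of that scheme, not the paper's implementation. The function name `opic`, the round-robin visiting order, and the assumption that the graph is strongly connected with every page listed as a key and having at least one out-link are illustrative choices.

```python
def opic(graph, steps):
    """Minimal OPIC-style page importance sketch (illustrative, hypothetical name).

    graph: dict mapping each page to a non-empty list of pages it links to.
    Assumes a strongly connected graph in which every page is a key.
    """
    pages = list(graph)
    n = len(pages)
    cash = {p: 1.0 / n for p in pages}  # total cash stays equal to 1
    history = {p: 0.0 for p in pages}   # cash accumulated by past visits
    total_history = 0.0                 # sum of all histories
    for t in range(steps):
        page = pages[t % n]             # assumed policy: round-robin visits
        amount = cash[page]
        history[page] += amount
        total_history += amount
        cash[page] = 0.0                # reset before distributing (handles self-links)
        share = amount / len(graph[page])
        for child in graph[page]:
            cash[child] += share
    # Estimated importance: (history + cash) / (total history + 1); sums to 1.
    return {p: (history[p] + cash[p]) / (total_history + 1.0) for p in pages}
```

For the three-page graph `{"a": ["b"], "b": ["a", "c"], "c": ["a"]}`, 30,000 steps give estimates close to 0.4, 0.4, and 0.2, the stationary distribution of a random walk on that graph. Because the estimate only reads the current cash and history, it can be queried at any point during the crawl, which is what makes the computation on-line.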
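The Markov-chain step behind SALSA can be illustrated on the authority side of the chain. This is a minimal sketch, assuming the focused subgraph has already been built and pruned and that every remaining link carries equal weight (the paper's own link weighting is not reproduced). From an authority, the walk steps back to a uniformly chosen hub that links to it, then forward to a uniformly chosen page that hub links to; power iteration approximates the stationary distribution. The function name `salsa_authorities` and the edge-list input format are hypothetical.

```python
from collections import defaultdict


def salsa_authorities(edges, iterations=100):
    """Authority scores from a SALSA-style authority Markov chain (sketch).

    edges: non-empty iterable of (hub, authority) pairs, i.e. links u -> v.
    Assumes uniform link weights.
    """
    out_links = defaultdict(set)  # hub -> authorities it points to
    in_links = defaultdict(set)   # authority -> hubs pointing to it
    for u, v in edges:
        out_links[u].add(v)
        in_links[v].add(u)
    authorities = sorted(in_links)
    score = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        nxt = {a: 0.0 for a in authorities}
        for j in authorities:
            back = score[j] / len(in_links[j])
            # Step back to a uniformly chosen hub that links to j ...
            for hub in in_links[j]:
                forward = back / len(out_links[hub])
                # ... then forward to a uniformly chosen authority of that hub.
                for k in out_links[hub]:
                    nxt[k] += forward
        score = nxt
    return score
```

Called as `salsa_authorities([("h1", "a1"), ("h1", "a2"), ("h2", "a2")])`, it returns scores of about 1/3 for `a1` and 2/3 for `a2`. These are proportional to in-degree, consistent with the known property that SALSA authority scores within a connected component are proportional to in-degree.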