Data Structures and Algorithms for Information Retrieval
Instructors: Tapas Kanungo and David Mount
Presentation Schedule and Slides
We will discuss various data structures and algorithms that are
used in information retrieval systems. The emphasis will be on
understanding the details of the algorithms, the data structures
used, the space-time complexities. We will also attempt to propose
alternate algorithms that are more efficient than the ones used
in the literature.
Students are responsible for carrying out the following activities.
- Select one or a few articles or chapters from the books. Discuss
your choices with either or both of the course coordinators. Ideally
you should make your articles available for copying at least one day
prior to the previous class period, so that copies can be distributed
- Give a one hour presentation discussing the details of algorithm.
The first 15 minutes should be spent on the background and
introduction, and the rest of the time should be spent on technical
details and methods.
- Lead a discussion on the paper presented: What was the key problem?
Are there other ways of addressing this problem? Are there faster
methods? What are the strengths and weaknesses of the paper? What
other improvements or tradeoffs are there?
- Submit a summary report of roughly four pages with a bibliography
citing related research. The report should summarize the paper and
the discussion. The document should be in Latex and the bibliography
should be in Bibtex. We plan on combining these summaries at the
end of the semester to form a survey to the area and a bibliography.
Samples will be given to help in preparing these.
The following list is intended to serve as a starting point for ideas
for topics. Feel free to suggest others that you think may be
interesting and appropriate. Inverted files, lexical analysis and stop
lists, stemming algorithms; string searching algorithms; clustering
algorithms; kd trees, nearest neighbor searching; singular value
decomposition/latent semantic indexing; probabilistic searching;
hidden Markov models (HMMs) and HMM-based searching; searching scanned
documents, photographs, video and speech; searching in the compressed domain.
- Modern Information Retrieval
Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Addison Wesley, 1999.
- Information Retrieval: Data Structures and Algorithms
William B. Frakes and Ricardo Baeza-Yates (eds.)
Prentice Hall, 1992.
Code for Chapters 7, 8, 9, 12, and 13
- Managing Gigabytes: Compressing and Indexing Documents and Images
Ian H. Witten, Alistair Moffat and Timothy C. Bell
Van Nostrand Reinhold, 1994.
- Readings in Information Retrieval
Karen Sparck Jones and Peter Wilett (eds.)
Morgan Kaufmann, 1997.
- Statistical Pattern Recognition
Academic Press, 1990
- Statistical Pattern Recognition: A Review
A. K. Jain, R. Duin and J. Mao
IEEE Trans. on PAMI, January 2000
Tech Report version
- The ACM SIGIR conference proceedings website:
Grades will be based on the presentation and class participation.
This is a 2-credit seminar. However, if a student is interested in
doing a project he/she can earn 3 credits. Please contact one of
the coordinators to discuss possible projects.
Meeting Time and Place
|Day ||Time ||Room |
|Tuesday || 3:30- 5:30 || AVW 4406 |
Presentation Schedule and Slides
Copy all the slides (tar+gz)