CSCE 670: Information Storage and Retrieval
Spring 2014
Tues/Thurs 12:45-2:00pm in HRBB 113
Instructor: James Caverlee, HRBB 403
Office Hours: Tues 4-5pm, or by appointment
Department of Computer Science and Engineering
Texas A&M University
TA: Haokai Lu, 408A
Office Hours: Mon/Wed 4-5pm
Course Summary
In this course, we'll study the theory, design, and implementation of text-based and Web-based information retrieval systems, including an examination of the web and social media mining algorithms and techniques at the core of modern search and data mining applications. By the end of the semester you will be able to:
- Define and explain the key concepts and models relevant to information storage and retrieval, including efficient text indexing; Boolean, vector space, and probabilistic retrieval models; relevance feedback; document clustering and text categorization; and Web search (crawling, indexing, and link-based algorithms like PageRank).
- Design, implement, and evaluate the core algorithms underlying a fully functional web search / data mining system, including the indexing, retrieval, and ranking components, as well as advanced algorithms like document clustering and text categorization.
- Identify the salient features and apply recent research results in web search and data mining, including topics such as collaborative filtering, adversarial information retrieval, location-based services, and social information management.
Communication
All course communication will be via Piazza. We will post to Piazza frequently, so plan to check it every day.
Prerequisites
I expect all students to have had some previous exposure to basic probability, statistics, algorithms, and data structures. You should be able to design and develop large programs and learn new software libraries on your own.
Textbooks
The primary textbook is IIR: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. It is available from Cambridge University Press, Amazon, and other booksellers.
We'll also read selections from:
- MMD: Mining of Massive Datasets, Anand Rajaraman and Jeffrey D. Ullman.
- DITP: Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010.
- NCM: Networks, Crowds, and Markets: Reasoning About a Highly Connected World, David Easley and Jon Kleinberg, Cambridge University Press, 2010.
- Several papers and other resources, linked from the course schedule.
You may find some of these optional textbooks helpful, though none are required:
- Modern Information Retrieval, by Baeza-Yates and Ribeiro-Neto.
- Managing Gigabytes, by Witten, Moffat, and Bell.
- Foundations of Statistical Natural Language Processing, by Manning and Schütze.
- Search Engines: Information Retrieval in Practice, by Croft, Metzler, and Strohman.
It is critically important that you study the relevant course readings before class so that we can make the most of our limited class time together. I treat our class meetings as opportunities to highlight significant aspects of the material, to answer questions, to engage in discussions about particular topics, and so on. We cannot cover all of the material in class, so it is up to you to stay on top of the readings and the assignments.