Information Retrieval and Text Analytics

Year: 2017-2018
Catalog number: 4343IRTA6
Teacher(s):
  • Prof.dr. W. Kraaij
  • Dr. C. Veenman
Language: English
Blackboard: Yes
EC: 6.0
Level: 500
Period: Semester 2
  • No Elective choice
  • No Contractonderwijs (contract teaching)
  • No Exchange
  • No Study Abroad
  • No Evening course
  • No A la Carte
  • No Honours Class

Admission Requirements.

Elementary knowledge of machine learning, probability theory (Bayes’ theorem, probability calculus), linear algebra (vector spaces) and data structures (hash tables) is recommended.

Description.

Search engines, the internet and cheap, powerful hardware have drastically changed the way humans deal with information. Whereas thirty years ago librarians were still classifying books and articles using subject codes, nowadays search technology has become pervasive on desktop computers and mobile devices. This course covers both the theory and practice of the field of Information Retrieval and Text Analytics, restricted to textual content (the courses 4343AUDIO and 4343MMIRL focus on audiovisual content).
The course covers the following aspects:
1. How can we formalize search for information and how can we evaluate search systems?
2. Which document features (e.g. term statistics) could be used to associate a ‘meaning’ with a text?
3. How can we extend the notion of relevance by looking at context and learning from interaction?
4. How can these elements be combined to classify a text or to perform relevance ranking in order to build a search engine?
5. Which data structures and techniques are essential for computational efficiency?
6. Which algorithms can be used to find entities in text, semantic relations,

Course Objectives.

By the end of the course, the student should have a thorough understanding of:

  • the principles of information retrieval models
  • the pros and cons of various query processing techniques
  • efficient data structures and complexity of search and indexing algorithms
  • technologies and relevance models for web search
  • evaluation methods for IR systems
  • algorithms to determine stylistic and subjective properties of text
  • text clustering and categorization applications
  • language models, topic models and word embeddings (word2vec)

In addition, the student should have some experience with text processing and/or information retrieval experiments.

Timetable.

The most up-to-date version of the schedule can be found on the LIACS website.

Mode of instruction.

  • Lectures (2h / week)
  • Homework (weekly): becoming better acquainted with the new lecture material through small exercises, mostly taken from the course book.
  • Practical assignments: applying lecture concepts to real-world datasets and writing a report.

Assessment method.

The course grade will be computed as follows:

  • Homework (weekly exercises) – 10%
  • Practical assignments – 30%
  • Final written exam (closed book) – 60%

Reading list.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008. Online version available from the authors.

Additional reading assignments may be added as the course progresses and will be made available through Blackboard.

Registration.

You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.

Contact information.

Study coordinator Computer Science, José Visse
