Text Mining

Course description Text Mining
Year: 2018-2019
Catalog number: 4343TXTMN
Teacher(s):
  • dr. S. Verberne
  • A. Brandsen MSc
Language: English
Blackboard: Yes
EC: 6.0
Level: 500
Period: Semester 1
  • Yes Elective choice
  • No Contract teaching
  • Yes Exchange
  • Yes Study Abroad
  • No Evening course
  • No A la Carte
  • No Honours Class

Admission requirements

Bachelor Computer Science.

Description

Text mining, also known as 'knowledge discovery from text', is an ICT research and development field that has received growing attention over the last decade, drawing researchers from data science, computational linguistics, and machine learning. Key applications include text categorization, information extraction, social media mining, and automatic summarization. This course gives an overview of the field from both a theoretical angle (underlying models) and a practical angle (applications). In addition to the lectures, students work on practical assignments.

Course objectives

After successful completion of this course, students have an understanding, both at the conceptual and the technical level, of the application of natural language processing (NLP) in the text mining area. Students can build models for a text mining task using machine learning algorithms and language data, and they can evaluate and report on the developed models and modules. Also, students understand, from a theoretical perspective, which tools are applicable in which situations, and which real-world challenges prevent the application of certain techniques (such as language variation and noise due to document processing errors).

Timetable

The most recent timetable can be found on the students' website.

Outline
(subject to changes)
Week 1. Introduction
Week 2. The NLP pipeline
Week 3. Text categorization
Week 4. Distributional semantics (word embeddings and topic modeling)
Week 5. Information extraction
Week 6. Sentiment analysis
Week 7. Information retrieval & question answering
Week 8. Authorship detection
Week 9. Summarization
Week 10. Biomedical text mining
Week 11. Industrial text mining
Week 12. Conclusions/future developments

Mode of instruction

Lectures.

Assessment method

  • a written exam (60% of course grade)
  • practical assignments (40% of course grade)
    - four smaller assignments (5% each) during the course
    - one more substantial assignment (20%) at the end of the course

Reading list

The literature will be available on Blackboard week by week.

Registration

  • You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.
  • Please also register for the course in Blackboard as soon as the lecturer has made the course page available.
  • Due to limited capacity, external students can only register after consultation with the programme coordinator/study adviser (mastercs@liacs.leidenuniv.nl).

Contact information

Lecturers: dr. S. Verberne & A. Brandsen MSc
Website: Lecturer's website
