Advances in Data Mining
|Period:||Semester 1, Block I||Hours of study:||26:00 hrs|
- Yes Elective choice
- Yes Contractonderwijs
- Yes Exchange
- Yes Study Abroad
- No Evening course
- No A la Carte
- No Honours Class
Please note that this course description is preliminary. The final course description will be released in June 2018.
Elementary knowledge of data structures (sparse matrices, hash tables, graphs), statistics (binomial distribution) and combinatorics (permutations, combinations).
Traditional data mining techniques focus mainly on classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
• processing huge data sets (terabytes or petabytes in size),
• fast searching for similar objects (documents, images, songs, routes, etc.) in collections of millions or billions of such objects,
• clustering of massive data sets,
• real-time analysis of data streams (internet traffic, sensor data, electronic transactions),
• recommending items to visitors of internet shops,
• analyzing big (network) graphs, such as web sites, social networks, collaboration networks, etc.
During the course we will focus on these areas. We will start by introducing a powerful framework for processing massive data sets on distributed computers: Hadoop and MapReduce. Then a new, very general similarity search technique, Locality Sensitive Hashing, will be discussed, together with its applications to plagiarism detection, searching databases of fingerprints, finding clients with similar buying behavior, etc. Next, several algorithms for real-time mining of data streams will be introduced: Bloom filters, random sampling, counting, estimating moments. Finally, some state-of-the-art recommendation systems will be discussed in depth. The practical part of the course will consist of several programming assignments (in Python) and writing reports.
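As a small illustration of the kind of stream-mining structure mentioned above, the following Python sketch shows a minimal Bloom filter. It is not taken from the course materials: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 digests as hash functions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and default sizes are hypothetical."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos] for pos in self._positions(item))


# Example stream of (made-up) keys.
seen = BloomFilter()
for url in ["a.example/1", "a.example/2"]:
    seen.add(url)
```

A filter of this kind never reports an added item as absent, but it may occasionally report an item as present that was never added; the false-positive rate depends on the number of bits, the number of hash functions and the number of items stored.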
After completing the course, the students should:
• have a general knowledge of the recent developments in the field of Data Mining
• have detailed knowledge of selected techniques and their applications
• have gained some hands-on experience with several algorithms for mining complex data sets
• be able to apply the acquired knowledge and skills to new problems
• have gained some experience with mining big data sets on a cluster computer
The most recent timetable can be found at the students' website.
Mode of instruction
- Computer Lab
- Practical assignments
- Self-evaluated homework
The final mark is composed of:
(1) written exam (40%)
(2) practical assignment (60%)
A. Rajaraman, J. Leskovec, J. Ullman, Mining of Massive Datasets.
You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.
Lecturer: Dr. Wojtek Kowalczyk.
|Is part of||Programme type||Semester||Block|
|Astronomy and Data Science||Master||1|
|Biology: Biodiversity and Sustainability||Master||1|
|Biology: Biology and Business Studies||Master||1|
|Biology: Biology and Education||Master||1|
|Biology: Biology and Science Communication & Society||Master||1|
|Biology: Evolutionary Biology||Master||1|
|Biology: From Cells To Organisms||Master||1|
|Biology: General Biology Programme (no research specialisation)||Master||1|
|Biology: Molecular Genetics & Biotechnology||Master||1|
|Computer Science: Computer Science and Advanced Data Analytics||Master||1|
|Statistical Science for the Life & Behavioural Sciences: Data Science||Master||1||I|