H6751 Text and Web Mining Spring 2020

Course website for H6751

H6751 Web and Text Mining

NTU, WKW / Spring 2020

Course Description

With the popularity of the Internet, a massive amount of text content is now available on the Web, and it has become an important resource for mining useful knowledge. From a business and government point of view, there is an increasing need to interpret and act upon this large volume of text. Text mining (or text analytics) is therefore receiving growing attention as a way to analyze text content on the Web. For instance, opinion mining and sentiment analysis are text mining techniques used to analyze user-generated content on social media platforms.

This course is an introduction to text and web mining. It covers how to analyse unstructured data (i.e. text content) on the Web using text mining techniques. Students will learn various text mining techniques and tools through both lectures and hands-on lab exercises. The course will also explore applications of text mining techniques to real-world problems. This course focuses on Web content mining, not on Web structure or usage mining.

Students will learn the following topics in the course:

Contact Information:

Course Objectives:

At the end of this course, students should be able to:

Prerequisites:

Reference Books

The following books are helpful, but not required. You can easily find these books on the Internet.

If you are not proficient in Python, you may find some tutorials helpful.

Announcement

Assessment

Class Participation (5%)

We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 5%:

  1. Attending guest speakers’ lectures: This semester we have two invited speakers, who are making a great effort to come and lecture for us. We do not want them speaking to an empty room. Your attendance at the guest lectures is expected! It will also be a great chance for networking. You will get 1% per speaker (2% in total) for attending.

  2. Answering questions in class: Instructors will pick students to answer questions during class. Each student starts with a total of 2 points, and one point is deducted for an absence.

  3. Karma Point: Any other act that improves the class, which the instructors notice and deem worthy: 1%.

Based on the chat files saved from Zoom, the list of active students is provided here with two columns: Zoom ID and all active comments (extracted via regular expressions and some hand-crafted rules). If you find your Zoom ID in the provided CSV, please email the lecturer, Zhao Rui, with your Zoom ID from the list and your NTU student ID.
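
For the curious, the kind of extraction described above could be sketched in Python roughly as follows. This is only an illustration, not the script the instructors actually used: the chat line layout (`HH:MM:SS`, then `From <zoom id> : <message>`), the single filtering rule, and the sample names are all assumptions made up for the example.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed layout of one line in a saved Zoom chat file (not confirmed by this page):
#   HH:MM:SS<whitespace>From <zoom id> : <message>
CHAT_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+From\s+(.+?)\s*:\s(.*)$")


def collect_comments(lines):
    """Group chat messages by zoom id, keeping them in their original order."""
    comments = defaultdict(list)
    for line in lines:
        match = CHAT_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not look like a chat message
        _, zoom_id, message = match.groups()
        message = message.strip()
        # Hypothetical hand-crafted rule: ignore empty or one-character messages.
        if len(message) > 1:
            comments[zoom_id].append(message)
    return dict(comments)


def to_csv(comments):
    """Render the two-column table (zoom id, all comments) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zoom_id", "comments"])
    for zoom_id in sorted(comments):
        writer.writerow([zoom_id, " | ".join(comments[zoom_id])])
    return buffer.getvalue()


# Made-up sample lines, for illustration only.
sample = [
    "09:05:12\t From Alice Tan : Is stemming applied before TF-IDF?",
    "09:06:40\t From Bob Lee : k",
    "09:07:03\t From Alice Tan : Thanks, that makes sense.",
    "not a chat line",
]
result = collect_comments(sample)
print(to_csv(result))
```

A real chat export may use a different layout (for example, private messages or multi-line comments), so the pattern and the filtering rule would need to be adapted to the actual files.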

Individual In-class Assignment (25%)

Group Project (40%)

You are required to form a project group of 3-4 members. This is a text mining project in which you collect your own sample text dataset (or use an existing one) and, using text mining techniques and tools, build an interesting model or application that mines knowledge or information from it. The project scope is entirely up to you, but I suggest that you build a useful and interesting application. You will then write a project report explaining your methodology and presenting the results, and present your work in class. The detailed instructions and guidelines for this course project can be found here. Some project ideas are provided here.

In-class Kaggle Competition (30%)

See the page for more details, and check the Kaggle summary.

Schedule

Class Venue: Tan Tong Meng (TTM) PC Lab CS02-35a WKWSCI Bldg

Date | Topic | Material | Assignment Due
Sat a.m. 01/18 | Introduction to Text Mining | LINK | N.A.
Sat a.m. 02/01 | Pre-processing for Text Mining I | LINK | N.A.
Sat p.m. 02/01 | Pre-processing for Text Mining II | LINK | Form a Group
Sat a.m. 02/15 | Text Categorization I | LINK | E-learning
Sat p.m. 02/15 | Text Categorization II | LINK | Project Proposal Submission
Sat a.m. 02/29 | Text Categorization III | LINK | N.A.
Sat p.m. 02/29 | Document Clustering | LINK | N.A.
Sat a.m. 03/21 | Sentiment Analysis | LINK | N.A.
Sat p.m. 03/21 | Introduction to Deep Learning | LINK | Kaggle Starts
Sat a.m. 04/04 | Word Embeddings | LINK | Guest Speaker: Li Pengfei
Sat p.m. 04/04 | Recurrent Neural Network | LINK | Kaggle Ends
Sat a.m. 04/18 | Convolutional Neural Network | LINK | Guest Speaker: Weng Quanchi (slides)
Sat p.m. 04/18 | Course Summary | N.A. | In-class Assignment (online)
Sat p.m. 05/02 | N.A. | N.A. | Project Paper & Recorded Video Submission