Course website for H6751
With the popularity of the Internet, a massive amount of text content is now available on the Web, and it has become an important resource for mining useful knowledge. Businesses and governments increasingly need to interpret and act upon this large volume of text. As a result, text mining (or text analytics) is receiving more attention as a way to analyze text content on the Web. For instance, opinion mining and sentiment analysis are text mining techniques used to analyze user-generated content on social media platforms.
This course is an introduction to text and web mining. It covers how to analyse unstructured data (i.e. text content) on the Web using text mining techniques. Students will learn various text mining techniques and tools through both lectures and hands-on lab exercises. The course will also explore applications of text mining techniques to real-world problems. The course focuses on Web content mining, not on Web structure or usage mining.
Students will learn the following topics in the course:
At the end of this course, students should be able to:
The following books are helpful, but not required. You can easily find these books on the Internet.
If you are not proficient in Python, you may find some tutorials helpful.
- 2020-01-18: Welcome to H6751.
- 2020-01-16: Group Project Team Table
- 2020-01-04: This site has been made public.
We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 5%:
Attending guest speakers’ lectures: This semester we have two invited speakers, who are making a great effort to come and lecture for us. We do not want them speaking to an empty room, so your attendance at the guest lectures is expected. It is also a great chance for networking. You will get 1% per speaker (2% in total) for attending.
Instructors will pick students to answer questions during class. Each student starts with a total of 2 points, and one point is deducted for an absence.
Karma Point: Any other act that improves the class, which the instructors notice and deem worthy: 1%.
Based on the chat files saved from Zoom, the list of active students is provided here with two columns: Zoom ID and all active comments (extracted via regular expressions and some hand-crafted rules). If you find your Zoom ID in the provided CSV, please email the lecturer, Zhao Rui, with your Zoom ID as it appears in the list and your NTU student ID.
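For the curious, the sketch below illustrates how a two-column list like this could be built from a saved chat file. It is not the script the instructors used. The chat line layout (`HH:MM:SS`, then `From`, a display name, a colon and the message) is an assumption that varies across Zoom versions, and the `IGNORED` set is a hypothetical hand-crafted rule that drops short acknowledgements.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed layout of one line in a saved Zoom chat file (varies by Zoom version):
#   "10:15:32\t From  Alice Tan : I think it is TF-IDF"
CHAT_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+From\s+(.+?)\s*:\s*(.*)$")

# Hypothetical hand-crafted rule: short acknowledgements are not active comments.
IGNORED = {"ok", "yes", "no", "thanks", "thank you", "hi", "hello"}


def collect_active_comments(chat_text):
    """Map each Zoom display name to the list of its non-trivial comments."""
    comments = defaultdict(list)
    for line in chat_text.splitlines():
        match = CHAT_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like a chat message header
        _, zoom_id, message = match.groups()
        message = message.strip()
        if message and message.lower().rstrip(".!") not in IGNORED:
            comments[zoom_id].append(message)
    return dict(comments)


def to_csv(comments):
    """Render the mapping as CSV text with the two columns described above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zoom_id", "active_comments"])
    for zoom_id in sorted(comments):
        writer.writerow([zoom_id, " | ".join(comments[zoom_id])])
    return buffer.getvalue()


# Made-up sample data, for illustration only.
sample_chat = (
    "10:15:32\t From  Alice Tan : I think it is TF-IDF\n"
    "10:15:40\t From  Bob Lee : ok\n"
    "10:16:05\t From  Alice Tan : Stemming would merge those tokens\n"
)
active = collect_active_comments(sample_chat)
print(to_csv(active))
```

In this sketch a student whose only message is an ignored acknowledgement does not appear in the output at all, which is one reason to check the list and email the lecturer if your contributions seem to be missing.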
We are going to have a 90-minute in-class assignment, which covers programming. You can refer to the template. This online assignment will test material covered up to Week 9 (Introduction to Deep Learning).
Check the Answer
You are required to form a project group of 3-4 members. This is a text mining project in which you collect your own sample text dataset (or use an existing dataset) and, using text mining techniques and tools, build an interesting model or application that mines knowledge or information from that dataset. The project scope is entirely up to you, but I suggest that you build a useful and interesting application. You will then write a project report explaining your methodology and presenting the results, and present your work in class. The detailed instructions and guidelines for this course project can be found here. Some project ideas have been provided here.
See the page for more details, and check the Kaggle summary.
Class Venue: Tan Tong Meng (TTM) PC Lab CS02-35a WKWSCI Bldg
Date | Topic | Material | Assignment Due |
---|---|---|---|
Sat a.m 01/18 | Introduction to Text Mining | LINK | N.A. |
Sat a.m 02/01 | Pre-processing for Text Mining I | LINK | N.A. |
Sat p.m 02/01 | Pre-processing for Text Mining II | LINK | Form a Group |
Sat a.m 02/15 | Text Categorization I | LINK | E-learning |
Sat p.m 02/15 | Text Categorization II | LINK | Project Proposal Submission |
Sat a.m 02/29 | Text Categorization III | LINK | N.A. |
Sat p.m 02/29 | Document Clustering | LINK | N.A. |
Sat a.m 03/21 | Sentiment Analysis | LINK | N.A. |
Sat p.m 03/21 | Introduction to Deep Learning | LINK | Kaggle Starts |
Sat a.m 04/04 | Word Embeddings | LINK | Guest Speaker: Li Pengfei |
Sat p.m 04/04 | Recurrent Neural Network | LINK | Kaggle Ends |
Sat a.m 04/18 | Convolutional Neural Network | LINK | Guest Speaker: Weng Quanchi slides |
Sat p.m 04/18 | Course Summary | N.A. | In-class Assignment (online) |
Sat p.m 05/02 | N.A. | N.A. | Project Paper & Recorded Video Submission |