H6751 Text and Web Mining Spring 2020

Course website for H6751


H6751 In-class Kaggle Competition

We are hosting an in-class Kaggle competition. The competition is a text categorization problem, i.e., labeling natural language texts with relevant categories from a predefined set. Assigning online content to categories lets users search and navigate a website or application more easily. The competition is centered on this one natural language processing application and will also deepen your understanding of machine learning concepts.

What will the competition be like?

How will we be graded?

1. Kaggle competition private leaderboard ranking (15%)

Half of your grade for this Kaggle competition will be determined by your ranking on the private leaderboard.

You will be awarded 5 points out of 15 available in this component if your submission outperforms our very simple baseline.

The other 10 points will be awarded based on your relative ranking.

2. Report (15%)

The other half of your grade comes from a report, to be submitted a week after the Kaggle competition closes, i.e., by 18 April 2020, 17:00 Singapore time.

Name your report with your NTU Student Number, e.g. G0123456Z.pdf

Your report must include the following 3 sections describing:

2.1 Data preprocessing, feature engineering, and how you decided on this.

2.2 How you performed validation of your predictions.

2.3 The models you explored, why they were chosen, and how you arrived at the final model for your top 2 submissions.

There is a 2-page limit on substantive content comprising the above sections, and any such content exceeding this limit will be ignored.

You may choose to include tables and graphs in an appendix. There is a 2-page limit on any appendix you may choose to include.

Finally, submit a Jupyter notebook (.ipynb) for the final model for your top submission, together with your report.

Name this file with your NTU Student Number, e.g. G0123456Z.ipynb

The file should include your code for data preprocessing, feature engineering, prediction validation, and generating your .csv submission.
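As a rough illustration of how these four parts can fit together in one notebook, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for illustration: the toy texts and labels, the multinomial naive Bayes model, the simple hold-out split, and the `id` and `category` column names. Check the competition page for the actual data files and the required submission format.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Toy data, invented for illustration only (not the competition data).
TRAIN_TEXTS = [
    "the team won the football match",
    "stocks fell as the market closed",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "the coach praised the goalkeeper",
    "investors sold shares in the bank",
]
TRAIN_LABELS = ["sports", "business", "sports", "business", "sports", "business"]


def tokenize(text):
    """Preprocessing: lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(texts, labels):
    """Feature engineering + model: per-class bag-of-words counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def holdout_accuracy(texts, labels, every=3):
    """Validation: hold out every `every`-th example, train on the rest."""
    pairs = list(zip(texts, labels))
    train = [p for i, p in enumerate(pairs) if i % every != 0]
    held = [p for i, p in enumerate(pairs) if i % every == 0]
    model = train_naive_bayes(*zip(*train))
    correct = sum(predict(model, text) == label for text, label in held)
    return correct / len(held)


def write_submission(ids, predictions, handle):
    """Write predictions as CSV; the column names are assumed, not official."""
    writer = csv.writer(handle)
    writer.writerow(["id", "category"])
    writer.writerows(zip(ids, predictions))


accuracy = holdout_accuracy(TRAIN_TEXTS, TRAIN_LABELS)

# Refit on all labeled data before predicting the (hypothetical) test set.
final_model = train_naive_bayes(TRAIN_TEXTS, TRAIN_LABELS)
test_ids = [101, 102]
test_texts = ["a late goal decided the match", "shares fell on the market"]
test_predictions = [predict(final_model, text) for text in test_texts]

buffer = io.StringIO()  # stands in for open("submission.csv", "w", newline="")
write_submission(test_ids, test_predictions, buffer)
print(f"hold-out accuracy: {accuracy:.2f}")
print(buffer.getvalue())
```

With real data, a single hold-out split like this is usually too noisy to compare models reliably; k-fold cross-validation is a common alternative, and the report section on validation is the place to justify whichever scheme you use.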

Submit the report and the notebook to NTULEARN > Kaggle Report and NTULEARN > Kaggle Notebook, respectively.