CMPSCI 645: Database Design and Implementation

Course Project


The course project is a collaborative assignment due at the end of the semester. Students will work in teams of three. This is a self-guided assignment, designed to give you exposure to and practice with recent results in big data analytics. The project includes a written review of a recent research paper on big data analytics (chosen from a list provided by the 645 staff), reproducing partial results from the paper, and proposing and implementing an extension based on your own idea.

Your project will be based on one of the following papers:

  1. Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang. "Extracting Top-K Insights from Multi-dimensional Data." In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Chicago, Illinois, USA, May 2017.
    Project description: pdf, individual pdf

  2. Haopeng Zhang, Yanlei Diao, Alexandra Meliou. "EXstream: Explaining Anomalies in Event Stream Monitoring." In Proceedings of the 20th International Conference on Extending Database Technology (EDBT), pp. 156-167, 2017.
    Project description: pdf, individual pdf

  3. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang. "Automatic Database Management System Tuning Through Large-scale Machine Learning." In Proceedings of the ACM International Conference on Management of Data (SIGMOD), pp. 1009-1024, 2017.
    Project description: pdf, dataset, review guide

Additional detailed instructions will be posted soon...

Milestones


Form groups (Due Friday, March 27, 5 pm)

Find project partners and begin to discuss project ideas.

Paper review and project proposal (Due Friday, April 3, 5 pm)

Each project group is expected to turn in a report with the following sections:
  1. The paper selected for the course project and a thorough review of the paper, including (a) a problem statement; (b) the main approach and most important techniques; (c) how well the experimental results show that the proposed techniques solve the problem presented at the beginning of the paper; (d) the limitations of the techniques in the paper.
  2. A proposal for one idea that extends the project beyond its basic requirements (the algorithms to implement and the results to reproduce).


Project Presentation (Monday, April 27 and Wednesday, April 29, in class)

The presentation is required only for team projects, which are expected to include an extension beyond the material already presented in the paper. Individual projects do not need to give a presentation.


Final Report (Due Friday, May 8, 5 pm)

The final report extends your previous write-up to:
  1. present the research problem (Section 1: Problem Statement),
  2. describe the main approach and techniques from the paper that you implemented, as well as any changes you made (e.g., in data preprocessing or hyperparameter tuning) (Section 2: Techniques Implemented),
  3. describe the evaluation methodology and the significant results achieved (Section 3: Experimental Results),
  4. describe a major extension beyond the material already presented in the paper, and the results obtained (Section 4: Extension and Results),
  5. summarize your work and present your conclusions (Section 5: Summary),
  6. for team projects, include a paragraph explaining each group member's contributions and duties in the project (Section 6: Team Contributions).