CMPSCI 645: Database Design and Implementation

Course Project


CMPSCI 645 offers a course project in one of two forms:

Reproducibility: Students will work in teams of three. Each team chooses one topic from a list of recommended papers. The team is expected to read the paper, write a paper review, reimplement some of the existing techniques, and reproduce a set of critical results on a given dataset.

Open-Ended Research: This option is reserved only for students with prior research experience, including students who are already admitted into the PhD program or students who have obtained a prior publication. In this project, students work individually or in a team of two. Students will be given a recent paper as the technical background and expected to propose and evaluate new techniques that aim to significantly improve functionality and performance of a data analytics system.


1. Reproducibility Project

The course project is a collaborative assignment, due at the end of the semester. Students will work in teams of three. This is a self-guided assignment, designed to give you exposure to and practice with recent results in big data analytics. The project consists of writing a review of a recent research paper on big data analytics (to be chosen from a list provided by the 645 staff), reproducing partial results from the paper, and proposing and implementing an extension based on your own idea.

Additional detailed instructions will be posted soon...

Milestones


Form groups (Due Tuesday, March 23, 5 pm)

Find project partners and begin to discuss project ideas.

Paper review and project proposal (Due Thursday, April 1, 5 pm)

Each project group is expected to turn in a report with the following sections:
  1. The paper selected for the course project and a thorough review of the paper, including (a) a problem statement; (b) the main approach and most important techniques; (c) how well the experimental results show that the proposed techniques solve the problem presented at the beginning of the paper; (d) the limitations of the techniques in the paper.
  2. Besides the basic requirements of the project, including the algorithms to implement and results to (re)produce, please propose one idea as an extension of the project.

Project Presentation (Tuesday, April 27, and Thursday, April 29, in class)

The presentation is required only for team projects, which are expected to include an extension beyond the material already presented in the paper. Students working individually do not need to give a presentation.

Final Report (Thursday, May 6, 5 pm)

A final report extends your previous writeup to
  1. present the research problem (Section 1: Problem Statement),
  2. describe the main approach and techniques from the paper that were implemented in this project, as well as necessary changes made (e.g., in data preprocessing, hyperparameter tuning) in this project (Section 2: Techniques Implemented),
  3. describe evaluation methodology and significant results achieved (Section 3: Experimental Results),
  4. describe a major extension beyond the material already presented in the paper, and the results obtained (Section 4: Extension and Results),
  5. summarize your work and present your conclusions (Section 5: Summary),
  6. for team work, the report should also include a paragraph explaining, for each group member, their contributions and duties in the project (Section 6: Team Contributions).


2. Open-Ended Research Project

This option is reserved for students with prior research experience, including students who are already admitted into the PhD program or students who have obtained a prior publication. In this project, students work individually or in a team of two. Since a research project takes more time than a reproducibility project, students who choose the research project will have homework 5 and homework 6 waived. In other words, a research project = a reproducibility project + homework 5 + homework 6.

Students who are interested in the research project should email the instructor by Thursday, February 18.

Students will be given the following paper as the technical background and expected to propose and evaluate new techniques that aim to significantly improve the functionality and performance of a data-driven adaptive learning system.

Data-Driven Adaptive Learning Systems: A broad range of machine learning (ML) applications are being used to improve user experience in digital environments. However, as data evolves over time, its characteristics typically diverge from those of the training data originally used to build an ML model, so a non-adaptive approach will usually result in a loss of accuracy over time, and hence a loss of user trust in the model. The goal of a data-driven adaptive learning system is to rapidly adapt to such changing circumstances. The 3V's of big data, Veracity, Volume and Velocity, present major challenges to adaptive learning systems: when large volumes of evolving data arrive at high speed, real-time adaptation becomes highly nontrivial. In this project we aim to address several research challenges in adaptive learning systems:
  1. how to best summarize the properties (e.g., using conformance constraints) of the training data that was used to build a machine learning model;
  2. how to determine whether model accuracy has degraded, or potentially will degrade, to the extent that we need to update the current model;
  3. if we need to retrain or update the model, how to use data synopses to speed up this process.

Students can explore a range of advanced techniques including data drift detection, model sensitivity analysis, and efficient model update, in particular, for high-volume data streams.
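
As a rough illustration of challenges 1 and 2 above, the sketch below summarizes one numeric attribute of the training data and flags drift when a window of new data has a mean far from the training mean. This is only a toy stand-in under stated assumptions: the names Synopsis, summarize, has_drifted, and z_threshold are hypothetical, the mean/standard-deviation summary is much simpler than conformance constraints, and a real project would need a more robust detector.

```python
import math
from dataclasses import dataclass


@dataclass
class Synopsis:
    """Compact summary of one numeric attribute of the training data."""
    count: int
    mean: float
    std: float


def summarize(values):
    """Build a synopsis (count, mean, sample standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return Synopsis(n, mean, math.sqrt(variance))


def has_drifted(train, window, z_threshold=3.0):
    """Flag drift when the window mean is far from the training mean.

    The z-score test and the default threshold of 3.0 are illustrative
    assumptions, not a recommendation from the course staff.
    """
    window_mean = sum(window) / len(window)
    standard_error = train.std / math.sqrt(len(window))
    if standard_error == 0:
        # Constant training attribute: any change in the mean counts as drift.
        return window_mean != train.mean
    z = abs(window_mean - train.mean) / standard_error
    return z > z_threshold


# Deterministic toy data: values cycle through 0..9.
training = [i % 10 for i in range(1000)]
synopsis = summarize(training)

stable_window = [i % 10 for i in range(100)]        # same distribution
shifted_window = [(i % 10) + 2 for i in range(100)]  # mean shifted by 2

print(has_drifted(synopsis, stable_window))
print(has_drifted(synopsis, shifted_window))
```

Only the synopsis, not the full training set, is needed at detection time, which is what makes this style of check plausible for high-volume streams.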

Project selection (Due Thursday, February 18, by email to the instructor)


Project proposal (Due Thursday, March 4)

Your proposal should explicitly state the following:
  1. Problem your project will address.
  2. Your project's goal and motivation.
  3. Your survey of the related work.
  4. The methodology and plan for your project. Be sure to structure your plan for the project as a set of incremental milestones and include a schedule for meeting them.
  5. The resources needed to carry out your project.

Status reports (Due Thursday, March 25 and Thursday, April 22)

Your status report should contain enough implementation, data, and analysis to show that your project is on the right track. You should revise your original proposal to address the instructor's comments and to reflect any surprising results or changes in direction, schedule, etc. You may also need to refine the problem statement and further develop the related work section.

Project presentation (Tuesday, May 4, in class)

A brief presentation should include the proposed problem, state-of-the-art solutions, your proposed solutions including the algorithms and implementation, and evaluation results. The presentation may include a system demo if appropriate.

Project report (Thursday, May 6, by 5 pm)

A final report extends your previous writeups to
  1. present the research problem and summarize your contributions (in the first section),
  2. survey related work (in the related work section),
  3. include a detailed description of your algorithm, analysis, and implementation (in the technical section),
  4. describe evaluation methodology and significant results (in the evaluation section),
  5. present your conclusions (in the summary section),
  6. for team work, the report should also include a paragraph explaining, for each group member, their contributions and duties in the project.