CMPSCI 745: Advanced Database Systems


A paper review should concisely summarize the main contributions of the paper and identify its limitations, if any. Feel free to include questions, for example about an assumption in the paper that you found unrealistic or a technique that is unclear to you.


Please email paper reviews to the instructor by 10 am on the day of class. Send each review in the body of the message, not as an attachment. Use the subject line "745 PAPER REVIEW", as messages with this subject will be automatically collected into an email folder for this class. The paper reviews count for 25% of the course grade.

Reading list (subject to change)

1. Data Warehouses

[CD97] Surajit Chaudhuri and Umeshwar Dayal. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record, 26(1), 1997, 65-74.

[GCB+97] Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Min. Knowl. Discov., 1(1), 1997, 29-53.

[OQ97] Patrick E. O'Neil and Dallan Quass. Improved Query Performance with Variant Indexes. Proc. SIGMOD Conference, 1997, 38-49.

[ZDN97] Yihong Zhao, Prasad Deshpande, and Jeffrey F. Naughton. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. Proc. SIGMOD Conference, 1997, 159-170.

2. Data Mining

[AS94] Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases. Proc. VLDB, 1994, 487-499.

[ZRL96] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proc. SIGMOD Conference, 1996, 103-114.


[Hal01] A.Y. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 10(4), 2001.

[KR99] Yannis Kotidis and Nick Roussopoulos. DynaMat: A Dynamic View Management System for Data Warehouses. Proc. of SIGMOD Conference, 1999, 371-382. (Best Paper)

4. Column-based Databases

[SAB+05] Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. Proc. VLDB, 2005, 553-564.

[AMMH07] Daniel J. Abadi, Adam Marcus, Samuel Madden, and Katherine J. Hollenbach. Scalable Semantic Web Data Management Using Vertical Partitioning. Proc. of VLDB 2007, 411-422. (Best Paper)

5. Sequential, Temporal, Stream Databases

[BGJ] Michael H. Bohlen, Johann Gamper, and Christian S. Jensen. Temporal Databases. Chapter of a book edited by Hammer and Schneider.

[SLR96] Praveen Seshadri, Miron Livny, Raghu Ramakrishnan. The Design and Implementation of a Sequence Database System. Proc. VLDB, 1996, 99-110.

[ABW06] Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal, 15(2), 2006, 121-142.

[HCH+99] Eric N. Hanson, Chris Carnes, Lan Huang, Mohan Konyala, Lloyd Noronha, Sashi Parthasarathy, J. B. Park, and Albert Vernon. Scalable Trigger Processing. Proc. ICDE, 1999, 266-275.


[CDTW00] Jianjun Chen, David J. DeWitt, Feng Tian, and Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. Proc. SIGMOD Conference, 2000, 379-390.

[AH00] Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. Proc. SIGMOD Conference, 2000, 261-272.

6. Parallel Databases

[DGS+90] David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, and Rick Rasmussen. The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng., 2(1), 1990, 44-62.

[Gra90] Goetz Graefe. Encapsulation of Parallelism in the Volcano Query Processing System. Proc. of SIGMOD, 1990, 102-111.

7. Big Data Systems

[DG04] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Proc. of OSDI (Symposium on Operating System Design and Implementation) 2004.

[CDG+06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. Proc. OSDI (Symposium on Operating System Design and Implementation) 2006.

Selected Big Data papers for presentation

Lecture 1: SQL on MapReduce

[ZBC+12] Jingren Zhou, Nicolas Bruno, Ming-Chuan Wu, Per-Åke Larson, Ronnie Chaiken, and Darren Shakib. SCOPE: Parallel Databases Meet MapReduce. VLDB Journal, 21(5), 2012, 611-636.

[CLL+11] Biswapesh Chattopadhyay, Liang Lin, Weiran Liu, Sagar Mittal, Prathyusha Aragonda, Vera Lychagina, Younghee Kwon, and Michael Wong. Tenzing: A SQL Implementation on the MapReduce Framework. PVLDB, 4(12), 2011, 1318-1327. (Optional)

Lecture 2: Theory on MapReduce

[TLX+13] Yufei Tao, Wenqing Lin, Xiaokui Xiao. Minimal MapReduce Algorithms. Proceedings of ACM Conference on Management of Data (SIGMOD), pages 529-540, 2013.

[KSV+10] Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A Model of Computation for MapReduce. Proc. of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010, 938-948. (Optional)

Lecture 3: Low-Latency Systems

[ZCD+12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Proc. NSDI, April 2012. (Best Paper)

[MGL+10] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB, 3(1-2), 2010, 330-339.

Lecture 4: New Services

[OCS+09] Christopher Olston, Shubham Chopra, and Utkarsh Srivastava. Generating Example Data for Dataflow Programs. Proceedings of ACM Conference on Management of Data (SIGMOD), 2009. (Best Paper)