Capturing Data Uncertainty in High-Volume Stream Processing
University of Massachusetts, Amherst
The goal of this project is to design and develop a stream processing system that captures data uncertainty from data collection through query processing to final result generation. The project takes a principled approach, grounded in probability and statistical theory, that treats uncertainty as a first-class citizen, and it integrates this approach efficiently into high-volume stream processing. The project makes two main contributions:
- The first contribution is to capture the uncertainty of raw data streams emanating from sensing devices. Because raw streams can be highly noisy and may not present data in a format suitable for query processing, this project employs probabilistic models of the underlying data generation process, together with machine learning techniques, to efficiently transform raw data into a desired representation with an uncertainty metric.
- The second contribution is to capture uncertainty as data propagates through the various query processing operators. To quantify the result uncertainty of a query operator efficiently, this project explores techniques based on probability and statistical theory that reduce the statistics data streams need to carry and that expedite the computation of result distributions through approximation.
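As a rough illustration of the two steps above, the following Python sketch summarizes noisy readings as a mean with a variance, then carries those two statistics through a sum operator and a threshold predicate. It is not the project's actual method: the names `UncertainValue`, `summarize_readings`, `uncertain_sum`, and `prob_greater_than` are hypothetical, and the sketch assumes that each value is well approximated by a Gaussian and that values in a window are independent.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class UncertainValue:
    """A stream attribute summarized by a Gaussian (assumption): mean and variance."""
    mean: float
    var: float


def summarize_readings(readings: Sequence[float]) -> UncertainValue:
    """Turn repeated noisy readings into a mean plus the variance of that mean.

    A simple stand-in for a probabilistic model of the data generation
    process; it needs at least two readings.
    """
    mean = statistics.fmean(readings)
    var_of_mean = statistics.variance(readings) / len(readings)
    return UncertainValue(mean, var_of_mean)


def uncertain_sum(values: Iterable[UncertainValue]) -> UncertainValue:
    """Sum operator: for independent Gaussians, means add and variances add."""
    total_mean = 0.0
    total_var = 0.0
    for v in values:
        total_mean += v.mean
        total_var += v.var
    return UncertainValue(total_mean, total_var)


def prob_greater_than(v: UncertainValue, threshold: float) -> float:
    """Selection predicate: P(X > threshold) for X ~ N(mean, var)."""
    if v.var == 0.0:
        # Degenerate case: the value is known exactly.
        return 1.0 if v.mean > threshold else 0.0
    z = (threshold - v.mean) / math.sqrt(2.0 * v.var)
    return 0.5 * (1.0 - math.erf(z))


if __name__ == "__main__":
    window = [
        UncertainValue(1.0, 0.25),
        UncertainValue(2.0, 0.25),
        UncertainValue(3.0, 0.5),
    ]
    total = uncertain_sum(window)
    print(total, prob_greater_than(total, 6.0))
```

Under these assumptions each tuple carries only two statistics instead of a full distribution, which is one simple way to reduce what a stream must carry. When values are correlated or far from Gaussian, the sum's distribution would need a different approximation.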
CLARO project web page
Project Members
- Yanlei Diao (faculty)
- Anna Liu (faculty)
- Michael Zink (faculty)
- Thanh Tran (grad)
- Liping Peng (grad)
- Boduo Li (grad)
Sponsors
National Science Foundation
III-COR-small: Capturing Data Uncertainty in High-Volume Stream Processing.
Yanlei Diao (PI) and Anna Liu (co-PI).
National Science Foundation IIS-0812347.
Award abstract.
Any opinions, findings, and conclusions or recommendations expressed at this web site are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.