Reading Tips
Ask the following questions while reading:
- What is the context of this work?
- What was the unmet need or opportunity? Does it make sense?
- What were existing approaches and why do they work or not work?
- What is the simplest example that highlights the problem that this approach works best for?
- Does the paper (and its contributions) matter?
- What are the actual hypotheses?
- Approach
- How do they seek to validate their hypotheses? Do they make sense?
- Is the evaluation cursory or deep?
- Do you believe their results?
- Are the results presented well?
Papers on how to read papers
Some papers on reviewing papers
The Papers
Review
Required
Further Reading
Indexes
Required
Further reading
- Classic Indexes
- Generating New Index Designs using ML
- Adaptive and Learned Indexes
Joins
Required
Further reading
- Larger-than-memory Joins
- Adaptive Joins
- Distributed Joins
Query Optimization
Readings
Further reading
Cost Estimation
Required
Further Reading
Main-memory Databases (Vectorization)
Required (Edited 2/16: swapped the background and main topic papers)
Further Reading
Some things to think about when reading
- For disk-based systems, when would query compilation be useful?
Main-memory Databases (Compilation)
Required
Further reading
Data Flow
Required
Further Reading
- Classics
- Big Data Era
- More Timely Dataflow
Some notes to guide your reading and thinking.
- This paper focuses primarily on what Naiad provides and how it works. But which of these features are actually necessary? Which are nice-to-haves? Why? How does it contrast with other papers we have read (Spark, Spark Streaming, recursive queries)? Can it do things other systems cannot?
- What makes this paper easy or hard to read? Note those, and any other questions you have, in the comments, and we can discuss them in class.
- We will go over some of the technical details of how execution operates.
Incrementally Maintaining Materialized Views
Required:
Further Reading
In-DBMS ML
Required
Further Reading
In-DBMS ML over Joins
Required
Further Reading
Hybrid Caching/UDFs
Required
For this week, you do NOT need to create slides for your team roles. Instead, your team will play around with Convex, relate it with concepts from the readings, and prepare questions and observations to share with Sujay before his presentation. Specifically:
- Install and use Convex to build an application and deploy it (say, on Vercel). The application should make use of the following Convex features:
- computation within the UDF (should not just be a get/set/update)
- ability for multiple users to concurrently modify data in a way that shows consistency
- (bonus) try out its schema evolution functionality.
- Your team will be based on this week’s roles.
- By Wednesday noon EST, add a slide to this week’s presentation file (see Slack) with:
- your team members,
- a link to your app,
- a screenshot,
- a short description of what the app does,
- one question for Sujay related to Convex and the week’s (or any previous week’s) paper topics.
Further Reading
Data Quality
Required
Further Reading:
Data Markets
Required
Further Reading
Unscheduled Topics
System R Overview
Readings
Questions to consider
- System R was an impressive research and engineering effort, and the reading is a retrospective of the 6-year project.
- The paper discusses “the Convoy Problem”. Discuss the problem: What is it? Why does it exist?
- The paper discusses many, many topics. Pick one aspect (other than the convoy problem) that you are particularly impressed with. Discuss what it is and why it impresses you.
INGRES/POSTGRES
Readings
Questions to consider
- What were the main goals for the Postgres system and why do you think they chose those goals? Do they make sense?
- Pick one of the (many) ideas in the paper that most interests you. Why is it interesting? Does the proposed design hold water? Feel free to read related work.
Concurrency Control
Readings
OLTP Stores
Column Stores
Readings
Cloud-scale Analytics
DB and Query Representations
Distributed Consistency under Replication
Materialized Views
Further Reading
- Materialized Views
- Applications as Materialized Views
Streaming
Datalog and Recursion
Readings
Questions
- When you assess the reading, compare against other formal and informal data query languages we have encountered.
Lineage
Question to comment on:
- Provenance goes beyond what they mention in the paper and is everywhere. Point out a concept/functionality in the real (or digital) world that can be recast as provenance and provenance queries. Describe it.
Lineage Systems
Read one of the two required papers:
Serverless Querying
Self-tuning DBs
Approximate Query Processing
Windows and Streaming
Fast Scans
Data Cubes
Oblivious Databases
Adaptive Query Processing
Readings
Further Reading
Explanation
Readings
Further Reading