COLUMBIA UNIVERSITY COMS 6113

Streaming and Incremental Processing

Student questions

Background

A streaming perspective on query execution

Can we think of it incrementally?

Pipelines

Consider recursive queries

	WITH RECURSIVE paths AS (
	  -- initialization: edges leaving the source vertex 'A'
	  SELECT dst, w AS totalw
	  FROM edges
	  WHERE src = 'A'

	  UNION

	  -- recursive step: extend each known path by one edge
	  SELECT edges.dst,
	         paths.totalw + edges.w AS totalw
	  FROM paths, edges
	  WHERE paths.dst = edges.src
	)
	SELECT *
	FROM paths;
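
One way to read the recursive query incrementally is as a loop that feeds only the newly derived rows back into the join on each round. The sketch below is a minimal illustration in Python, not how any particular database implements it. The function name recursive_paths and the representation of edges as (src, dst, w) tuples are assumptions made for the example. It also assumes the graph is acyclic: on a cyclic graph totalw keeps growing, so UNION never stops finding new rows.

```python
def recursive_paths(edges, start="A"):
    """Evaluate the recursive 'paths' query over a set of (src, dst, w) tuples.

    Hypothetical helper for illustration; assumes the graph has no cycles.
    """
    # Initialization: SELECT dst, w FROM edges WHERE src = start
    paths = {(dst, w) for (src, dst, w) in edges if src == start}
    delta = set(paths)  # rows that are new in the latest round

    while delta:
        # Recursive step, joined against the new rows only.
        derived = {
            (e_dst, totalw + w)
            for (p_dst, totalw) in delta
            for (e_src, e_dst, w) in edges
            if p_dst == e_src
        }
        # UNION has set semantics: keep only rows not seen before.
        delta = derived - paths
        paths |= delta

    return paths


example_edges = {("A", "B", 1), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1)}
result = recursive_paths(example_edges)
```

Each pass of the loop corresponds to one trip around a feedback edge in a dataflow graph, and the loop ends when a pass produces no new rows.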

Consider Spark queries

OK Naiad

What is Naiad good for, that prior systems could not/had trouble doing?

Contrast with Spark Streaming (mini-batches)

Main ideas

Low-level API

Timestamps

Example

    in --> A --> Ingress ---> B -----> C ---> Egress --> D --> Out
                          ^              |
                          |___feedback___|
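
To make the timestamps in this graph concrete, here is a minimal Python sketch of how a timestamp might change as a record passes through the ingress, feedback, and egress vertices. It assumes a timestamp is an input epoch plus one counter per enclosing loop. The Timestamp class and the three function names are hypothetical and chosen for illustration; they are not Naiad's actual API.

```python
from typing import NamedTuple, Tuple


class Timestamp(NamedTuple):
    """Hypothetical timestamp: an input epoch plus one counter per enclosing loop."""
    epoch: int
    counters: Tuple[int, ...] = ()


def ingress(t: Timestamp) -> Timestamp:
    # Entering a loop context adds a new loop counter, starting at 0.
    return Timestamp(t.epoch, t.counters + (0,))


def feedback(t: Timestamp) -> Timestamp:
    # Going around the feedback edge increments the innermost loop counter.
    return Timestamp(t.epoch, t.counters[:-1] + (t.counters[-1] + 1,))


def egress(t: Timestamp) -> Timestamp:
    # Leaving the loop context drops the innermost loop counter.
    return Timestamp(t.epoch, t.counters[:-1])


# A record injected at epoch 1 that goes around the loop twice, then leaves.
t_in = Timestamp(1)
t_loop = ingress(t_in)
t_second = feedback(t_loop)
t_third = feedback(t_second)
t_out = egress(t_third)
```

Outside the loop (vertices A and D in the diagram) the timestamp is just the epoch. Inside it (B and C) the extra counter separates iterations within the same epoch.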

What if we inject some records?

    (a, 2), [1]
    (b, 6), [1]
    step()
    (a, 5), [2]

When does E.onNotify(1) get called?
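
One way to reason about this question is with a simplified progress tracker that only looks at epochs and ignores loop counters. The Python sketch below is an assumption-heavy illustration rather than Naiad's actual protocol. The class EpochTracker and all of its methods are hypothetical. It treats a notification for an epoch as deliverable once the input can no longer produce that epoch and no record at that epoch or an earlier one is still in flight.

```python
from collections import Counter


class EpochTracker:
    """Hypothetical, epoch-only progress tracker (loop counters are ignored)."""

    def __init__(self):
        self.in_flight = Counter()  # epoch -> number of undelivered records
        self.input_frontier = 0     # smallest epoch the input may still produce

    def sent(self, epoch):
        self.in_flight[epoch] += 1

    def delivered(self, epoch):
        self.in_flight[epoch] -= 1
        if self.in_flight[epoch] == 0:
            del self.in_flight[epoch]

    def advance_input(self, epoch):
        # The input promises never to produce an epoch smaller than `epoch` again.
        self.input_frontier = max(self.input_frontier, epoch)

    def can_notify(self, epoch):
        if self.input_frontier <= epoch:
            return False  # more records for this epoch may still arrive
        # No outstanding record could still lead to a message at or before `epoch`.
        return all(e > epoch for e in self.in_flight)


tracker = EpochTracker()
tracker.advance_input(1)
tracker.sent(1)                          # (a, 2), [1]
tracker.sent(1)                          # (b, 6), [1]
before_close = tracker.can_notify(1)     # input may still send epoch 1
tracker.advance_input(2)                 # input moves on to epoch 2
while_in_flight = tracker.can_notify(1)  # epoch-1 records still circulating
tracker.delivered(1)
tracker.delivered(1)
after_drain = tracker.can_notify(1)
```

Under these assumptions the notification for epoch 1 waits for two things: the input advancing past epoch 1, and every epoch-1 record draining out of the loop. Records already sent at epoch 2 do not hold it back.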

Can we implement SQL operators?
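
As one possible answer, a grouped aggregate such as SELECT key, SUM(value) ... GROUP BY key can be sketched as a vertex that buffers in its receive callback and emits in its notify callback. The Python below is a hedged illustration. The class SumByKey, the on_recv and on_notify method names, and the send callback are hypothetical stand-ins modeled loosely on the low-level API discussed above, and the sum is computed per timestamp rather than cumulatively.

```python
from collections import defaultdict


class SumByKey:
    """Hypothetical vertex computing SUM(value) GROUP BY key for each timestamp."""

    def __init__(self, send):
        self.send = send  # callback taking (record, time); stands in for an output edge
        self.sums = defaultdict(lambda: defaultdict(int))  # time -> key -> running sum
        self.requested = set()  # times for which a notification has been requested

    def on_recv(self, record, time):
        key, value = record
        self.requested.add(time)  # stands in for asking to be notified at `time`
        self.sums[time][key] += value

    def on_notify(self, time):
        # All input for `time` has been seen, so the aggregate is final.
        for key, total in sorted(self.sums.pop(time, {}).items()):
            self.send((key, total), time)
        self.requested.discard(time)


out = []
vertex = SumByKey(lambda record, time: out.append((record, time)))
vertex.on_recv(("a", 2), 1)
vertex.on_recv(("b", 6), 1)
vertex.on_recv(("a", 5), 2)
vertex.on_notify(1)
vertex.on_notify(2)
```

The design point is that a blocking operator such as an aggregate cannot emit from on_recv, because more records for the same timestamp may still arrive. A non-blocking operator such as a filter or projection could forward each record directly from on_recv.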

Other questions