COLUMBIA UNIVERSITY COMS 6113

Datalog Primer

Partial containment of languages

Syntax

    A :- B1,... Bn

3 Semantics (first order logic FOL) for datalog programs

Negation (!datalog)

Semipositive datalog

Stratified Negation

Aggregation

    reachable(X,Y) :- link(X,Y).
    reachable(X,Y) :- reachable(X,Z), link(Z,Y).
    summary(X, count<Y>) :- reachable(X,Y).

Aggregate Selection

Functors

Query Execution

Topdown vs bottom up

Naive/SemiNaive

Naive approach straightup recomputes everything all the time

SemiNaive leverages fact that P is monotone to only propagate changes

Top Down Evaluation

Main idea

Example

r1: R(X, Y) = L(X, Y)
r2: R(X, Y) = L(X, Z), R(Z, Y)
    Q(X, Y) = R(X, Y)
    Q(1, Y)

Terminology

Example

R(X, Y) = L(X, Z), R(Z, Y)
R(1, Y)

top down info pass:

R(1, Y) = L(1, Z), R(Z, Y)

eval L since it's a base relation

L(1, 2), L(1, 3) matches

R(1, Y) = L(1, 2), R(2, Y)
        = L(1, 3), R(3, Y)

R(2, Y) and R(3, Y) are candidate rules now 
since they are heads and distinguished

Magic Sets

Main idea

Adorned Exampl

Start with adorned program

ar1  R[bf](X, Y) = L[bf](X, Y)
ar2  R[bf](X, Y) = L[bf](X, Z), R[bf](Z, Y)

Create inital binding predicates

     in_R[bf](1)  
      sup^1_0(X) = in_R[bf](X)
      sup^1_1(Y) = sup^1_0(X), L[bf](X, Y)

      sup^2_0(X) = in_R[bf](X)
      sup^2_1(Z) = sup^2_0(X), L[bf](X, Z)
      sup^2_2(Y) = sup^2_1(Z), R[bf](Z, Y)

Recursion implies that supplementary relation for R also feeds input bindings

     in_R[bf](X) = sup^2_1(X)

Finally, get results

      R[bf](X, Y) = sup^1_2(Y)
                  = sup^2_2(Y)

Semi-naive evaluation naturally filters out irrelevant facts!

Concerns: managing and applying constraints is not free!

Filter-Join for R join S

More Reading

Extensions