







## A Circuit-Based Approach to Efficient Enumeration

Antoine Amarilli<sup>1</sup>, Pierre Bourhis<sup>2</sup>, Louis Jachiet<sup>3</sup>, Stefan Mengel<sup>4</sup>

May 9th, 2017

<sup>1</sup>Télécom ParisTech

<sup>2</sup>CNRS CRIStAL

<sup>3</sup>Université Grenoble-Alpes

<sup>4</sup>CNRS CRIL

## Problem statement



Input







• Problem: The output may be too large to compute efficiently



[Figure: a search for "paris big data" shows "Results 1 - 20 of 10,514", with the paging controls "View (previous 20 | next 20) (20 | 50 | 100 | 250 | 500)"]
→ Solution: Enumerate solutions one after the other



[Figure: diagram going from Input to Results]

#### **Currently:**

[Figure: three separate diagrams, each labeled Input, Enumeration, Results]

#### Our idea:

[Figure: labels not recovered]







#### **Boolean circuits**

- Directed acyclic graph of gates
- Output gate
- Variable gates
- Internal gates
- Valuation: function from variables to $\{0,1\}$. Example: $\nu = \{x \mapsto 0, y \mapsto 1\}$ (the example circuit maps it to 1)
- Assignment: set of variables mapped to 1. Example: $S_{\nu} = \{y\}$; more concise than $\nu$

Our task: Enumerate all satisfying assignments of an input circuit
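To make the vocabulary concrete, here is a minimal Python sketch of the naive baseline: evaluate a circuit under every valuation and keep the assignments of those mapped to 1. The tuple encoding of gates and the function names are assumptions made for this illustration, not notation from the talk, and the loop is exponential in the number of variables.

```python
from itertools import product

# Hypothetical encoding: each gate is a tuple.
#   ("var", name) | ("not", g) | ("and", g1, g2, ...) | ("or", g1, g2, ...)
def evaluate(gate, nu):
    """Evaluate a gate under a valuation nu (dict: variable -> 0 or 1)."""
    kind = gate[0]
    if kind == "var":
        return nu[gate[1]]
    if kind == "not":
        return 1 - evaluate(gate[1], nu)
    values = [evaluate(g, nu) for g in gate[1:]]
    return int(all(values)) if kind == "and" else int(any(values))

def satisfying_assignments(output, variables):
    """Naive baseline: try all 2^n valuations, keep those mapped to 1."""
    result = []
    for bits in product([0, 1], repeat=len(variables)):
        nu = dict(zip(variables, bits))
        if evaluate(output, nu):
            # The assignment S_nu is the set of variables that nu maps to 1.
            result.append(frozenset(v for v in variables if nu[v]))
    return result

x, y = ("var", "x"), ("var", "y")
circuit = ("or", ("and", x, y), ("and", ("not", x), y))  # equivalent to y
print(satisfying_assignments(circuit, ["x", "y"]))
```

The valuation $\nu = \{x \mapsto 0, y \mapsto 1\}$ satisfies this small circuit, and its assignment is $\{y\}$.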

#### **Circuit restrictions**

#### d-DNNF:

- ∨-gates are all **deterministic**: the inputs are **mutually exclusive** (= no valuation $\nu$ makes two inputs simultaneously evaluate to 1)
- ∧-gates are all **decomposable**: the inputs are **independent** (= no variable *x* has a path to two different inputs)

**v-tree:** ∧-gates follow a **tree** on the variables
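The two d-DNNF conditions can be checked on small examples with the sketch below. The tuple encoding and the function names are assumptions for illustration. The determinism check is brute force over all valuations, so it only suits toy circuits.

```python
from itertools import product

# Hypothetical encoding: ("var", name), ("not", g), ("and", g1, ...), ("or", g1, ...)
def variables(gate):
    """Set of variables that have a path to this gate."""
    if gate[0] == "var":
        return {gate[1]}
    return set().union(*(variables(g) for g in gate[1:]))

def evaluate(gate, nu):
    kind = gate[0]
    if kind == "var":
        return nu[gate[1]]
    if kind == "not":
        return 1 - evaluate(gate[1], nu)
    values = [evaluate(g, nu) for g in gate[1:]]
    return int(all(values)) if kind == "and" else int(any(values))

def is_decomposable(gate):
    """Every AND-gate has inputs over pairwise disjoint variable sets."""
    if gate[0] == "var":
        return True
    if gate[0] == "and":
        seen = set()
        for g in gate[1:]:
            vs = variables(g)
            if seen & vs:
                return False
            seen |= vs
    return all(is_decomposable(g) for g in gate[1:])

def is_deterministic(gate, all_vars):
    """Brute force: no valuation makes two inputs of an OR-gate evaluate to 1."""
    if gate[0] == "var":
        return True
    if gate[0] == "or":
        for bits in product([0, 1], repeat=len(all_vars)):
            nu = dict(zip(all_vars, bits))
            if sum(evaluate(g, nu) for g in gate[1:]) > 1:
                return False
    return all(is_deterministic(g, all_vars) for g in gate[1:])

x, y = ("var", "x"), ("var", "y")
good = ("or", ("and", x, y), ("and", ("not", x), y))
print(is_decomposable(good), is_deterministic(good, ["x", "y"]))
```

By contrast, `("or", x, y)` fails the determinism check (both inputs are 1 when x and y are both 1), and `("and", x, ("or", x, y))` fails the decomposability check (x reaches both inputs).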



#### **Main results**

#### **Theorem**

Given a **d-DNNF circuit C** with a **v-tree T**, we can enumerate its **satisfying assignments** with preprocessing **linear in** |C| + |T| and delay **linear in each assignment** 

Also: restrict to assignments of **constant size**  $k \in \mathbb{N}$  (at most k variables are set to 1):

#### **Theorem**

Given a **d-DNNF circuit C** with a **v-tree T**, we can enumerate its **satisfying assignments** of size  $\leq k$  with preprocessing **linear in** |C| + |T| and **constant delay** 







- Factorized databases: implicit representation of database tables
- Relational product ($\times$)
- Relational union ($\cup$)
- Deterministic: we do not obtain the same tuple multiple times

#### **Theorem** (strengthens a result of [Olteanu and Závodnỳ, 2015])

Given a deterministic factorized representation, we can enumerate its tuples with linear preprocessing and constant delay
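As an illustration of what a factorized representation stores, the following sketch builds a table as a product of unions and lists its tuples lazily. The node encoding (`"single"`, `"product"`, `"union"`) is an assumption for this example. The plain recursion shows the semantics only and does not by itself give the constant-delay guarantee of the theorem.

```python
# Hypothetical encoding of a factorized representation:
#   ("single", attribute, value) | ("product", left, right) | ("union", left, right)
def tuples(node):
    """Lazily yield the tuples (as dicts) represented by a node."""
    kind = node[0]
    if kind == "single":
        yield {node[1]: node[2]}
    elif kind == "union":
        # Deterministic union: the two sides share no tuple, so no duplicates.
        yield from tuples(node[1])
        yield from tuples(node[2])
    else:  # product over disjoint attribute sets
        for left in tuples(node[1]):
            for right in tuples(node[2]):
                yield {**left, **right}

a = ("union", ("single", "A", 1), ("single", "A", 2))
b = ("union", ("single", "B", "u"), ("single", "B", "v"))
table = ("product", a, b)  # 4 tuples represented with 4 singletons
print(list(tuples(table)))
```

The representation has size 2 + 2 while the table has 2 × 2 tuples, which is the saving that the implicit representation provides.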

- Compute the results (a, b, c) of a query Q(x, y, z) on a database D
- Assumption: the database has bounded treewidth
  - → Captures trees, words, etc.
- Query given as a deterministic tree automaton
  - → Captures monadic second-order (data-independent translation)
  - → Captures conjunctive queries, SQL, etc.
- → We can construct a **d-DNNF** that describes the query results

#### **Theorem** (recaptures [Bagan, 2006], [Kazana and Segoufin, 2013])

Given an MSO query Q and a database D, the results of Q on D can be enumerated with **linear preprocessing in D** and **linear delay** in each answer (→ **constant delay** for free first-order variables)

# Proof techniques

[Figure: diagram of the **preprocessing phase** and the **enumeration phase**; the surviving labels are "Circuit", "v-tree", and "Normalized circuit"]

Special **zero-suppressed semantics** for circuits:

- No **NOT**-gate
- Each gate captures a set of assignments
- Bottom-up definition with  $\times$  and  $\cup$
- d-DNNF: ∪ are disjoint, × are on disjoint sets

#### Many equivalent ways to understand this:

- Generalization of factorized representations
- Analogue of zero-suppressed OBDDs (implicit negation)
- Arithmetic circuits: × and + on polynomials

**Simplification:** rewrite circuits to arity-two (fan-in  $\leq$  2)
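The bottom-up definition and the arithmetic-circuit reading can be sketched together. The code below materializes the captured set of an arity-two gate, and separately counts its elements with + and ×, which agrees with the set size when unions are disjoint and products are on disjoint variables. The encoding and the names `captured_set` and `count` are assumptions for this illustration.

```python
# Hypothetical encoding: ("var", name), ("times", g1, g2), ("union", g1, g2)
def captured_set(gate):
    """Return S(gate) as a set of frozensets (materialized, not enumerated lazily)."""
    kind = gate[0]
    if kind == "var":
        return {frozenset([gate[1]])}
    left, right = captured_set(gate[1]), captured_set(gate[2])
    if kind == "union":
        return left | right
    # "times": combine every assignment on the left with every one on the right.
    return {s | t for s in left for t in right}

def count(gate):
    """Arithmetic view: + for disjoint unions, * for products on disjoint variables."""
    kind = gate[0]
    if kind == "var":
        return 1
    left, right = count(gate[1]), count(gate[2])
    return left + right if kind == "union" else left * right

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
g = ("times", x, ("union", y, z))
print(captured_set(g))
print(count(g))
```

Here `g` captures $\{\{x,y\},\{x,z\}\}$ and `count(g)` is 2.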

## **Enumerating assignments in the zero-suppressed semantics**

Task: Enumerate the elements of the set S(g) captured by a gate g

→ E.g., for $S(g) = \{\{x,y\}, \{x,z\}\}$, enumerate $\{x,y\}$ and then $\{x,z\}$

Base case: variable gate x: enumerate $\{x\}$ and stop

OR-gate

Concatenation: enumerate S(g) and then enumerate S(g')

Determinism: no duplicates

AND-gate

Lexicographic product: enumerate S(g) and for each result t enumerate S(g') and concatenate t with each result

Decomposability: no duplicates
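The three cases translate directly into a recursive generator. This is a minimal sketch on arity-two gates, with an assumed tuple encoding. It reproduces the enumeration order but says nothing about the delay between two results, which is what the preprocessing is for.

```python
# Hypothetical encoding: ("var", name), ("or", g, g2), ("and", g, g2)
def enumerate_gate(gate):
    """Lazily enumerate S(gate) in the zero-suppressed semantics."""
    kind = gate[0]
    if kind == "var":
        # Base case: enumerate {x} and stop.
        yield frozenset([gate[1]])
    elif kind == "or":
        # Concatenation: all of S(g), then all of S(g').
        yield from enumerate_gate(gate[1])
        yield from enumerate_gate(gate[2])
    else:
        # Lexicographic product: for each t in S(g), combine t with each result of S(g').
        for t in enumerate_gate(gate[1]):
            for u in enumerate_gate(gate[2]):
                yield t | u

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
g = ("and", x, ("or", y, z))
for assignment in enumerate_gate(g):
    print(sorted(assignment))
```

On this gate the generator yields $\{x,y\}$ and then $\{x,z\}$, matching the example above.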

# Conclusion

#### Summary:

- Usual approach: develop enumeration algorithms by hand
- Proposed approach:
  - Develop linear-time compilation algorithm to circuits
  - Use restricted circuit classes (structured d-DNNF)
  - Develop general enumeration results on circuits

#### Future work:

- Theory: handle updates on the structure
- Practice: implement the technique with automata

Thanks for your attention!











- **Problem:** if $S(g) = \emptyset$ we waste time
- **Solution:** compute bottom-up whether $S(g) = \emptyset$
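The bottom-up emptiness test could look as follows. The sketch assumes, beyond what the slide states, that an OR-gate without inputs captures $\emptyset$ and that an AND-gate without inputs captures $\{\{\}\}$. The encoding and the memoization by object identity are also choices made for this illustration.

```python
# Hypothetical encoding: ("var", name), ("or", g1, ...), ("and", g1, ...)
def is_empty(gate, cache=None):
    """True iff S(gate) is the empty set; each gate is examined once."""
    if cache is None:
        cache = {}
    key = id(gate)  # shared gates of the DAG are memoized by identity
    if key not in cache:
        kind, inputs = gate[0], gate[1:]
        if kind == "var":
            cache[key] = False
        elif kind == "or":
            # A union is empty iff all of its inputs are empty.
            cache[key] = all(is_empty(g, cache) for g in inputs)
        else:
            # A product is empty iff at least one input is empty.
            cache[key] = any(is_empty(g, cache) for g in inputs)
    return cache[key]

x = ("var", "x")
empty = ("or",)  # assumed convention: OR-gate with no inputs captures the empty set
print(is_empty(("and", x, empty)), is_empty(("or", x, empty)))
```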













- **Problem:** if *S*(*g*) contains {} we waste time in chains of AND-gates
- Solution:
  - split g between $S(g) \cap \{\{\}\}$ and $S(g) \setminus \{\{\}\}$ (homogenization)
  - remove inputs with $S(g) = \{\{\}\}$ for AND-gates
  - collapse AND-chains with fan-in 1
- → Now, traversing an AND-gate ensures that we make progress: it splits the assignments non-trivially





- Problem: we waste time in OR-hierarchies to find a reachable exit (non-OR gate)
- Solution: compute reachability index
- Problem: must be done in linear time

#### Solution:

- Determinism ensures we have a multitree (we cannot have the pattern at the right)
- Custom constant-delay reachability index for multitrees



- This is where we use the v-tree
- Add explicitly untested variables





- Problem: quadratic blowup
- Solution:
  - Order < on variables in the v-tree (x < y < z)
  - Interval [x, z]
  - Range gates to denote  $\bigvee [x,z]$  in constant space
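A range gate can be pictured as a pair of positions in the variable order. The sketch below assumes that, under the zero-suppressed semantics, $\bigvee [x,z]$ captures one singleton assignment per variable of the interval. The class name `RangeGate` and the list `ORDER` are hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass

ORDER = ["x", "y", "z"]  # hypothetical variable order x < y < z from the v-tree

@dataclass(frozen=True)
class RangeGate:
    """Denotes the disjunction of the variables in the interval [lo, hi]."""
    lo: int  # index of the first variable in ORDER
    hi: int  # index of the last variable in ORDER (inclusive)

    def assignments(self):
        # One singleton assignment per variable of the interval, in order.
        for i in range(self.lo, self.hi + 1):
            yield frozenset([ORDER[i]])

gate = RangeGate(0, 2)  # stands for the interval [x, z] using two integers
print([sorted(s) for s in gate.assignments()])
```

Storing two indices instead of an explicit OR over every variable of the interval is what keeps each such gate in constant space.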

#### References

- Bagan, G. (2006). MSO queries on tree decomposable structures are computable with linear delay. In CSL.
- Kazana, W. and Segoufin, L. (2013). Enumeration of monadic second-order queries on trees. TOCL, 14(4).
- Olteanu, D. and Závodnỳ, J. (2015). Size bounds for factorised representations of query results. TODS, 40(1).