1/28/10

Can't... Count...

A brief thought on semi-tractable decision tree minimization, for a moment I assume that a minimal circuit equals a minimal decision tree. It just popped into my head, I am not too confident on it, but thought it would be worthwhile to write down anyway.

Circuit minimization deals with the question: What is the minimal circuit for a boolean function f:{0,1}n->{0,1}? In general, the question is considered to be intractable since the algorithms probe the size 2n state space. I looked at some solutions, which looked like kludges to me. But I am bad at understanding things I don't understand, that seems to be true for a lot of people.

How would I go at it? Well, to distinguish between all vectors, you need to find the bit which halves the state space best, you could try that by counting the {0,1} x {0,1} relation between the value of a bit, and the result of the function, for each bit. By factoring out bits, you could derive the minimal circuit. And there you see minimal circuitry is related to the ability to count - both intractable.

Counting is prohibitive directly on the state space, but circuits distinguish between symbols of form v = v0 ∧ ¬v1 ∧ ..∧ vn, a conjunction of number |v| variables where a variable may be negated, and such a symbol encodes 2n-|v| states of the state space. That number can be decomposed and stored directly.

Let's rephrase the question to minimization of small Nand terms Φ(v), which only distinguishes between small numbers of symbols. Thus: How to minimize f:Φ(v)->{0,1}? Well, with the same procedure it could be tried. Or one could start of at the bottom and build minimized decision trees. Trees which are the combination of two other trees under a logical connective could be minimized by probing recursively the square of all endpoints of the trees. But that leads to an intractable algorithm.


A small example, first an informal description of the algorithm.

For a formula Φ depending on {v0, .., vn}:
   0. If Φ consists of one variable or a constant, return Φ.
   1. Count the correlation between each variable vi and the formula Φ 
      with sign 1.
   2. Count the correlation between each variable vi and the negation of the formula ¬Φ 
      with sign 0.
   3. Select either the formula or it's negation, whichever describes the 
      smallest space, Ψ = b ⊕ Φ.
   4. Select the variable vk which splits that state space the best in halves.
   5. Recursively determine the terms for Ψ[vk=1] and Ψ[vk=0].
   6. Return b ⊕ Ite(vk, Ψ[vk=1], Ψ[vk=0]). 
The algorithm for counting is trivial. It takes a DNF and a sign, DNFs can be obtained by sat solving the term. For each conjunction and for each variable occurring in the term, the entry with the variable and sign is increased with 2n-|v|. For each variable not occurring, it and its negation are increased with 2n-|v|-1 given the sign.

Note that decision trees encode DNFs, and the the combination of a DNF under most boolean connectives has at most squared complexity.

Lets try it on a(ba). This decomposes into ¬a ∨ (a ∧ b). The algorithm starts of with empty tables. The table as holds the counts for a variable a and a sign s, the sign states whether the elements in the proposition or the negation are counted.




as
000
010
100
110

bs
000
010
100
110

After processing ¬a, two occurrences of ¬a and one occurrence of b and ¬b are counted.



as
000
012
100
110

bs
000
011
100
111

After processing (a ∧ b), one occurrence of a and b are counted.



as
000
012
100
111

bs
000
011
100
112

The algorithm proceeds with counting the negation of the formula of which the DNF is (a ∧ ¬ b).



as
000
012
101
111

bs
001
011
100
112

The negation of the state space is selected, since the state space with sign 0 totals one element, the other three. The algorithm derives ¬ite(a, ¬b, 0) which trivializes to a(bb).

Counting is linear in the DNF of Φ. The algorithm seems to split the DNF too on every recursion, so its complexity seems to be the number of variables times the DNF, which may be exponential, but for certain decision trees may be tractable. Hence the semi-tractable part.

Guess what rests is determining all the mistakes I made. It looks way too simple not to be know.

012910: It might give small trees, but not minimal circuits. Shared variables among trees are not factored out. Also, it doesn't take into account that the left-hand side of a circuit may be expressed in terms of factors of the right-hand side.


This mind never stops.