miser

oMiser Technical Note
HashFlag Technique
Ob Comparison

orcmid.github.io>

miser> oMiser>
2023> 09>

m230900c>
0.3.0 2026-03-07T02:06Z

Hashing is conventionally used for look-up tables and a way to narrow searches by using the hash value as a trial, with secondary techniques employed when two different table items have the same hash value (a collison) [Knuth1998b].

Hashing for expediting look-up is also relevant for oMiser. And having HashTags carried in ob-cells allows one to reverse-lookup how the represented ob might be tied to an item in the MOb domain look-up table.

1. HashFlag Assistance to Comparison
2. The Abstract Comparison of Obs
3. The Algorithmic Comparison of Obs
4. The Benefit of HashFlags for Expedited Comparison
5. The Quest to Eliminate Colliding Duplicates
6. Resources

1. HashFlag Assistance to Comparison

The reason that all ob-cells carry a HashFlag is so that it is easier to determine whether two different ob-cells are for identical obs or not. This serves multiple purposes:

If the different ob-cells have different HashFlags, they are immediately determined to be for different obs.
If HashFlags are the same, then different ob-cells might be for identical obs or else just be a HashFlag collision between two different ob-cells.
In support of this, the ob-cells representing composed obs have Hashes derived from those of the constituents.

The conjecture is that HashFlags will at least expedite comparisons between obs via their ob-cells and the improvement is worthwhile.

If two distinct ob-cells are found to represent the same ob, there is potential for removing duplicates and saving some storage along with avoiding (prospective) subsequent HashFlag collisions.

2. The Abstract Comparison of Obs

In [obtheory: Ob6. Structural Identity], the distinction/identity of two obs are in accordance with four conditions.

        u = ob.c(v,w) ∧ z = ob.c(x,y)
                   ⇒ (u = z ⇔ v = x ∧ w = y)                  (a)

        u = ob.e(v) ∧ z = ob.e(x)
                   ⇒ (u = z ⇔ v = x)                          (b)

        ob.is-pair(u) ∧ ob.is-singleton(z)
                   ⇒ u ≠ z                                     (c)

        ob.is-individual(u) ∧ ob.a(z) ≠ z
                   ⇒ u ≠ z                                     (d)

This is sufficient to establish all of the cases, where individuals are never duplicated and are distinguished by their names (and sometimes by their construction in the case of synthetics introduced as oMiser extensions).

3. The Algorithmic Comparison of Obs

The following Frugalese pseudocode constitutes a comparision algorithm for two ob-cells supplied as operands.

def q(x, y) = if &x = &y                                      // (1)
              then true
              else if is-individual(x) or is-individual(y)
                   then false
                   else if is-enclosure(x) and is-enclosure(y)
                        then q(a(x), a(y))
                        else if is-pair(x) and is-pair(y)
                             then q(a(x), a(y)) and q(b(x), b(y))
                             else false;

Here &x is the reference to the ob-cell that was supplied for parameter x, a pseudocode device for now.

Procedure q(x, y) relies on the fact that ob-cells for specific individuals are never duplicated. Each distinct individual is manifest by a single, always-unique ob-cell.

4. The Benefit of HashFlags for Expedited Comparison

A valuable feature of HashFlags is that the Flag distinguishes the flavor of the ob represented by the ob-cell. If the HashFlags are different, the represented obs must be different. If the HashFlags agree, the represented obs are of the same structure, whether or not the Hash is a collision.

This leads to useful streamlining:

def q(x, y) = if &x = &y                                      // (2)
              then true
              else if hashflag(x) ≠ hashflag(y)
                   then false
                   else // they have identical Flags as well as Hashes
                        if is-individual(x)
                        then false // same individual only if same loc
                        else if is-enclosure(x)
                             then q(a(x), a(y))
                             else q(a(x), a(y)) and q(b(x), b(y));

Usage of hash codes to quickly differentiate list structures, as in LISP, was pointed out by Paul McJones.

Note that a collision at one level does not necessitate collisions at deeper levels of ob-cells where differences in HashFlags can still provide an expedited conclusion. But not if the obs are indeed identical and don’t coincide except at the individuals.

5. The Quest to Eliminate Colliding Duplicates

When q(x, y) determines that two distinct ob-cells manifest the same obs, it might be valuable to coalesce into one ob-cell in some manner. That would have a recurrence of the comparison find that x and y have the same location, provide a direct result, and possibly release some storage.

It is speculative whether recurrence of a specific comparison is likely. What follows applies when there is some indication that the additional effort involved is worthwhile. Construction of pathological test cases along with randomized ones may prove fascinating.

5.1 Going Under the Covers

Looking into the memory structures and connections among ob-cells, the considerations are at the computer-implementation level and not at the level of abstraction that Frugalese and oFrugal inhabit. Simplifying ob-cell structure just enough for treating comparison, here is a C/C++ counterpart of (2).

typedef struct ob {ulong refs; ulong hashflag;               // (3)
                   struct ob *a; struct ob *b;} ob;
        /* just something all ob-cells have in common, ignoring
           COM niceties which don't matter internal to oMiser here */

bool q(ob *x, ob *y)
  {  if (x == y) return true;  // same address, same ob
     if (x->hashflag != y->hashflag) return false;
     if (x->a == x) return false;  // same individual only if same address
     if (x->b == x) return q(x->a, y->a); // so singleton, and if not
     return q(x->a, y->a) && q(x->b, y->b); // must be two pairs
    }

/* Here, (x->a == x) is C/C++ for is-individual(x) and
 *       (x->b == x) is C/C++ for is-singleton(x)
 *
 * In the actual implementation, the Flag byte provides this information.
 * There is also presumption that comparing pointers for == and != is
 * reliable in the world of x64 processors.
 */

5.2 Knowing Where From

To eliminate a duplicate, it must be known where the references for x and y were obtained so those references can be made to refer to the same ob-cell, discarding a duplicate on a match In (3) we have pointers x and y to the ob-cells as parameter values, but there is no knowledge of where those pointers were obtained.

Occassionaly it is known where pointere came from: going into recursive steps q(a(x), a(y)) and q(b(x), b(y)) in (2) and its counterpart (3) above.

Replace those with qa(x, y) and qb(x, y) so it becomes known what ob-cells are going to be checked. Now there are to ob-cells that can be adjusted when their corresponding ob-cell parts are found to represent the same ob.

A C/C++ version in the spirit of (3),

bool q(ob *x, ob *y); // forward declaration                 // (4)

bool qa(ob **px, ob **py)
  { if ( !q(a(*px), a(*py)) ) return false;
    px->a = py->a = preferable(*px, *py);
    return true;
    }

bool qb(ob **px, ob **py)
  { if ( !q(b(*px), b(*py)) ) return false;
    px->b = py->b = preferable(*px, *py);
    return true;
    }

bool q(ob *x, ob *y)
  {  if (x == y) return true;
     if (x->hashflag != y->hashflag) return false;
     if (x->a == x) return false;
     if (x->b == x) return qa(x, y);
     return qa(x, y) && qb(x, y);
    }

The function ob *preferable(ob *x, ob *y) returns a pointer to the ob that is somehow preferable to retain.

Reference count adjustments are not reflected. On exit, one goes up, one goes down. That can be handled in preferable(x, y).

5.3 Considerations

Because we are recursing through q(x, y), coalescing could happen on any matches down in the recursive descent even though, overall, the compared full ob structures are not identical. It takes some experimentation to see what’s worthwhile here.
There is a presumed method of determining which of two equal obs should be substituted for the other. The heuristic idea of preference is to employ the one appearing to hang around the longest (based, say, on reference-count comparisons).
The recursive descent through the two obs is difficult for pairs. It may be crucial in practice to find a tail-recursive construction for these cases. There might be a binary tree traversal in support of that [Knuth1997: 2.3.1]. It will take more careful investigation of the trade-off.

5.4 Can Greed Be Good?

At an initial entry to q(x, y), there is nothing helpful at that presumably top-level case without knowing where x and y came from. This could be handled similary to the introduction of qa and qb, with a qp, say.

It is unclear that this is worthwhile making such additional provisions until cases that arise with oFrugal provide more perspective.

6. Resources

[Knuth1993b] Knuth, Donald E. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley (Boston: 1963). ISBN 978-0-321-60632-7 pbk.

[Knuth1997] Knuth, Donald E. Section 2.3.1, Traversing Binary Trees, in The Art of Computer Programming, vol.1: Fundamental Algorithms, ed.3. Addison-Wesley (Reading, MA: 1968, 1973, 1997). ISBN 0-201-89683-4

[Knuth1998] Knuth, Donald E. Chapter 3, Random Numbers, in The Art of Computer Programming, vol.2: Seminumerical Algorithms, ed.3. Addison-Wesley (Reading, MA: 1998). ISBN 0-201-89684-2

[Knuth1998b] Knuth, Donald E. Section 6.4, Hashing, in The Art of Computer Programing, vol. 3: Sorting and Searching, ed.2. Addison-Wesley (Reading, MA), 1998. ISBN 0-201-89685-0.

I invite discussion about Miser Project topics in the Discussion section. Improvements and removal of defects in this particular documentation can be reported and addressed in the Issues section. There are also relevant projects from time to time.

You are navigating the Miser Project on Github

created 2026-02-12 by orcmid

This site is open source. Improve this page.