miser

oMiser Technical Note
HashFlag Technique
Ob Comparison

orcmid.github.io>

miser> oMiser>
2023> 09>

m230900c>

0.2.1 2026-02-20T04:22Z

Hashing is conventionally used for look-up tables and a way to narrow searches by using the hash value as a trial, with secondary techniques employed when two different table items have the same hash value (a collison) [Knuth1998b].

Hashing for expediting look-up is also relevant for oMiser. And having HashTags carried in ob-cells allows one to reverse-lookup how the represented ob might be tied to an item in the MOb domain look-up table..

1. HashFlag Assistance to Comparison

The reason that all ob-cells carry a HashFlag is so that it is easier to determine whether two different ob-cells are for identical obs or not. This serves multiple purposes:

The conjecture is that HashFlags will at least expedite comparisons between obs via their ob-cells and the improvement is worthwhile.

If two distinct ob-cells are found to represent the same ob, there is potential for removing duplicates and saving some storage along with avoiding (prospective) subsequent HashFlag collisions.

2. The Abstract Comparison of Obs

In [obtheory: Ob6. Structural Identity], the distinction/identity of two obs are in accordance with 4 conditions.

        u = ob.c(v,w) ∧ z = ob.c(x,y)
                   ⇒ (u = z ⇔ v = x ∧ w = y)                  (a)

        u = ob.e(v) ∧ z = ob.e(x)
                   ⇒ (u = z ⇔ v = x)                          (b)

        ob.is-pair(u) ∧ ob.is-singleton(z)
                   ⇒ u ≠ z                                     (c)

        ob.is-individual(u) ∧ ob.a(z) ≠ z
                   ⇒ u ≠ z                                     (d)

This is sufficient to establish all of the cases, where individuals are never duplicated and are distinguished by their names (and sometimes by their construction in the case of synthetics introduced as oMiser extensions).

3. The Algorithmic Comparison of Obs

The following Frugalese pseudocode constitutes a comparision algorithm for two ob-cells supplied as operands.

def q(x, y) = if &x = &y                                      // (1)
              then true
              else if is-individual(x) or is-individual(y)
                   then false
                   else if is-enclosure(x) and is-enclosure(y)
                        then q(a(x), a(y))
                        else if is-pair(x) and is-pair(y)
                             then q(a(x), a(y)) and q(b(x), b(y))
                             else false;

Here &x is the reference to the ob-cell that was supplied for parameter x, a pseudocode device for now.

Procedure q(x, y) relies on the fact that ob-cells for specific individuals are never duplicated. Each distinct individual is manifest by a single, always-unique ob-cell.

4. The Benefit of HashFlags for Expedited Comparison

A valuable feature of HashFlags is that the Flag distinguishes the flavor of the ob represented by the ob-cell. If the HashFlags are different, the represented obs must be different. If the HashFlags agree, the represented obs are of the same structure, whether or not the Hash is a collision.

This leads to useful streamlining:

def q(x, y) = if &x = &y                                      // (2)
              then true
              else if hashflag(x) ≠ hashflag(y)
                   then false
                   else // they have identical Flags as well as Hashes
                        if is-individual(x)
                        then false // same individual only if same loc
                        else if is-enclosure(x)
                             then q(a(x), a(y))
                             else q(a(x), a(y)) and q(b(x), b(y));

Usage of hash codes to quickly differentiate list structures, as in LISP, was pointed out by Paul McJones.

Note that a collision at one level does not necessitate collisions at deeper levels of ob-cells where differences in HashFlags can still provide an expedited conclusion. But not if the obs are indeed identical and don’t coincide except at the individuals.

5. The Quest to Eliminate Colliding Duplicates

When q(x, y) determines that two distinct ob-cells manifest the same obs, it might be valuable to coalesce into one ob-cell in some manner. That would have a recurrence of the comparison find that x and y have the same location, provide a direct result, and possibly release some storage.

It is speculative whether recurrence of a specific comparison is likely. What follows applies when there is some indication that the additional effort involved is worthwhile. Construction of pathological test cases along with randomized ones may prove fascinating.

5.1 Going Under the Covers

Looking into the memory structures and connections among ob-cells, the considerations are at the computer-implementation level and not at the level of abstraction that Frugalese and oFrugal inhabit. Simplifying ob-cell structure just enough for treating comparison, here is a C/C++ counterpart of (2).

typedef struct ob {ulong refs; ulong hashflag;               // (3)
                   struct ob *a; struct ob *b;} ob;
        /* just something all ob-cells have in common, ignoring
           COM niceties which don't matter internal to oMiser here */

bool q(ob *x, ob *y)
  {  if (x == y) return true;  // same address, same ob
     if (x->hashflag != y->hashflag) return false;
     if (x->a == x) return false;  // same individual only if same address
     if (x->b == x) return q(x->a, y->a); // so singleton, and if not
     return q(x->a, y->a) && q(x->b, y->b); // must be two pairs
    }

/* Here, (x->a == x) is C/C++ for is-individual(x) and
 *       (x->b == x) is C/C++ for is-singleton(x)
 *
 * In the actual implementation, the Flag byte provides this information.
 * There is also presumption that comparing pointers for == and != is
 * reliable in the world of x64 processors.
 */

5.2 Knowing Where From

To eliminate a duplicate, it must be known where the references for x and y were obtained so those references can be made to refer to the same ob-cell, discarding a duplicate on a match In (3) we have pointers x and y to the ob-cells as parameter values, but there is no knowledge of where those pointers were obtained.

Occassionaly it is known where pointere came from: going into recursive steps q(a(x), a(y)) and q(b(x), b(y)) in (2) and its counterpart (3) above.

Replace those with qa(x, y) and qb(x, y) so it becomes known what ob-cells are going to be checked. Now there are to ob-cells that can be adjusted when their corresponding ob-cell parts are found to represent the same ob.

A C/C++ version in the spirit of (3),

bool q(ob *x, ob *y); // forward declaration                 // (4)

bool qa(ob **px, ob **py)
  { if ( !q(a(*px), a(*py)) ) return false;
    px->a = py->a = preferable(*px, *py);
    return true;
    }

bool qb(ob **px, ob **py)
  { if ( !q(b(*px), b(*py)) ) return false;
    px->b = py->b = preferable(*px, *py);
    return true;
    }

bool q(ob *x, ob *y)
  {  if (x == y) return true;
     if (x->hashflag != y->hashflag) return false;
     if (x->a == x) return false;
     if (x->b == x) return qa(x, y);
     return qa(x, y) && qb(x, y);
    }

The function ob *preferable(ob *x, ob *y) returns a pointer to the ob that is somehow preferable to retain.

Reference count adjustments are not reflected. On exit, one goes up, one goes down. That can be handled in preferable(x, y).

5.3 Considerations

5.4 Can Greed Be Good?

At an initial entry to q(x, y), there is nothing helpful at that presumably top-level case without knowing where x and y came from. This could be handled similary to the introduction of qa and qb, with a qp, say.

It is unclear that this is worthwhile making such additional provisions until cases that arise with oFrugal provide more perspective.

6. Resources

[Knuth1993b] Knuth, Donald E. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley (Boston: 1963). ISBN 978-0-321-60632-7 pbk.

[Knuth1998] Knuth, Donald E. Chapter 3, Random Numbers, in The Art of Computer Programming, vol.2: Seminumerical Algorithms, ed.3. Addison-Wesley (Reading, MA: 1998). ISBN 0-201-89684-2

[Knuth1998b] Knuth, Donald E. Section 6.4, Hashing, in The Art of Computer Programing, vol. 3: Sorting and Searching, ed.2. Addison-Wesley (Reading, MA), 1998. ISBN 0-201-89685-0.


I invite discussion about Miser Project topics in the Discussion section. Improvements and removal of defects in this particular documentation can be reported and addressed in the Issues section. There are also relevant projects from time to time.

Hard Hat Area You are navigating the Miser Project on Github

created 2026-02-12 by orcmid