miser

oMiser Technical Note m230900
HashFlags Technique

orcmid.github.io>

miser> oMiser>

m230900>
0.1.7 2025-11-07T02:11Z

In the tentative x64 ob-cell storage structure for every Ob, there is a 64-bit (8-byte) word containing at most a 56-bit (7-byte) Hash value and an 8-bit (1-byte) Flag field. The combination is designated as the HashFlags field of the ob-cell.

When two addresses of ob-cells are the same, the represented obs are necessarily the same.
When two ob-cells are in different locations and their HashFlags are different, the ob-cell-represented obs are different.
When two ob-cells are in different locations but their HashFlags are the same, they must be examined further to determine if they are different despite having the same HashFlags.

There are a number of useful invariants:

The Flag field is an invariant of the structural properties of an ob. When HashFlags are the same, the compared ob-cells have identical Flag fields.
Individuals are never duplicated. Two references to ob-cells that are known to be for individuals will be to different individuals if the ob-cell locations are different, despite any collision of Hash Values.
Hash values are not based on locations or other characteristics of ob-cells apart from the structural characteristics of the ob and the determination of hash values for individuals from known information (such as a lindy’s name).

There are also questions to be resolved, requiring both analysis and experimentation, perhaps by instrumentation of the oMiser implementation.

What is an usable but simple hashing algorithm for the Hash of an ob pair from the HashFlags of the components?
Is a 24-bit (3-byte) or 32-bit (4-byte) Hash field good enough? Having at least 24 bits unassigned provides space for reference counting as an alternative to garbage collection.
Is a 24-bit (3-byte) field adequate for an optional reference count? (A collateral question as part of keeping x64 ob-cells to 4 64-bit words)
When recursive descent is required to confirm or refute the matching of HashFlags, the process repeats so long as HashFlag matching continues. At each level where matching of different ob-cells is confirmed, there may be opportunity to replace the reference to one with a reference to the other. The advantage is potential release of some ob-cells and improvement of future comparisons by having more locations match. The question is whether this is worth the additional processing on those occasions, and whether even checking for it is worthwhile in what we want to be a very rapid process.
The derivation of a Hash for every introduction of a new ob-cell adds to the computational cost of those actions. When performed as part of oFrugal semantics, this will be negligible. When performed in the course of an application, especially for short-lived ob-cells that will never be exposed through the oFrugal API, it must be determined how much this activity (along with memory management) can be kept low-cost.
There is a related consideration with regard to short-lived ob-cells, ones that are transient intermediate results in applicative operations. For some time I have wanted to leave initial creations of ob-cells on the run-time stack, simply dropping them by a retreat of the stack if not swept into a longer-lived storage generation. In that case, perhaps even computation of hashes (and actions on reference counts?) might be deferred until a sweep. This is all speculative. I’d heard that Henry Baker may have done something similar in Chicken Scheme (https://www.call-cc.org/) and his clever C Language approach with continuations has been described at https://dl.acm.org/doi/10.1145/214448.214454. Henry and I talked about this notion back at the first AAAI conference when I was thinking of CP/M and Z-80 implementatins where hardware stacks were usable. I was already thinking of Miser but had not formulated it yet. The trick in rapid initial ob-cell constructions on the stack is avoiding dragging so many no-longer reachable intermediate ob-cells that the run- time stack overflows. With Just-In-Time (JIT) generation of machine code along with good tail- recursion handling, there might be a reasonable time before sweeps are required. There might still be pathological cases, such as when lists are reversed/mirror-imaged. And there might also be tricks to rescue those.

Folio Catalog

This placeholder links to raw materials and notes, including text files. There will be organized folios of content as this topic is explored in detail.

ID	Status	Started	Topic

m230900a	undated	2025-09-16	Notes & Work Items
m230900b	0.1.0 2025-09-26	2025-09-24	Hashing Cases

I invite discussion about Miser Project topics in the Discussion section. Improvements and removal of defects in this particular documentation can be reported and addressed in the Issues section. There are also relevant projects from time to time.

You are navigating the Miser Project on Github

created 2025-09-14 by orcmid

This site is open source. Improve this page.