|
oMiser Technical Note m230900
|
orcmid.github.io>
miser> oMiser> m230900> 0.3.0 2026-02-20T21:52Z |
In the provisional x64 ob-cell storage structure for every ob, there is a 32-bit (4-byte) word containing a 24-bit (3-byte) Hash value and an 8-bit (1-byte) Flag field. The combination is designated as the HashFlag field of the ob-cell.
The Flag field is an invariant of the structural properties of an ob. When HashFlags are the same, the compared ob-cells have identical Flag fields.
Individuals are never duplicated. Two references to ob-cells that are known to be for individuals will be to different individuals if the ob-cell locations are different, despite any collision of Hash Values.
Hash values are not based on locations or other characteristics of ob-cells apart from the structural characteristics of the ob and the determination of hash values for individuals from known information (such as a lindy’s name) and for composed obs, the HashFlags of components.
Both analysis and experimentation is required, including by instrumentation of the oMiser implementation.
What is an usable but simple hashing algorithm for the Hash of an ob pair from the HashFlags of the components?
Are 24-bit (3-byte) Hash fields good enough? That permits 16,777,215 values (0 being excluded). Are the expected collisions tolerable?
Are HashFlags too costly? The derivation of a Hash for every introduction of a new ob-cell adds to the computational cost of those actions. When performed as part of oFrugal When performed in the course of an application, especially for short-lived ob-cells that will never be exposed through the API, it must be determined how much this activity (along with memory management) can be kept low-cost.
And is it wasteful for no short-term gain? There is a related consideration with regard to short-lived ob-cells, ones that are transient intermediate results in applicative operations. For some time I have wanted to leave initial creations of ob-cells on the run-time stack, simply dropping them by a retreat of the stack if not swept into a longer-lived storage generation. In that case, perhaps even computation of hashes (and actions on reference counts?) might be deferred until a sweep. This is all speculative. Henry Baker has done something similar in Chicken Scheme and his clever C Language approach with continuations has been described at https://dl.acm.org/doi/10.1145/214448.214454.
There are some places to look farther.
It turns this is not a new problem with many small elements, such as ob-cells, that was resolved in the case of Microsoft Excel economizing on spreadsheet cells even though COM interfaces were introduced.
Henry Baker has found economy of short-lived small objects using the runtime hardware stack. The technique is used in Chicken Scheme and his clever C Language approach is described at Baker1995 and earlier.
| ID | Status | Started | Topic |
|---|---|---|---|
| m230900a | undated | 2025-09-16 | Notes & Work Items |
| m230900b | 0.1.6 2026-02-17 | 2025-09-24 | Hashing Cases |
| m230900c | 0.2.1 2026-02-20 | 2026-02-12 | Ob Comparison |
I invite discussion about Miser Project topics in the Discussion section. Improvements and removal of defects in this particular documentation can be reported and addressed in the Issues section. There are also relevant projects from time to time.
|
|
You are navigating the Miser Project on Github |
created 2025-09-14 by orcmid |