Data Model

Records are stored in LelielStore, an in-memory columnar store backed by DuckDB for write-through durability. There is no graph database: the factor graph is built in memory from the ingested records and updated on every ingest event.


Record

The canonical unit of data in Leliel. Every record is one node in the factor graph.

| Field | Type | Required | Description |
|---|---|---|---|
| record_id | string | Yes | Unique identifier; primary key within the store |
| source_id | string | Yes | Producer identifier; used to scope list and query results |
| signal_value | float | Yes | Normalised quality signal in [0.0, declared_upper_bound] |
| declared_upper_bound | float | No | Upper bound for signal_value; defaults to 1.0 |
| timestamp | string | No | ISO 8601 timestamp; used for recency fallback ordering |
| extra_data | JSON object | No | Arbitrary metadata; field-value pairs drive the IDF semantic index |

Re-ingesting an existing record_id is an upsert: the columnar metadata is updated in place and the accumulated mass is preserved, not reset.
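
The record fields and the upsert rule above can be sketched as follows. This is a minimal illustration, not LelielStore itself: the `Record` dataclass, the `ColumnarSketch` class, and the initial mass of 0.0 are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Record:
    record_id: str
    source_id: str
    signal_value: float
    declared_upper_bound: float = 1.0      # documented default
    timestamp: Optional[str] = None        # ISO 8601 string
    extra_data: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the documented range [0.0, declared_upper_bound].
        if not 0.0 <= self.signal_value <= self.declared_upper_bound:
            raise ValueError("signal_value outside [0.0, declared_upper_bound]")


class ColumnarSketch:
    """Hypothetical stand-in for the store: parallel lists indexed by node."""

    def __init__(self) -> None:
        self.index: dict[str, int] = {}    # record_id -> node index (ingest only)
        self.records: list[Record] = []
        self.mass: list[float] = []

    def upsert(self, rec: Record) -> int:
        i = self.index.get(rec.record_id)
        if i is None:
            # New record: append a node; 0.0 is an assumed starting mass.
            i = len(self.records)
            self.index[rec.record_id] = i
            self.records.append(rec)
            self.mass.append(0.0)
        else:
            # Existing record: replace metadata, leave mass[i] untouched.
            self.records[i] = rec
        return i
```

Because the metadata and the mass live in separate lists that share one index, replacing a record's metadata cannot disturb its mass.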


Factor graph

The factor graph is the in-memory graph over which the quantum walk evolves.

Nodes: one node per record. Node index i aligns exactly with the columnar metadata vectors so all field reads are O(1) index accesses with no hash lookup.

Co-occurrence edges: within each source, a sliding window of width K=5 connects each record to its K nearest neighbours by ingest order. Edges represent temporal proximity in the ingested signal stream.
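
One way to read the sliding-window rule is that each newly ingested record is linked to the previous K records from the same source. The sketch below takes that reading as an assumption; the function name and the `(node_index, source_id)` input shape are hypothetical.

```python
from collections import defaultdict, deque

K = 5  # window width stated in the text


def co_occurrence_edges(stream):
    """Return (earlier, later) node-index pairs.

    `stream` is an iterable of (node_index, source_id) in ingest order.
    """
    recent = defaultdict(lambda: deque(maxlen=K))  # per-source window
    edges = []
    for node, source in stream:
        window = recent[source]
        # Link the new node to every node still inside this source's window.
        edges.extend((prev, node) for prev in window)
        window.append(node)  # the oldest entry drops out once K is exceeded
    return edges
```

Keeping one bounded deque per source means records from different sources never share a co-occurrence edge, and each ingest costs at most K edge insertions.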

Cross-source co-failure edges: when a record is ingested with a negative signal (signal_value low relative to declared_upper_bound), up to 3 edges are drawn to the nearest-mass records from other sources. These edges capture cross-source failure correlation.
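
The selection of co-failure targets might look like the sketch below. The text does not say how low a signal must be to count as negative, so `NEGATIVE_RATIO` is a hypothetical cut-off, and "nearest-mass" is read here as smallest absolute mass difference. The dict-based node shape is also an assumption.

```python
MAX_CO_FAILURE_EDGES = 3   # "up to 3 edges" per the text
NEGATIVE_RATIO = 0.5       # hypothetical cut-off; the text gives no threshold


def co_failure_edges(new, nodes):
    """Return (new_index, other_index) pairs for a newly ingested record.

    `new` needs: index, source_id, mass, signal_value, declared_upper_bound.
    Each entry of `nodes` needs: index, source_id, mass.
    """
    # Only records with a low signal relative to their bound draw edges.
    if new["signal_value"] >= NEGATIVE_RATIO * new["declared_upper_bound"]:
        return []
    # Candidates come from other sources only.
    others = [n for n in nodes if n["source_id"] != new["source_id"]]
    # Closest mass first (assumed meaning of "nearest-mass").
    others.sort(key=lambda n: abs(n["mass"] - new["mass"]))
    return [(new["index"], n["index"]) for n in others[:MAX_CO_FAILURE_EDGES]]
```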

Wormhole edges (structural): after each ingest, the Fiedler eigenvector v2 of the graph Laplacian is cached. Any two high-mass nodes whose Fiedler components differ by less than WORMHOLE_ALPHA * std(v2) receive a wormhole edge. Wormhole edges are not stored in the graph; they are injected into the walk snapshot at query time. They represent structural proximity in the ER=EPR sense: nodes that are topologically equivalent in the Laplacian embedding may exhibit correlated behaviour even when not directly connected.
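
The wormhole criterion can be illustrated with a small sketch that takes an already computed Fiedler vector as input. The values of `WORMHOLE_ALPHA` and `HIGH_MASS` are hypothetical (the text names the first constant without giving its value and does not define "high-mass"), and the use of the population standard deviation is an assumption.

```python
from itertools import combinations
from statistics import pstdev

WORMHOLE_ALPHA = 0.1   # hypothetical value; named but not specified in the text
HIGH_MASS = 1.0        # hypothetical cut-off for "high-mass"


def wormhole_edges(fiedler, mass, alpha=WORMHOLE_ALPHA, high_mass=HIGH_MASS):
    """Return index pairs of high-mass nodes with close Fiedler components.

    `fiedler[i]` and `mass[i]` both refer to node i.
    """
    tol = alpha * pstdev(fiedler)  # WORMHOLE_ALPHA * std(v2)
    heavy = [i for i, m in enumerate(mass) if m >= high_mass]
    return [(i, j) for i, j in combinations(heavy, 2)
            if abs(fiedler[i] - fiedler[j]) < tol]
```

The pairwise scan is quadratic in the number of high-mass nodes; sorting the high-mass nodes by Fiedler component and comparing neighbours would be one way to reduce that if the set grows large.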


Mass field

Every node carries a scalar utility mass value. Mass accumulates from REINFORCE feedback and decays via Hawking radiation. Three sources contribute to the effective mass used by the walk Hamiltonian:

| Component | Source | Description |
|---|---|---|
| Utility mass | REINFORCE feedback | Accumulates on negative signal; decays on positive signal and over time |
| Structural mass | Fiedler proximity | Wormhole edges encode structural co-location (ER=EPR) |
| Semantic mass | IDF feature similarity | Cold-start bias toward semantically similar records via extra_data IDF vectors |

Utility mass is the primary component at query time. Structural and semantic components operate via edge injection and walk seed bias respectively rather than as additive mass terms.

Hawking decay: mass decays exponentially with a half-life of 86400 seconds (one day). The decay is applied lazily: each write first decays the stored mass according to the time elapsed since its last update. The decay constant follows from the one-day half-life: alpha_H = ln(2) / 86400 per second.
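
The half-life and decay constant above imply the usual exponential form, which can be sketched as below. The function name and the use of epoch seconds for the timestamps are assumptions for illustration.

```python
import math

HALF_LIFE_S = 86_400.0                  # one day, per the text
ALPHA_H = math.log(2) / HALF_LIFE_S     # decay constant, per second


def decayed_mass(mass, last_updated, now):
    """Mass after exponential decay between two times given in seconds."""
    elapsed = max(0.0, now - last_updated)  # guard against clock skew
    return mass * math.exp(-ALPHA_H * elapsed)
```

For example, a mass of 8.0 last written one day ago is read back as 4.0, and after two days as 2.0. Because the decay depends only on the elapsed time, it can be applied at write time rather than by a background sweep.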

Schwarzschild threshold: M_s = lambda_2(L) * log(1/epsilon) / (G * K), where lambda_2(L) is the Fiedler eigenvalue of the live graph Laplacian. M_s is recomputed from the live graph and is not a configurable constant. Nodes whose mass exceeds M_s are black holes: the walk's Born-rule amplitude suppresses them without explicit exclusion logic.
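
The threshold formula translates directly into code. In this sketch `epsilon` and `G` are passed in as plain parameters because the text does not define them, and defaulting `K` to the window width 5 is an assumption.

```python
import math


def schwarzschild_threshold(lambda_2, epsilon, G, K=5):
    """M_s = lambda_2 * log(1/epsilon) / (G * K)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon is assumed to lie in (0, 1)")
    return lambda_2 * math.log(1.0 / epsilon) / (G * K)


def is_black_hole(mass, m_s):
    """A node is a black hole when its mass exceeds the threshold."""
    return mass > m_s
```

Since lambda_2 changes as the graph changes, M_s would need to be recomputed whenever the cached Fiedler eigenvalue is refreshed.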


DuckDB schema

DuckDB provides write-through durability. The hot query path reads exclusively from the in-memory columnar store; no SQL is issued during query execution.

records

Durable record store. One row per record.

| Column | Type | Description |
|---|---|---|
| record_id | text | Primary key; matches the in-memory node identifier |
| source_id | text | Producer identifier |
| signal_value | double | Quality signal |
| declared_upper_bound | double | Declared amplitude upper bound |
| timestamp | text | ISO 8601 timestamp string, nullable |
| extra_data | text | JSON-serialised extra metadata, nullable |
| ingested_at | timestamp | UTC time this record was written to DuckDB |

record_masses

Durable mass index. One row per record. Updated on REINFORCE writes.

| Column | Type | Description |
|---|---|---|
| record_id | text | Foreign key to records |
| mass | double | Current utility mass value |
| last_updated | timestamp | UTC time of most recent mass write |
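
For orientation, the two tables above could be declared roughly as follows. This is a sketch inferred from the column lists, not the project's actual DDL: the NOT NULL constraints and the primary key on record_masses are assumptions.

```python
# DDL strings inferred from the documented columns (constraints are assumed).
RECORDS_DDL = """
CREATE TABLE IF NOT EXISTS records (
    record_id            TEXT PRIMARY KEY,
    source_id            TEXT NOT NULL,
    signal_value         DOUBLE NOT NULL,
    declared_upper_bound DOUBLE,
    "timestamp"          TEXT,       -- ISO 8601 string, nullable
    extra_data           TEXT,       -- JSON-serialised, nullable
    ingested_at          TIMESTAMP
)
"""

RECORD_MASSES_DDL = """
CREATE TABLE IF NOT EXISTS record_masses (
    record_id    TEXT PRIMARY KEY REFERENCES records (record_id),
    mass         DOUBLE NOT NULL,
    last_updated TIMESTAMP
)
"""

# records must be created first because record_masses references it.
ALL_DDL = (RECORDS_DDL, RECORD_MASSES_DDL)

# With the duckdb Python package installed, the statements could be applied as:
#   import duckdb
#   con = duckdb.connect("leliel.duckdb")   # hypothetical file name
#   for ddl in ALL_DDL:
#       con.execute(ddl)
```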

analysis_summaries

Source-level analysis summaries computed by the background analysis worker.

| Column | Type | Description |
|---|---|---|
| source_id | text | Source identifier |
| record_count | integer | Number of records in this source |
| mean_signal | double | Mean signal_value across records |
| mean_mass | double | Mean mass across records |
| black_hole_count | integer | Records above M_s at time of summary |
| computed_at | timestamp | UTC time this summary was computed |
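
A summary row with these columns could be assembled from one source's in-memory columns as in the sketch below. The function name and argument shapes are hypothetical; the analysis worker's real implementation is not shown in this document.

```python
from datetime import datetime, timezone
from statistics import fmean


def summarise_source(source_id, signals, masses, m_s):
    """Build one analysis_summaries row for a single source.

    `signals` and `masses` are parallel lists for that source's records;
    `m_s` is the Schwarzschild threshold at the time of the summary.
    """
    if not signals or len(signals) != len(masses):
        raise ValueError("signals and masses must be non-empty and aligned")
    return {
        "source_id": source_id,
        "record_count": len(signals),
        "mean_signal": fmean(signals),
        "mean_mass": fmean(masses),
        "black_hole_count": sum(m > m_s for m in masses),
        "computed_at": datetime.now(timezone.utc),
    }
```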

mesh_snapshots

Append-only SLEM trend log written by the analysis worker each cycle.

| Column | Type | Description |
|---|---|---|
| slem | double | Second Largest Eigenvalue Modulus of the walk weight matrix |
| spectral_gap | double | 1 - slem |
| node_count | integer | Graph size at snapshot time |
| computed_at | timestamp | UTC time of this snapshot |