This sound like projecting data into the linear space spanned by {x_i, x_i*x_j} where x_i are the features variables, and then applying standard regularization methods to remove noise and low value coefficients.
Anisotropy and the cone ideas may explain why PCA underperforms, but it does not uniquely justify this particular quadratic decoder. The geometric story is not doing explanatory work beyond “data is nonlinear,” and the real substance is simply that second-order reconstruction empirically helps.
Geometric Algebra (GA) (Clifford Algebra) also has high potential to transform neural architectures. Models like the Geometric Algebra Transformer (GATr) and Versor (2026) demonstrate it can enhance or even make the Attention Mechanism obsolete.
By representing data as multivectors, translational and rotational symmetries are encoded natively which allows them to handle geometric hierarchies with massive efficiency gains (reports of up to 78x speedups and 200x parameter reductions) compared to standard Transformers.
> A novel sequence architecture is introduced, Versor, which uses Conformal Geometric Algebra (CGA) in place of traditional linear operations to achieve structural generalization and significant performance improvements on a variety of tasks, while offering improved interpretability and efficiency. By embedding states in the manifold and evolving them via geometric transformations (rotors), Versor natively represents -equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, Graph Networks, and geometric baselines (GATr, EGNN).
My understanding after scanning the code examples is the technique expands the dimensionality of each data point with a set consisting of the quadratic coefficients of its existing dimensions. I thought it sounded like kernel PCA.
this looks awesome. i've been struggling with vector compression, and have been trying PCA + all sorts of rotations. looking forward to trying this out
In the article, you mention this approach requires no search over hyper-parameter, because the method comprises a closed-form solution with "simple" linear algebra. I agree with this, but do you not in think need to tune the L2-regularization strength? That would for me be a hyper-parameter you would need to do a CV over (or similarly).
You should benchmark the retrieval speed of each method in terms of queries per second. I suspect that the gain in bandwidth you get from slightly better compression will be defeated by decompression being much more expensive.
Cool idea. But it only works when the data never changes.
could you make a streaming/incremental version? One that updates the math cheaply when new data arrives, instead of recomputing everything, or does the math fundamentally prevent it?
I'm just a casual LLM user, but your description of the anisotropy made me think about the recent work on KV cache quantization techniques such as TurboQuant where they apply a random rotation on each vector before quantizing, as I understood it precisely to make it more isotropic.
But for RAG that might be too much work per vector?
I came here from a discussion about CS students who should not be bothered to set up email filters. How can they ever expect to be able to digest just the first paragraph in that article?
This kind of attitude isn’t long for this world. The time for this knowledge locked up in academia is over and in 15 years we’ll look back on this time as the dark ages as open code and models eclipses it.
None of this stuff is as difficult to understand as people claim it is once you work with it.
FWIW I found it quite straight forward. But then I did have some linear algebra back at uni.
That said I do think it's a good habit to either write out abbreviations in full or link to say Wikipedia, eg for PCA[1]. It's a well-known tool but still if you come from a slightly different field it might not ring a bell.
Anisotropy and the cone ideas may explain why PCA underperforms, but it does not uniquely justify this particular quadratic decoder. The geometric story is not doing explanatory work beyond “data is nonlinear,” and the real substance is simply that second-order reconstruction empirically helps.
By representing data as multivectors, translational and rotational symmetries are encoded natively which allows them to handle geometric hierarchies with massive efficiency gains (reports of up to 78x speedups and 200x parameter reductions) compared to standard Transformers.
> A novel sequence architecture is introduced, Versor, which uses Conformal Geometric Algebra (CGA) in place of traditional linear operations to achieve structural generalization and significant performance improvements on a variety of tasks, while offering improved interpretability and efficiency. By embedding states in the manifold and evolving them via geometric transformations (rotors), Versor natively represents -equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, Graph Networks, and geometric baselines (GATr, EGNN).
https://arxiv.org/abs/2602.10195
But for RAG that might be too much work per vector?
None of this stuff is as difficult to understand as people claim it is once you work with it.
That said I do think it's a good habit to either write out abbreviations in full or link to say Wikipedia, eg for PCA[1]. It's a well-known tool but still if you come from a slightly different field it might not ring a bell.
[1]: https://en.wikipedia.org/wiki/Principal_component_analysis