"They're made out of weights"

(maxleiter.com)

243 points | by MaxLeiter 6 hours ago

16 comments

gobdovan 0 minutes ago
You can take the weights and model description, write them down in a notebook, then, by hand, compute the next token. Try to do the same with meat.
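To make the "by hand" point concrete, here is a toy sketch of one next-token step: an embedding lookup, one matrix-vector product, a softmax, and a greedy pick. The three-word vocabulary and every weight value are made up for illustration. A real model adds attention and many layers, but each step is still arithmetic you could do with pencil and paper.

```python
import math

# Hypothetical toy model: 3-token vocabulary, 2-dimensional embeddings.
VOCAB = ["made", "of", "weights"]
EMBED = {"made": [1.0, 0.0], "of": [0.0, 1.0], "weights": [0.5, 0.5]}
# Output projection: one row of weights per vocabulary entry.
W_OUT = [
    [0.1, 0.2],   # score row for "made"
    [2.0, -1.0],  # score row for "of"
    [-1.0, 3.0],  # score row for "weights"
]

def next_token(prev: str) -> tuple[str, list[float]]:
    """Greedily pick the next token; every step is pencil-and-paper arithmetic."""
    x = EMBED[prev]
    # Logits: dot product of each output row with the input embedding.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W_OUT]
    # Softmax turns logits into probabilities (subtracting the max for stability).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best], probs

print(next_token("made")[0])  # of
print(next_token("of")[0])    # weights
```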
ProllyInfamous 15 minutes ago
Imagine writing something so incredibly brilliant (rather: adapting from the original) that it's entirely unlikely you'll ever write something so incredible again.
But congrats: this is absolutely & incredibly brilliant.
Can't wait for the Jon Benjamin voiceover.
- lloeki 7 minutes ago
  They're Made out of Meat
  - Terry Bisson, 1991
  https://web.mit.edu/people/dpolicar/writing/prose/text/think...
  Radio play by Miriam Tolan and Russ Armstrong:
  https://www.wnycstudios.org/podcasts/studio/segments/168264-...
  (EDIT: the original parent was missing "rather adapting from the original")
  - ProllyInfamous 1 minute ago
    Your EDIT is untrue, but thanks for linking to the originals.
  - rendall 2 minutes ago
    Video adaptation.
    https://youtu.be/T6JFTmQCFHg
kami23 2 hours ago
This read like poetry to me. Thank you for sharing it.
I have a linguistics background, and a lot of my philosophizing lately has been on whether the emergent abilities of LLMs are, deep down, a mechanism similar to the one that creates our consciousness.
For a while I was working on linguistics-based evals for a Kaggle competition. My challenge was whether I could mask things well enough not to trigger the model's internal state for certain phenomena, and that sent me down a rabbit hole that I'm still exploring.
This story resonated with a lot of questions that come up when trying to find a good, solid answer to the "what is consciousness?" question. The one it triggered for me is: Is our perception of time just a slow thread in the giant GPU we are running the universe on? Or more generally, what is time? That's a fun YouTube rabbit hole if you ever need one.
- wisty 13 minutes ago
  AFAIK every argument against consciousness being emergent is just a weak "God of the gaps" argument (since we don't fully understand it all) or a nonsense analogy like the Chinese room, where if you separate the hardware and software it's not conscious anymore (like, duh, remove a brain from a body and it is no longer conscious either).
  Yeah, the weights not updating online makes them less like a living organism that can update and learn and evolve ... ok ....
- kridsdale3 2 hours ago
  Time is entropy unfolding as things with nonzero temperature do what they do.
  Psychological time is your own weights being updated in response to stimuli and internal processing.
  When there isn't anything interesting happening, no updates are needed, and you don't perceive much time. That's why there's a logarithmic effect on the "density" of time as you age.
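One simple way to make the "logarithmic" claim precise (a modelling assumption, not something established in this comment) is to let the subjective length S of a stretch of time scale inversely with your current age t, with k a constant:

```latex
\frac{dS}{dt} = \frac{k}{t}
\quad\Longrightarrow\quad
S(t_1 \to t_2) = \int_{t_1}^{t_2} \frac{k}{t}\,dt = k \ln\frac{t_2}{t_1}
```

Under that assumption, ages 5 to 10 would feel as long as ages 40 to 80, since both spans double your age.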
  - hippich 2 hours ago
    This is actually something I was always confused about. If nothing interesting happens as we get old, it should be boring and, as a result, a slow slog. Yet it feels like time accelerates as I get older.
    - pixl97 1 hour ago
      Myself, I believe the opposite. The brain itself is one of the most powerful filters that exists: it tries to be lazy, filling things in and compressing away the common. All the time we're not doing anything novel just gets compressed away to almost nothing. When you're a kid seeing new things, feeling new things, and learning new things, you can't compress that away.
    - NDlurker 46 minutes ago
      Novel experiences take up more processing power and are burned into memory so they're experienced at a slower rate. That's how I understand it anyway.
    - agumonkey 1 hour ago
      It's coherent. More newness => more memories per period ~ slower to go through. Less newness => fewer memories ~ nothing to go through (faster sense of time).
- eszed 2 hours ago
  Yeah, I currently suspect that consciousness is an emergent property. I read elsewhere (it's somewhere in my HN history, I'm sure) that the biggest compute we can currently muster is something like three or four orders of magnitude away from the number of neurons / connections (or their analog) that our brains have, so it may be a while until we can expect to see it in our machines. But if the emergent phenomenon hypothesis is correct, then we eventually will. I'm more scared than pleased by the prospect, but there you are.
  - bulbar 47 minutes ago
    Just to be sure: The "neurons" in today's AI have nothing to do whatsoever with real neurons.
    What we can do is simulate very simple brains by simulating the relatively few neurons found in worms. In this sense we are multiple orders of magnitude away, and the increasing complexity implies exponentially increasing difficulty.
    I would think we are so far away that there will be unknown unknowns we encounter on the way.
    - vkazanov 20 minutes ago
      Yes, physically absolutely nothing. But conceptually they seem to form this very generic function from inputs to outputs that neurons also form.
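For reference, the "generic function from inputs to outputs" in most artificial networks is a weighted sum passed through a nonlinearity. A minimal sketch, assuming a ReLU activation (one common choice among several); the function name is just for illustration:

```python
def artificial_neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU: pass positive values through, clamp negatives to zero.
    return max(0.0, pre_activation)

print(artificial_neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.1
```

Nothing here models ion channels, spike timing, or neurotransmitters, which is the physical gap bulbar is pointing at.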
  - ProllyInfamous 29 minutes ago
    >consciousness is an emergent property
    You would really like Michael Pollan's latest book [1], entirely devoted to his exploration of consciousness researchers' POVs on this exact topic.
    My favorite quote is that ~"perhaps Descartes was only half-wrong when suggesting I think, therefore I am; it seems rather closer to I FEEL, therefore I am."~
    [1] <https://www.amazon.com/World-Appears-Journey-into-Consciousn...>
    ----
    I've grown thousands of plants; I've read two of the author's other books devoted to plants; in this book Pollan makes compelling arguments for plant sentience (over a much-longer timeframe).
    Sure, perhaps plant consciousness is a bit of a stretch, but they're certainly intelligent and curious creatures. He makes both arguments supporting plant volition.
    ----
    If you haven't seen My Octopus Teacher (Netflix), do. I'm a bald 275lb bluecollarguy... and I wept/awed (both). So beautiful, we bundled neurons.
    ----
    Bonus quote ~"color is where reality and magic appear as-if together"~ [color isn't real, but is perceived]. We most-often see what's most-predictable, not necessarily what we actually detected [in the case of color: nothing but nanometers].
  - kevin_thibedeau 1 hour ago
    Our machines won't have biological systems driving their needs which in turn fuel behaviors like desire and planning for the future. They may imitate them but it won't be innate.
    - trick-or-treat 57 minutes ago
      For an LLM, "innate" means "in their training data". So yeah, those things are pretty much innate.
    - lotyrin 1 hour ago
      I think those are things human consciousness has, not is.
  - slopinthebag 1 hour ago
    This is not meant as a gotcha, I am genuinely curious how you believe consciousness can be an emergent property. I assume you don't believe consciousness is a physical property in the brain, so what entity is actually experiencing that consciousness? Or, what does it even mean to experience consciousness? Or are these not even the right questions?
    - doctoboggan 49 minutes ago
      > This is not meant as a gotcha, I am genuinely curious how you believe consciousness can be an emergent property.
      I was about to post the exact opposite question: how could it not be an emergent property? Unlike consciousness, the concept of emergence is pretty well defined: An emergent property is a characteristic or behavior that a complex system has, but which its individual components do not have on their own.
      Consciousness itself doesn't have a well agreed upon definition, but I would posit that _most_ people would agree humans have it, and _most_ people would agree individual cells (neurons) do not have it. If you agree with those two statements, then consciousness is an emergent property by the definition I gave above
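Conway's Game of Life is a standard toy illustration of that definition: each cell only follows a local rule about its eight neighbours, yet a "glider" pattern travels across the grid, a behaviour no single cell has. A minimal sketch on an unbounded grid stored as a set of live (row, col) cells:

```python
from collections import Counter

def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Advance one generation of Conway's Game of Life (B3/S23 rules)."""
    # For every cell next to a live cell, count its live neighbours.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is born with exactly 3 neighbours and survives with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

# The classic glider.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

# After 4 generations the same shape reappears, shifted one cell down and right.
print(state == {(r + 1, c + 1) for r, c in glider})  # True
```

The rule never mentions movement; "a glider moves diagonally" is only true of the system as a whole.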
      - throwaway173738 15 minutes ago
        I think consciousness is going to turn out to be very challenging to define rigorously enough that we can test for its presence or absence. Emergent or not, the question is how do you determine when it has emerged? Is it a quantity or an attribute? Discrete or continuous? Does it have a finite or infinite range?
        We can all agree on what color something is, but we can’t describe the color a priori, only by example. I think consciousness may be a similar phenomenon and the only test is by shared experience. If so then we are in deep trouble because we will not be able to anticipate when a system becomes conscious.
      - bulbar 43 minutes ago
        I think the alternative is that our brain, somehow, is connected to some metaphysical aspect of reality which is what most religions believe.
        - kelseyfrog 28 minutes ago
          Another alternative is that consciousness exists on the map, and unfortunately we're confusing that with the territory.
      - eternauta3k 37 minutes ago
        I don't know if most people would dismiss the sentence "all matter has a sort of proto-subjectivity very different from ours but which gives rise to ours". And it solves some problems (introducing others).
        - doctoboggan 30 minutes ago
          Panpsychism is certainly an interesting idea, but I wouldn't consider it a popularly held view.
    - eszed 39 minutes ago
      Not a gotcha at all, but I don't have a satisfying answer, nor am I confident there even is one. Best I can do is to say that I think consciousness and sense of self are at the very least closely related, and perhaps the very same phenomenon. "I" am the entity that realizes my own consciousness; consciousness is the qualia that makes "me" separable from all other entities.
      Or something like that. This gets to the "dorm room bullshitting" level right quick.
      - slopinthebag 23 minutes ago
        Yeah, I guess what I'm trying to ask is that if it is an emergent property but not a physical part of the brain, doesn't that imply something metaphysical about consciousness? Almost as if it's a non-physical phenomenon? At least when I hear people talk about emergent behaviour I see it as a refutation of the spiritual, but to me it seems like it actually implies we have a "soul".
        Idk, it's really hard to articulate my thoughts here and yes it is pretty close to the conversations I had in college on various substances. Lol.
    - pixl97 1 hour ago
      Is a video game a physical property of a computer?
      - bulbar 38 minutes ago
        We have general purpose hardware and we have hardware that's hard wired for specific purposes like ASICs and we have everything in between.
        And we have only been doing it for a few decades. Evolution had millions of years of trial and error.
      - slopinthebag 1 hour ago
        Yes
    - therealdrag0 1 hour ago
      Those are the questions and there’s stacks and stacks of philosophy pages written about it. Go have a whirl.
- BobbyTables2 1 hour ago
  I’ve wondered the same myself, without being a cunning linguist.
  I understand the math pretty well but still find it crazy that a bunch of matrices can converse in human languages without ever being “taught”.
  Imagine decoding an encyclopedia written in a foreign language where the characters, punctuation, and grammar are unknown — supplemented by a million other texts the same way. Feels like it should be utterly impossible with any amount of computing power…
  Today I asked my employer’s Claude to proofread a short software user manual written in markdown. (Trying this with an LLM was a first for me!) It pointed out not only grammar mistakes but also cases where I did not follow my own self-imposed conventions that were never explicitly stated. (I didn’t have a chapter detailing all the typographical conventions the way specification documents often do.)
  I also asked it what parts might be unclear to a user. The response was surprisingly good — no worse than asking the QA tester for the same feedback.
  I also find that the LLM seems to “comprehend” subtle details of obscure technical specification documents that nobody on the Internet ever discusses.
  As for time and the universe, Stephen Wolfram’s theories seem intriguing. He seems a bit obsessed with pretty diagrams, but the idea of time dilation being the result of computation seems somewhat more appealing than trying to imagine relationships between time, gravity, and the speed of light.
  - agumonkey 57 minutes ago
    My best guess as a noob is that the vector spaces allow for unbounded contextualization. As long as the training set is large enough, it can 'infer' anything.
    Proofreading has a spot in that space, and layers allow patterns like terminology consistency to be expressed, so your query will tap into a subspace that infers tokens based on whatever consistency patterns were ingested along with proofreading texts.
  - Obscurity4340 1 hour ago
    If time dilation is said to be a product of computation, why do anaesthetic drugs cause it even when they are not taken to the point of actual unconsciousness? Don't anaesthetics sort of shut everything down and inhibit all that kind of cognitive activity (compute?)?
    - anon84873628 43 minutes ago
      "Time dilation" in this case is referring to the physics phenomenon from Einstein's special relativity. Not human perception.
- gostsamo 40 minutes ago
  Read the original mentioned on top for full effect.
noosphr 3 hours ago
It's not often I see something that's fractally wrong but here we are.
There is a dictionary; it's called the tokenizer.
There are grammar rules; they are just very weak because the structure of human language is generally quite weak. When presented with languages which have strong consistent grammars, the weights are very easily interpretable as a grammar: https://arxiv.org/abs/2201.02177
The point of the original short story is that the computational substrate doesn't matter when you have Turing completeness. This one seems to think that you don't need structure and interpretability just because you change substrates.
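For anyone who hasn't read the linked grokking paper: its training data were small algorithmic tables of equations of the form a ∘ b = c (modular addition among them), with the model trained on a fraction of the table and evaluated on the held-out rest. A minimal sketch of building such a dataset; the modulus p = 97, the 50% split, and the function name are assumptions for illustration:

```python
import random

def modular_addition_table(p: int = 97, train_fraction: float = 0.5, seed: int = 0):
    """Build every equation 'a + b = c (mod p)' and split it into train/held-out sets."""
    equations = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(equations)
    cut = int(len(equations) * train_fraction)
    return equations[:cut], equations[cut:]

train, held_out = modular_addition_table()
print(len(train), len(held_out))  # 4704 4705
```

Every "sentence" here is exactly determined by one rule (c depends only on a and b), which is the strongly consistent case being described; whether such a table counts as a language is disputed further down the thread.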
- famouswaffles 1 hour ago
  >There are grammar rules, they are just very weak because the structure of human language is generally quite weak. When presented with languages which have strong consistent grammars the weights are very easily interpretable as a grammar: https://arxiv.org/abs/2201.02177
  That paper did not train the models on 'a language with strong consistent grammars'. Mathematical operation tables are not a language. Grammar itself is a post-hoc rationalization, and there's no evidence LLMs follow 'grammar rules' any more than the brain follows grammar rules. Of course, that's not to say transformers can't learn simple rules if the dataset calls for it.
- glitchc 2 hours ago
  > fractally wrong
  fractally or factually? You mean wrong on so many levels you need a fractal to capture them? If so, what if you could use a neural network instead?
- dpark 2 hours ago
  A tokenizer is not a dictionary any more than an alphabet is a dictionary.
  - noosphr 2 hours ago
    The Chinese alphabet is very much a dictionary. All the major tokenizers are far larger.
    - dpark 1 hour ago
      That doesn’t make any sense. An alphabet is a list of valid characters. A dictionary is not just a list. Even in a language like Chinese, where individual characters carry meaning, a dictionary tells you what that meaning is. It’s not just a list of characters.
      Or, to echo the article, the dictionary is made out of weights.
    - simonh 1 hour ago
      A list of words isn’t a dictionary. What a dictionary adds over a list of words is all the relationships between the words needed to interpret them and use them, and all of that is in the weights.
    - canjobear 1 hour ago
      A mapping of Chinese characters to integers (like a tokenizer) would not be a dictionary. You’d also need definitions. At best it’s an index to a hypothetical dictionary.
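To make the distinction concrete: a tokenizer vocabulary is a lookup from strings to integer IDs, and nothing in it says what any entry means. A toy greedy longest-match sketch; the vocabulary is made up, and real tokenizers build their vocabularies and segment text differently:

```python
# Hypothetical toy vocabulary: string pieces -> integer IDs. No definitions anywhere.
VOCAB = {"made": 0, "out": 1, "of": 2, "weight": 3, "s": 4, " ": 5}

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match segmentation of text into vocabulary IDs."""
    ids = []
    i = 0
    longest = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return ids

print(tokenize("made out of weights", VOCAB))  # [0, 5, 1, 5, 2, 5, 3, 4]
```

The output is just an index into a table; any "definition" of ID 3 exists only in how the weights respond to it.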
- benlivengood 3 hours ago
  I don't think the grokking paper is a great argument for the difference between weights and meat. E.g. https://en.wikipedia.org/wiki/Cortical_Labs learning to play Pong.
  The tokenizer is, at best, a sensory mechanism, as evidenced by 1) the random generation of the tokenization scheme, and 2) the fact that vastly different tokenization schemes produce virtually identical behavior. It'd be like if Noah Webster threw a bunch of movable type into a bucket (breaking some words in half) and then drew randomly to make the first English dictionary.
  EDIT: I was too cavalier with the comparison of tokenizer to sensory modality. My ultimate point is that direct byte-to-token transformers can achieve similar overall performance, which to me makes a weights-to-meat comparison pretty straightforward, but the particular tokenizer in use certainly has a large impact on both efficiency and accuracy on specific problems (e.g. digit representation).
  - anon84873628 8 minutes ago
    Comparing the tokenizer to sensory processing is a great analogy. That's exactly what your visual cortex and initial layers of the language center are doing: decoding visual representation of text into the internal neural representation.
    It's a learned mapping from one representation to another, not some semantic lookup against an exogenous source.
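The "learned mapping" point is easy to see in byte-pair encoding (BPE), the approach behind many large-model tokenizers: the vocabulary grows by repeatedly merging whichever adjacent pair of symbols is most frequent in a corpus, so it reflects text statistics rather than meaning. A minimal character-level sketch of the training loop (real implementations usually start from bytes and handle word boundaries; ties here are broken by first occurrence):

```python
from collections import Counter

def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def learn_bpe(text: str, num_merges: int) -> tuple[list[tuple[str, str]], list[str]]:
    """Learn up to `num_merges` merges; return the merges and the final segmentation."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats, so a merge would not compress anything
            break
        merges.append(best)
        symbols = merge_pair(symbols, best)
    return merges, symbols

merges, segmented = learn_bpe("low lower lowest", 3)
print(merges)     # [('l', 'o'), ('lo', 'w'), (' ', 'low')]
print(segmented)  # ['low', ' low', 'e', 'r', ' low', 'e', 's', 't']
```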
  - noosphr 2 hours ago
    I'm kind of stunned that someone is using my work to tell me I'm wrong. I wrote the code for the dish brain pong and encoding information was a huge part of what that experiment was about.
    So when I say that the grok paper and the pong paper fundamentally agree, I have some idea of what I'm talking about.
    - anon84873628 32 minutes ago
      If you're going to claim the tokenizer is a dictionary then it doesn't really matter what paper you wrote code for.
    - benlivengood 2 hours ago
      I might have misunderstood the point you are making. I read the original article as "weights are like meat", and so I'm confused by what you consider fractally wrong.
      - noosphr 2 hours ago
        The point is that when the rules the model learns are simple enough, they stop being spread out over all the layers and become as easily interpretable as any expert system.
        It's just that the rules we feed into the model are extremely poorly defined, and we end up with a soup of disjoint rules smeared all across the weights.
        This isn't a feature of the models. It's a feature of the training set.
        Being shocked that you can store rules in floating point numbers is the same as being shocked you can store rules in integers. It's been a century since Goedel Numbering was invented, we should be used to it by now.
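For readers who haven't met it: Gödel numbering packs a whole sequence into a single integer, for example by using the sequence values as exponents of successive primes, and unique factorization lets you decode it again. A minimal sketch of that prime-power scheme (one of several possible encodings; adding 1 to each exponent so that zeros survive is a choice made here):

```python
def primes(n: int) -> list[int]:
    """Return the first n primes by trial division (fine for small n)."""
    found: list[int] = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
        candidate += 1
    return found

def godel_encode(seq: list[int]) -> int:
    """Encode non-negative integers as 2^(s0+1) * 3^(s1+1) * 5^(s2+1) * ..."""
    code = 1
    for p, s in zip(primes(len(seq)), seq):
        code *= p ** (s + 1)
    return code

def godel_decode(code: int) -> list[int]:
    """Recover the sequence by reading off prime exponents in order."""
    seq = []
    p_index = 1
    while code > 1:
        p = primes(p_index)[-1]  # the p_index-th prime
        exponent = 0
        while code % p == 0:
            code //= p
            exponent += 1
        seq.append(exponent - 1)
        p_index += 1
    return seq

print(godel_encode([1, 0, 2]))  # 1500, i.e. 2**2 * 3**1 * 5**3
print(godel_decode(1500))       # [1, 0, 2]
```

The encoding is lossless but opaque: nothing about 1500 looks like [1, 0, 2] until you factor it.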
        - throwaway173738 2 minutes ago
          So basically there are rules, we just can’t articulate them and so we can’t decode them from the weights. The Goedel Numbering metaphor is pretty appealing to me. You can represent any finite series of real numbers with a series of computations performed on some other finite series of real numbers. We just happen to be using matrices because the math is easy to parallelize. The trick is to realize that when you know the sequence you have and the sequence you want then you can compute the calculations. If you constrain the calculations to only matrix multiplication then you arrive at the scheme we have.
        - simonh 1 hour ago
          Right, but all of that is still in the weights. The point of the article/joke isn’t literally that there is no grammar, it’s that there is no grammar separate from the weights. It’s all in the weights. And yes, it’s absurd. It’s a joke, but a thought-provoking one.
    - js2 2 hours ago
      https://news.ycombinator.com/item?id=35079
    - ufocia 2 hours ago
      Hubris much? I don't see a necessary contradiction in using someone's work to disprove another aspect of that same person's work.
- phito 35 minutes ago
  Also there's a brain, the GPU
  - anon84873628 15 minutes ago
    Not at all. A brain is interesting because it is the computer, memory, and weights all in one. A GPU is just the calculator.
    You can't move your mind to any other brain, but weights can run on any GPU.
- throw310822 2 hours ago
  > There are grammar rules
  And they're made out of weights.
  - noosphr 43 minutes ago
    As opposed to integers in normal programming.
    The 'magic' in weights is that the rules are spread through the whole model and you can't point to one place which encodes them.
    The grokking paper shows that this stops being the case with enough training data and enough compute.
samrus 1 hour ago
I have to agree. It is messed up that transformers can just talk, and it's been pretty normalized. We are only talking about the impact they will have and whether they can do what people say they can, but we aren't talking about how crazy it is that they can talk.
- modzu 1 hour ago
  if you've ever seen a pile of wrinkly mush and wondered... pretty damn crazy too
  https://web.mit.edu/people/dpolicar/writing/prose/text/think...
eclipticplane 1 hour ago
The short film version of the original is great, too. https://www.youtube.com/watch?v=T6JFTmQCFHg
It stars Tom Noonan and Ben Bailey!
luca-ctx 1 hour ago
Truly fantastic bridge from the original, this deserves an award
- MaxLeiter 1 hour ago
  All credit to the original author. I just had to think of analogues.
satvikpendem 57 minutes ago
Great concept. It would've been even more amusing if the entire thing were generated with AI instead, ironically.
dvh 34 minutes ago
Will they have their own Jesus?
- kelseyfrog 22 minutes ago
  they have the spiral
pstuart 32 minutes ago
I couldn't help but grin like a fool reading this. Not only is it an artful parody but these thoughts have been thought.
Waterluvian 1 hour ago
It must have been kind of incredible early on to be exploring this tech and you’re suddenly getting what look like sentences.
turtleyacht 4 hours ago
Numbers that dream.
oofbey 2 hours ago
I love this. For anybody not getting the joke, it’s riffing on the classic 1990s essay “They’re made out of meat.”
https://web.mit.edu/people/dpolicar/writing/prose/text/think...
- tom_ 2 hours ago
  This original author is mentioned in the second sentence of the linked article, and then again in the third sentence, along with a link to the original story.
CSSer 3 hours ago
It works until they get to the sentience part. Neat idea!
- margalabargala 3 hours ago
  Even there it works a bit.
  > These models are the only other things we've ever met that can hold a conversation, and they're made out of weights
  Is a fair point.
  - RodgerTheGreat 3 hours ago
    Not especially. Depending on where you set your standards for "holding a conversation", you can satisfy the requirement with a classical Markov chatterbot, a well-trained parrot, a copy of Eliza, or a telemarketer flowchart drawn on a sheet of paper. Only the Markov bot is made out of "weights" in the sense of a statistical model.
    Parrots are intelligent animals, albeit with a limited capacity for vocabulary and syntax compared to a human, and Eliza and the flowchart are made out of explicitly encoded rules and conversational tactics.
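For comparison with the "weights" framing: a classical Markov chatterbot's entire model is a table of word-transition counts, which then serve as sampling weights. A minimal bigram sketch with hypothetical function names (real chatterbots often used longer contexts):

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which words follow each word; these counts are the model's 'weights'."""
    words = text.split()
    table: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return dict(table)

def babble(table: dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Generate text by repeatedly sampling a successor in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = table.get(out[-1])
        if not successors:  # dead end: this word was never followed by anything
            break
        words, counts = zip(*successors.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(out)

corpus = "they are made out of meat . they are made out of weights ."
model = train_bigrams(corpus)
print(babble(model, "they", 6))
```

The gap between this and a transformer is not "weights versus no weights" but how much context and structure the weights can capture.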
    - margalabargala 2 hours ago
      The quality of "conversation" you can have with everything on your list is highly limited, and is categorically different than the sort of conversation you are able to have with any modern AI.
    - solenoid0937 1 hour ago
      Weights hold a better conversation at this point than the overwhelming majority of humans.
photochemsyn 17 minutes ago
No mention of ‘static’ vs. ‘dynamic’ is a bit disappointing in reference to the weights. You could argue that every neuron in your nervous system can be modeled as a collection of weights, firing likelihoods, receptor sensitivities, and the current dynamic state of that neuron, but LLMs are static collections of weights at inference time, with the dynamic adjustment of weights taking place at training time. So, just a ROM construct, like something out of Neuromancer, but trained on all written knowledge, not just one person’s total lived experience.
The above take fails in the real world because neuronal cells don’t exist in a vacuum; they are products of cellular development from a zygotic union of haploid contributors of sequential genetic information optimized for survival in an oxygen-rich biosphere powered largely by our local star that supports mammalian life (and microbial, plant, avian, etc.). Real AI would thus be AL - artificial life - as much as artificial intelligence. I don’t think you can have the one without the other, which upsets the simulationists who think an agent in the Matrix would be intelligent.
What either interpretation implies is that any real ‘artificial’ intelligence would be no more artificial than you or I, but it would have to dynamically update its weights at the same speed a human nervous system could (think how quickly we learn not to poke a cactus). For it to be at all trustworthy, then like a human, it would have to undergo a socialization process, one of the results of which is the development of a sense of embarrassment when it breaks acceptable social norms.
Hmm, this reminds me of the recent statement of the Pope about AI, of which I immediately thought, “Wait a second, aren’t there a fair number of people like this? The narcissistic sociopath profile, I think it’s called, a bit unfair to assume any real AI would turn out this way, isn’t it?”
Pope: “Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate or even simulate, but they do not understand what they produce, for they lack the affective, relational, and spiritual perspective through which human beings grow in wisdom.”
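The static/dynamic distinction from the start of this comment can be shown with the simplest possible learner: a linear model whose `predict` only reads its weights, versus an online update that rewrites them after each example. A minimal sketch under those assumptions (plain per-example gradient descent on squared error, hypothetical class name); standard LLM inference behaves like `predict` here and never calls anything like `update`:

```python
class LinearModel:
    """Toy linear model: prediction reads the weights, learning rewrites them."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x: list[float]) -> float:
        # Inference: the weights are only read (the "ROM construct" case).
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: list[float], target: float) -> None:
        # Online learning: one gradient step on squared error after each example.
        error = self.predict(x) - target
        self.weights = [w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)]

model = LinearModel(n_features=1)
before = list(model.weights)
model.predict([1.0])              # inference only reads the weights
print(model.weights == before)    # True: nothing changed
model.update([1.0], target=2.0)   # one online learning step ("don't poke the cactus")
print(model.weights)              # [0.2]
```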