Codex logging bug may write TBs to local SSDs

(github.com)

93 points | by vantareed 2 hours ago

19 comments

  • b--l 2 hours ago
    Codex is one of the most infamous examples of slopware. Just having the window unhidden on my mac will cause it to use 100% of the GPU displaying the spinner message.

    THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

    So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).

    The issue is on GitHub and close to 6 months old, so it has probably been there since the release of the vibe-coded junk. I would literally fix it myself, but it's closed source for whatever reason.

    There are many discussions about which model is better, or if vibe coding is even possible. I point you to the extent of what one of the most well funded, money flush, well staffed model making companies can do with vibe coding.

    To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on Polymarket expects them to have a leading model any time soon, for example.

    It's a tragedy. The world needs competition to Anthropic.

    • jofzar 54 minutes ago
      > Codex is one of the most infamous examples of slopware

      Woah, let's not forget Claude Code is right there.

      • mvATM99 33 minutes ago
        Yeah exactly.

        I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post.

    • xpct 49 minutes ago
      Well thank you for your service. I thought about trying out Codex after the disaster that is Claude Code. I'll be fine without either one on my machine
      • jofzar 31 minutes ago
        IMO Codex is significantly better than Claude Code for me ATM.
      • comboy 26 minutes ago
        I mean, Codex CLI is really bad. But Claude's CLI is so much worse.

        Welcome to the world of tomorrow!

    • nicce 24 minutes ago
      It's not only Codex. I can't leave the ChatGPT app open on macOS for a few hours, because it will consume 60 gigabytes of RAM over time and crash all the other apps.

      Mind-boggling. And I can't use Google's AI Studio in the browser because it takes 100% CPU.

      Do we need to write our own app for everything???

      • porridgeraisin 18 minutes ago
        the damn chat.openai.com webapp lags a lot as well on long chats, typing takes so long.
    • l33tman 1 hour ago
      This was fixed long ago, if I'm thinking of the same bug. It was stuck in an infinite loop the whole time the Codex window was open.
      • cncjvu7 1 hour ago
        Nah it's still doing weird shit. Uninstalled that crapware last week.
    • seviu 17 minutes ago
      To be fair to Codex, you can use any harness you want with it. Access is not gatekept by a crappy, slop-filled Electron app.

      So just move to PI, or whatever.

      Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use Cowork even once, will run a 2GB VM on app start, no f's given. At all.

      Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.

      Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but then easy lifting is now done elsewhere.

    • hokkos 24 minutes ago
      Is it closed source? I can see the Rust code in the repo, contrary to the JS in the Claude Code repo. Are you mixing them up?
      • nicce 11 minutes ago
        Codex CLI is the main Rust code. There is Codex Desktop separately, using Electron and the same Codex CLI.
  • purpleidea 2 minutes ago
    I want to like Codex, but the quality is just not very good, especially when compared to Claude.

    It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.

    No response, no workaround.

    https://github.com/openai/codex/issues/23762

  • woadwarrior01 2 hours ago
    Someone posted a temporary workaround for this on X[1].

    sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"

    Also, I found that running VACUUM on the SQLite file on my laptop shrank it from 27GB to a mere 73MB[2]. (SQLite's statement is plain VACUUM; VACUUM FULL is PostgreSQL syntax.)

    [1]: https://xcancel.com/bdsqlsz/status/2067964486615810369

    [2]: https://xcancel.com/jeethu/status/2068087449469780434
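
A minimal sketch of what the two workarounds above do, run against a throwaway database rather than the real `~/.codex/logs_2.sqlite`. The table layout (`id`, `message`) and the row sizes are assumptions for illustration; only the trigger text is taken from the quoted command. It shows that deleting rows does not by itself shrink the file, that `VACUUM` does, and that the `RAISE(IGNORE)` trigger silently drops later inserts.

```python
import os
import sqlite3
import tempfile

# Trigger text copied from the workaround quoted in the comment.
BLOCK_INSERTS = (
    "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
    "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
)


def run_demo(rows=2000):
    # Throwaway database file; the schema below is an assumption, not Codex's.
    path = os.path.join(tempfile.mkdtemp(), "logs_demo.sqlite")
    # Autocommit mode, because VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")

    # Bulk-insert about 2 MB of fake log lines in one explicit transaction.
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO logs (message) VALUES (?)",
        (("x" * 1000,) for _ in range(rows)),
    )
    conn.execute("COMMIT")
    size_full = os.path.getsize(path)

    # Deleting rows frees pages inside the file but normally leaves its size alone.
    conn.execute("DELETE FROM logs")
    size_after_delete = os.path.getsize(path)

    # VACUUM rebuilds the database and returns the free pages to the filesystem.
    conn.execute("VACUUM")
    size_after_vacuum = os.path.getsize(path)

    # With the trigger in place, this INSERT is skipped without raising an error.
    conn.execute(BLOCK_INSERTS)
    conn.execute("INSERT INTO logs (message) VALUES ('dropped')")
    remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()

    return {
        "size_full": size_full,
        "size_after_delete": size_after_delete,
        "size_after_vacuum": size_after_vacuum,
        "rows_after_blocked_insert": remaining,
    }


result = run_demo()
print(result)
```

Note that the trigger discards every new log row, so it trades diagnostics for disk writes; it is a stopgap, not a fix.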

  • jofzar 31 minutes ago
    This is actually such a classic blunder (shipping trace/debug logging on for everything), but funnily enough the impact isn't the usual one.

    It's crazy that we have hit a point where memory, CPU speed and disk speed aren't getting clapped when a dev ships logging at trace level. It used to make the application catastrophically slow, so it was immediately fixed in the next update.

    • kuekacang 22 minutes ago
      It helps too that agent work is done server side so you can hog all the local resources for your thin client.
  • neuralkoi 1 hour ago
    Vibe coding takes "move fast and break things" to a whole nother level.
    • comboy 25 minutes ago
      We are running out of things to break.
    • cryo32 1 hour ago
      Yeah. Here I am sitting on a major incident at our company because someone’s vibe coded shit went seriously wrong.
      • Imustaskforhelp 48 minutes ago
        Can you go into more detail, if that's possible and you're allowed to?

        I do know of one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/ChatGPT"

        They hadn't done anything to the database itself, but you betcha there are some horror stories involving databases, a lack of proper backups and vibe-coding gone insanely wrong.

  • i2km 1 hour ago
    Shocking. Been open a week and AFAICT just silence from OpenAI. I just find it baffling. You'd think that these vendors would be very sensitive to this sort of issue. I mean, surely they have multiple agents hooked up to github monitoring potential issues and proposing fixes, right? ...right?

    Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...

  • abihordun 28 minutes ago
    SQLite + unbounded TRACE logs = firehose in a bathtub. No rotation, no cap, no surprise. The RAISE(IGNORE) fix patches a design flaw. OpenAI's silence is worse than the bug.
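
For contrast, here is a rough sketch of what "rotation plus a cap" looks like with Python's standard `logging` module. It is not how Codex logs (that is Rust writing to SQLite); the logger name, directory, and size limits are made up for illustration. The level is set to INFO so verbose records are dropped before they reach disk, and `RotatingFileHandler` bounds the total space to roughly `max_bytes * (backups + 1)`.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_capped_logger(log_dir, max_bytes=10_000, backups=2):
    """Logger whose on-disk footprint stays below max_bytes * (backups + 1)."""
    logger = logging.getLogger("capped_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)              # DEBUG records are discarded early
    logger.propagate = False
    handler = RotatingFileHandler(
        os.path.join(log_dir, "app.log"),
        maxBytes=max_bytes,     # roll over before a file reaches this size
        backupCount=backups,    # keep app.log.1 .. app.log.N, delete older ones
    )
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


def total_log_bytes(log_dir):
    """Sum the sizes of every file in the log directory."""
    return sum(
        os.path.getsize(os.path.join(log_dir, name))
        for name in os.listdir(log_dir)
    )


log_dir = tempfile.mkdtemp()
logger = make_capped_logger(log_dir)

# Write far more than the cap allows; old files are rotated away.
for i in range(5000):
    logger.debug("verbose detail %d", i)  # never written at INFO level
    logger.info("event %d", i)

for handler in logger.handlers:
    handler.close()

print(sorted(os.listdir(log_dir)), total_log_bytes(log_dir))
```

The same idea applies to a database-backed log: pick a retention limit up front and enforce it on every write, instead of relying on someone noticing the file size later.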
  • bob1029 1 hour ago
    I'm struggling with how this much logging information could be generated at any level of verbosity. Is codex writing log entries while it's sitting idle? Why would someone want to look at these logs?
  • taosu_la 40 minutes ago
    Can someone tell me if the current sub-agent of codex is available now? There used to always be a spinning issue.
  • dundercoder 2 hours ago
    If something like this is helpful or necessary, that’s what RAM-backed tmpfs is for.
    • mrweasel 1 hour ago
      Using a RAM-backed tmpfs would be a workaround so as not to trash your SSD. It doesn't fix the underlying problem. It's incredibly poor design on OpenAI's part.
  • hun3 1 hour ago
    Operating systems have historically placed too much trust in applications not to do dumb things.

    Only now we're witnessing the consequences much more frequently thanks to accelerated slop.

  • ares623 2 hours ago
    i hope they find the smoking gun, the key insight, the kicker.
    • 59nadir 2 hours ago
      Then they can apply a clean solve, the cleanest solution.

      It's fascinating how offensive some of this verbiage becomes to you when you see it attached to LLM output too much.

      • jofzar 51 minutes ago
        Ugh, this one gets me so bad. Same with "wire" and "wired": everything is wired to something.
  • ramon156 2 hours ago
    Blegh, I puke every time I see obviously AI-generated comments in GH PRs. You cannot assume any of these people have done their research, other than telling Codex to do it for them.
    • b--l 2 hours ago
      It's because they use gpt-5.5-xhigh (the money making* model) to build it.

      (*for them)

  • consp 2 hours ago
    Why didn't the review process spot this obvious error? Oh wait ... @codex review this
    • cedws 1 hour ago
      Moreover why isn't the bug fixed already? I thought programmers were obsolete now. Surely one of the leading AI labs has figured out full automation of software development end-to-end by now if that's so.
    • charcircuit 2 hours ago
      Because it's not an error. The software is working as the creators intended. The diagnostic data (trace logs) are intentionally being saved for debug purposes.
  • rvz 2 hours ago
    The first of many bugs that are beyond the complexity of its authors, thanks to comprehension debt.

    Even with tests, the more complex the code base is, the more risky it is to vibe-code on it without introducing more bugs [0] and increasing the debt. Does not matter if the CI is green or if all the tests pass.

    It gets even worse if you can't explain the change / pull request or what the implications are after applying that "suggested" fix.

    [0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

    • HPsquared 1 hour ago
      There are going to be sooooo many consulting opportunities after this wave.
  • indiv0 2 hours ago
    This thread will become a typical "haha slop company made slop" but I've been bitten by a bug exactly like this before in a (pre-AI, artisan) OSS project. The maintainer there didn't properly account for DST when calculating last backup time, so the app started and never stopped writing/re-writing backups continuously.

    Perhaps the framing shouldn't be "haha slop" but rather why doesn't the AI write better quality software than we do? To which the answer is obvious IMO -- even emergent properties can't elevate AI intelligence too far above the training dataset. So how do we get to superintelligent (or at least "not-wreck-your-NVMe-endurance-telligent") AI, if we, as a whole, are not smart enough ourselves?

    Judge not the slop-bot, lest ye be judged yourself, engineer.
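
To make the DST failure mode concrete, here is a small sketch of how a "time since last backup" check can go wrong. It is not the code from the project mentioned above, whose details are not given; the function names, the 90-minute interval, and the fixed offsets standing in for US Eastern time around the 2024-03-10 clock change are all illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for US Eastern time before and after the
# spring-forward change on 2024-03-10 (illustrative values).
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))


def wall_clock_elapsed(last, now):
    """Buggy: drops the UTC offsets and subtracts local wall-clock readings."""
    return now.replace(tzinfo=None) - last.replace(tzinfo=None)


def utc_elapsed(last, now):
    """Safer: convert both instants to UTC before subtracting."""
    return now.astimezone(timezone.utc) - last.astimezone(timezone.utc)


last_backup = datetime(2024, 3, 10, 1, 30, tzinfo=EST)  # 06:30 UTC
now = datetime(2024, 3, 10, 3, 30, tzinfo=EDT)          # 07:30 UTC
interval = timedelta(minutes=90)                        # hypothetical schedule

# The wall clock shows 2 hours, so the buggy check thinks a backup is due.
print(wall_clock_elapsed(last_backup, now) >= interval)  # True
# Only 1 hour has really elapsed, so the UTC-based check says not yet.
print(utc_elapsed(last_backup, now) >= interval)         # False
```

Storing timestamps in UTC (or as monotonic durations) and converting to local time only for display avoids this class of bug.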

    • sleples 1 hour ago
      We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.
      • SilverSlash 1 hour ago
        > Difference is, humans learn from their mistakes.

        Great! So next time the human will prompt the agent to watch out for and avoid this bug.

        • ponector 1 hour ago
          You are a senior developer. Please do no mistakes!
    • xpct 44 minutes ago
      Lack of accountability is the cause here. People don't think before hitting the 'Publish' button. Their managers let them off the hook because the culture still allows making egregious mistakes, as long as there's an LLM to blame.
    • applfanboysbgon 1 hour ago
      1. I bet that developer only made that mistake one time in their life. Humans learn from their mistakes, LLMs don't. If you rely on LLMs to generate all of your code, you can expect to run into the same issues again and again.

      2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.

      • matharmin 1 hour ago
        LLMs do learn from mistakes. Not as directly from individual mistakes as humans do, but in aggregate the models have improved much more in the last year than most humans I know learn in the same time.
        • xpct 38 minutes ago
          I don't like the reframing of 'learning from mistakes' from a human-like, near instantaneous feedback loop, to a year-long process of retraining on many traces collected from user data. They're different concepts and we should refer to them using different phrasing.
      • lifthrasiir 1 hour ago
        > I have never written a bug that has destroyed my users' hardware, ...

        Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought (or reasoned) so. Maybe the decision was right at the time, but the volume of TRACE logs has since increased enormously. You will never know.

        • applfanboysbgon 1 hour ago
          I love that we've moved the goalposts from "LLMs are better than artisanal software engineers" to "actually, shipping hardware-destroying bugs in production is literally unavoidable, nobody could possibly avoid doing it".
          • lifthrasiir 1 hour ago
            I only meant what I said. After all, the OP's thesis was that LLMs aren't better than artisanal software engineers, wasn't it? There was no goalpost to move, at least in this particular thread. And the solution might be another agent monitoring those oft-ignored signals.
    • da_grift_shift 1 hour ago
      What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.
      • fn-mote 49 minutes ago
        I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

        Your comment, on the other hand, would be improved by including your own opinion on the matter.

  • Imustaskforhelp 2 hours ago
    I don't understand how Codex can blunder so badly. I imagine that even if they are vibe-coding, surely they must have some good engineers. So why are there such severe bugs?

    One can argue that these products are the flagship products of their respective AI companies aside from the AI models themselves of course.

    I imagine that this story will be picked up by the news left and right, some stories just feel this way and this one is like that (given 12 upvotes on HN in 7 minutes)

    The only logical conclusions (from this incident) that I can draw are:

    1. A (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.

    2. Proper testing and taking issues seriously are key if you still wish to do this, and there isn't much of either here. This is a week-old issue which I can only classify as severe.

    I wish to keep a nuanced opinion about it, but oh, this is bad for OpenAI (not as bad as them accepting autonomous AI within drones and mass surveillance, though).

    My point is: AI has both uphills and downward valleys and cliffs. It might just accelerate you, and that could be towards your downfall as well. It's recommended to keep your eyes on the road while driving and not drive too fast.

    AI companies might be like car companies which don't offer a brake pedal.

    • dathinab 2 hours ago
      > I don't understand how Codex can blunder so badly.

      because they trust the AI too much (and seem to be fine with acting knowingly negligent)

      the problem is

      - AI tends to produce very convincing-looking code, even if it is fully wrong

      - AI makes mistakes of kinds no human would make, at least no human who is also able to write convincing-looking code

      - code reviews are hard; a lot of devs, including senior devs, put a lot of implicit trust in the co-worker behaving "sane and non-malicious". But AIs sometimes behave not so sanely and, in a way, not so benignly (wrt. trying to be convincing). In the worst case they behave in ways which, if it were a human, you might consider an attempt to maliciously sabotage the product

      Like a "dumb" example from work:

      - AI randomly removes an HTML element id while doing other changes in JSX/React

      - the PR has a lot of changes, and the id removal line looks innocent, like some on-the-fly cleanup

      - human reviewers have the bad tendency to not look too closely at deleted lines, only doing so when they need to see how a new line looked before (but here there is only a deleted line and no new line)

      - you don't expect humans to randomly, without reason, delete important properties of components when changing other things

      - you maybe would still have found it, but it's an emergency fix for a production issue

      - it happens not to be covered by integration tests, but still matters a lot for one specific, important flow that, for complicated reasons, is not properly tested (similarly, people tend not to test logging much: at best the presence of needed info, but hardly ever the absence of noise)
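
The last point, testing for the absence of noise, can be sketched roughly as follows. This is an illustration using Python's standard `logging` module, not anything from Codex; `idle_tick`, the logger name, and the tick count are hypothetical stand-ins for one idle iteration of an application loop.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts the records that actually reach a handler, per level name."""

    def __init__(self):
        super().__init__(level=logging.NOTSET)
        self.counts = {}

    def emit(self, record):
        self.counts[record.levelname] = self.counts.get(record.levelname, 0) + 1


def idle_tick(logger):
    """Hypothetical stand-in for one idle iteration of an app's main loop."""
    logger.debug("tick")  # harmless only if DEBUG is filtered out by default


def count_records_while_idle(ticks=1000, level=logging.INFO):
    """Run the idle loop and report how many log records were emitted."""
    logger = logging.getLogger("noise_demo")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False
    handler = CountingHandler()
    logger.handlers = [handler]
    for _ in range(ticks):
        idle_tick(logger)
    return sum(handler.counts.values())


# At the shipped default level an idle app should write nothing.
print(count_records_while_idle())                     # 0
# If someone ships with DEBUG enabled, every tick produces a record.
print(count_records_while_idle(level=logging.DEBUG))  # 1000
```

A check like the first call, run in CI against the default configuration, is the kind of test that would flag a verbosity change before it reaches users.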

    • PunchyHamster 2 hours ago
      > I don't understand how Codex can blunder so badly. I imagine that even if they are vibe-coding, surely they must have some good engineers. So why are there such severe bugs?

      Because it was deemed not a Hard Enough task for a real engineer to look at, so AI was sent to do it with no supervision beyond checking the effects.

      Also, excessive logging is probably useful to them for chasing some of the edge cases; the cost to users doesn't matter to them in the slightest.

    • supriyo-biswas 2 hours ago
      The truth of the matter is that any time that has been saved in writing the code must be spent on ensuring proper system design, reviewing the code, and most importantly of all, QA, which is an uncomfortable discussion for AI techbros who are peddling complete automation of the software profession.
  • vantareed 2 hours ago
    [flagged]