Claude Opus solved my white whale bug today that I couldn't find in 4 years
Background: I'm a C++ dev with 30+ years experience, ex-FAANG Staff Engineer. I'm generally the person on the team that other developers come to after they have struggled with a problem for a week, and I solve it while they are standing in my office.
But today I was humbled by Claude Opus 4.
I gave it my white whale bug which arose from a re-architecting refactor that was done 4 years ago. The original refactor spanned around 60k lines of code and it fixed a whole slew of problems but it created a problem in an edge case when a particular shader was used in a particular way. It used to work, then we rearchitected and refactored, and it no longer worked.
I've been playing on and off trying to find it, and must have spent 200 hours on it over the last few years. It's one of those issues that are very annoying but not important enough to drop everything to investigate.
I worked with Claude Code running Opus for a couple of hours - I gave it access to the old code as well as the new code, and told it to go find out how this was broken in the refactor. And it found it. Turns out that the reason it worked in the old code was merely by coincidence of the old architecture, and when we changed the architecture that coincidence wasn't taken into account. So this wasn't merely an introduced logic bug, it found that the changed architecture design didn't accommodate this old edge case.
This took a total of around 30 prompts and one restart. I've also previously tried GPT 4.1, Gemini 2.5 and Claude 3.7 and none of them could make any progress whatsoever. But Opus 4 finally found it.
• 3 days ago
"It wasn't technically working either where you thought it was working" bugs are the WORST. Congrats. And I for one welcome our new robot overlords.
• 3 days ago
I guess we are going back to worshipping rocks again? 😭
• 3 days ago
Machines of Loving Grace
• 3 days ago
Now that you found the solution, it would be interesting to see if you can get the other LLMs to solve it.
• 3 days ago
I think this is the next logical step. If we had access to other models within the Claude Code ecosystem, I think you would get a similar outcome. What makes Claude Code great is not the model itself but the way Claude Code complements the underlying model.
• 3 days ago
What is up with posts like this? I see so many people on all of the main AI subreddits saying shit like “ChatGPT saved my marriage of more than 25 years” or “If it weren’t for Gemini 2.5 Pro, I probably wouldn’t be alive right now.”
These posts feel so illegitimate and a lot more like advertisements than anything else. Are people from these companies creating fake Reddit accounts and made-up scenarios just to brag about how good each of their models are?
• 3 days ago
I feel you. But they don't necessarily need to be fake advertisements, some of them could be real. Some people in marriages could just need some kind of therapy or outside opinion that they couldn't get another way that the LLM gave them. Same thing for people with depression.
I think it's really hard to discern between fake and real but it could be either.
• 3 days ago
My Reddit account predates the founding of Anthropic by many years.
• 3 days ago
I find it interesting in a subreddit about Claude that someone with experience gives an example of how a new model has done something the old one couldn’t. I’m not sure what you’re looking for from here if that’s not to your taste.
• 3 days ago
Some people get help from LLMs in very difficult situations,
be it hard-to-understand bugs in years-old code, or relationships.
I know I did, several times already.
• 1 day ago
I'm with you, bro. Isn't this about solving a bug? Where is the bug? I know I don't have the right to see the code details, but it feels like marketing. It's too outrageous. Do developers really write up bugs like this?
• 13 hours ago
"This is a Claude-information subreddit which aims to help everyone make a fully informed decision about how to use Claude to best effect for their individual purposes."
• 3 days ago
you 10 hours ago: claude is the equivalent of a junior dev
you now: claude solved my white whale bug of 4 years
• 3 days ago
I maintain it’s still the equivalent of a Junior dev when it comes to writing new code.
However you also took that statement completely out of context. That wasn’t me saying that the model was inferior but asking why that guy would tell a junior dev to write code but not give him access to Google, Docs, or build tools (like he was doing to Claude).
• 3 days ago
Paradoxically both are true. Welcome to AI.
• 3 days ago
Have you never experienced a junior dev with fresh eyes solving a long standing issue? It is delightful and humbling. Also, by “fresh eyes” of course I mean “smarter than me, but inexperienced”.
• 3 days ago
Absolutely. I have mentored many of them and relish their growth. That's not what we're talking about here, though.
• 3 days ago
LLMs are making programmers obsolete. "Learn the trades" will be the next "learn to code".
• 3 days ago
How about Sonnet 4? Can it also solve this? In the performance chart, they are pretty close.
• 3 days ago
They're within a percent of each other, aren't they? 72-ish?
• 3 days ago
I've been comparing Gemini and Claude on quite a few research prompts. It's not always the case, but Claude has some answers and explanations that make Gemini look like an old model. Gemini feels like it addresses the information in front of it without thoughts on peripheral issues, and Claude does a really good job of catching things that aren't explicitly stated.
But... before Claude 4 was released, I was almost exclusively using Gemini because it would give better results.
Crazy how quick things are flipping around in the AI space.
• 3 days ago
Sonnet or Opus?
• 3 days ago
Opus. Surprisingly, I seem to get better results when I don't use the deep research function first and just use the regular Opus.
• 2 days ago
"Crazy how quick things are flipping around in the AI space."
Yet, entirely expected when you factor in exponential growth.
I have been saying this the whole time. Here and on my Twitter account.
With exponential AI growth you will start seeing "once in decade" revolutions in AI every year.
Then once per month.
Then every week.
We are watching AI hit the positive feedback loop in real time.
• 3 days ago
>This took a total of around 30 prompts and one restart
Yes. This is the reality. It is an amazing tool, but it's not instant gratification. Though, after 4 years, ONLY 30 prompts may seem instant.
Good job, both of you.
• 3 days ago
Lol @ the downplay.
• 3 days ago
No doubt. And a few of those prompts were 1000+ line logs from all the printf statements it sprinkled throughout the code and wanted me to paste the results after testing. Either way, still a good outcome.
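For anyone pasting instrumentation output like this back into a prompt, here is a minimal Python sketch of one way to shrink a large printf log first, by collapsing runs of identical consecutive lines. It is only an illustration and not something that was done in this session; the function name `collapse_repeats` and the repeat marker format are made up for the example.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into one line plus a count."""
    collapsed = []
    # groupby only merges *adjacent* equal items, so the order of events is kept.
    for text, run in groupby(raw.rstrip("\n") for raw in lines):
        count = sum(1 for _ in run)
        collapsed.append(text if count == 1 else f"{text}  [repeated {count}x]")
    return collapsed
```

Usage would be something like `collapse_repeats(open("debug.log"))`, where `debug.log` is a hypothetical file name. Whether the model still needs the raw, uncollapsed log depends on the bug.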
• 3 days ago
Debugging shaders is so freaking hard. I can only imagine the pain.
• 3 days ago
ngl 30 prompts sounds like a blink of an eye when you're going down a rabbit hole.
• 3 days ago
Yeah, a rabbit hole should take more than a day, maybe a week to get down into lol. 30 prompts is an afternoon.
• 3 days ago
> And a few of those prompts were 1000+ line logs from all the printf statements it sprinkled throughout the code and wanted me to paste the results after testing.
I just paste the whole log & tell Claude to figure it out.
• 3 days ago
There isn't even enough context window to accommodate 60k+ lines of code.
• 2 days ago
Ah, one of the secrets! Let Claude spew debug output everywhere!
• 3 days ago
Did you run it on your own machine?
• 3 days ago
Yes. Is there another way to run Claude Code?
I know they have GitHub integration now (though I haven’t tried it yet) but is that Claude Code?
• 3 days ago
No doubt Claude is the best platform for coding.
• 3 days ago
How much did it cost?
• 3 days ago
I'm on Claude Max, so it's a fixed $100 monthly rate. However having done previous sessions like this in Roo with Sonnet 3.7 and considering Opus costs 5x more, this would have been hundreds, easily.
• 3 days ago
Why did you never try O3?
• 3 days ago
Mostly just because you can't just use and pay for it via OpenRouter like you can with the other models. You have to bring in your own key from OpenAI and I didn't want to bother getting yet another one-off subscription.
• 3 days ago
So can you describe how you gave access to old code and then new code? How were you able to give the context of old code when it is running in the context of new code?
Is it probably like, you had a root folder which has both old and new repo folders and you started Claude Code at this root? So it has access to both folders? Sorry, I'm a noob just trying out Claude Code, and I'm not familiar with how you can have 2 instances of Claude Code running in 2 different repos but sharing some context.
• 3 days ago
Sure, my natural project structure is /proj/src, and when I open VSCode I open to /proj. So it was simply a matter of copying an old version of the source to /proj/oldsrc so both were then under /proj, and I just had to tell Claude to look at it.
I also told it some files may have moved due to refactoring, which they had, but it had no problems finding anything.
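To illustrate the moved-files point, here is a small Python sketch that lists source files whose relative path differs between the two trees. Claude Code did not need anything like this; it is just one way to see for yourself what moved. The helper name `find_moved_files` and the list of file extensions are assumptions for the example.

```python
from collections import defaultdict
from pathlib import Path


def find_moved_files(old_root, new_root, suffixes=(".cpp", ".h", ".glsl")):
    """Return {file name: (old relative paths, new relative paths)} for files
    that exist in both trees under the same name but at different paths."""

    def index(root):
        by_name = defaultdict(list)
        root_path = Path(root)
        for path in root_path.rglob("*"):
            if path.is_file() and path.suffix in suffixes:
                by_name[path.name].append(path.relative_to(root_path).as_posix())
        return by_name

    old_index, new_index = index(old_root), index(new_root)
    moved = {}
    for name, old_paths in old_index.items():
        new_paths = new_index.get(name, [])
        # Only report names present in both trees whose locations changed.
        if new_paths and sorted(old_paths) != sorted(new_paths):
            moved[name] = (sorted(old_paths), sorted(new_paths))
    return moved
```

With the layout described above, the call would be `find_moved_files("/proj/oldsrc", "/proj/src")`. Matching by file name alone misses files that were renamed as well as moved.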
• 3 days ago
> I gave it my white whale bug which arose from a re-architecting refactor that was done 4 years ago. The original refactor spanned around 60k lines of code and it fixed a whole slew of problems but it created a problem in an edge case when a particular shader was used in a particular way. It used to work, then we rearchitected and refactored, and it no longer worked.
This recently happened to me after a ~8k line refactor on Electron. I spent a week on it & even AI couldn't fix it. Then I refactored with AI & it fixed itself. Definitely would've been harder to do on my own.
> I worked with Claude Code running Opus for a couple of hours - I gave it access to the old code as well as the new code, and told it to go find out how this was broken in the refactor.
How did you give it old code access? I usually use yek (bodo-run/yek on GitHub, a fast Rust-based tool to serialize text-based files in a repository or directory for LLM consumption) & put all the files (necessary context) in one yek.txt file, but would love to know if there's a good way to go about it.
• 3 days ago
Check out repomix. It’s also available as a VS Code plugin.
• 3 days ago
I just copied my old code folder next to the new one and pointed Claude at the common parent.
• 12 hours ago
Did you give it any specific prompts for refactoring or did you do most of the work with AI helping?
• 12 hours ago
Nowadays, I'm relying on speech-to-text using Talktastic which reframes my bad English vocab into sophisticated English vocab.
I don't think it matters for code but it does make a huge difference in writing with ChatGPT-4o.
So I talk a lot more. It takes a lot of getting used to, as I've typed code for countless years, but yeah, try talking.
r/ChatGPTPromptGenius is one subreddit to watch out for (sort by top of month) & also Namanyay Goel has some good prompts (read all blogs) & also indydevdan on youtube.
• 11 hours ago
So do you actually speak the code you want to type needing to say things like open parentheses and brackets and indenting or can you talk about it more generically?
• 3 days ago
How much did that cost in terms of token usage?
• 3 days ago
I'm on Claude Max, so it's a fixed $100 monthly rate. However having done previous sessions like this in Roo with Sonnet 3.7 and considering Opus costs 5x more, this would have been hundreds, easily.
• 3 days ago
Thank you
• 3 days ago
Hi, sorry this is off topic, but since Roo is mentioned, I'd really like your recommendation between Roo and Claude Code. Do you think Claude Code is way better than using Roo? I'm using Roo for my personal projects and wondering about Claude Code, Codex, and Jules. I can test Jules because it's free right now, but Claude Code and Codex don't provide trials and I'm a bit skeptical whether they're worth the price. Since you're using both, can you maybe do a comparison and recommend which one is better or best for certain situations? Cheers.
• 3 days ago
I prefer Roo.
I like the prompt customization better, I like the way it does approvals better, like the architect and orchestrator modes better and like that you can chat with the dev team and they’ll fix bugs within days. If there was a way to use Claude Max in Roo I’d use it instead.
But alas, Opus in Claude Code is $100 per month where in Roo it would be $3000 per month or more. If it was something like $500 a month uncapped I’d use Roo instead. But hard to justify more when an alternative is available.
• 3 days ago
Something that would be interesting to me (as someone who hasn't tried any sophisticated coding LLM yet) is whether a junior dev would have been able to find the bug with Claude as well.
I would assume that, because of your experience in the field, the words you used to describe the issue to the LLM are different from the ones a junior dev would have used, let alone a junior dev who has only ever solved problems using LLMs?
• 3 days ago
How can you be sure it was Opus that fixed the bug? In Claude Code, you have two choices: "Default" or "Sonnet 4." "Default" lets Claude choose the model, possibly based on your remaining usage limits. It's possible that both Opus and Sonnet 4 contributed to fixing the bug, especially if the "Default" setting was used.
• 3 days ago
My Claude doesn’t have a default mode to choose. It’s very specific for which one it’s using for me and cannot switch without starting a new chat
• 3 days ago
“Which model are you?”
I’ve not seen it return anything other than Opus 4 in the last week.
• 3 days ago
Sometimes it tells me it's Sonnet 3.5 but the version it shows is for Sonnet 4 😅
Yesterday I hit my limits and it switched back to Sonnet 4. I'm finding this part of the UX in Claude Code a bit confusing.
• 3 days ago
Do you have a per-token plan? I’m on Claude Max so I don’t know if that is what’s making a difference.
• 3 days ago
For me in default it basically has used Opus every single time as far as I can tell.
• 3 days ago
Thanks for the post, but OP what was the price for it I'm curious?
• 3 days ago
I'm on Claude Max, so it's a fixed $100 monthly rate. However having done previous sessions like this in Roo with Sonnet 3.7 and considering Opus costs 5x more, this would have been hundreds, easily.
• 3 days ago
We all tend to have some bias towards certain approaches and due to that tend to neglect/ignore other potential pitfalls in problems that would make them "unsolvable" to us, unless we manage to change our point of view.
It can happen everywhere, not necessarily just software development, and even in software development it could happen even with stuff we are familiar with yet haven't paid enough attention to certain things.
I wouldn't treat Claude 4 as some magical black box that solved the problem for you. You've managed to make it recognize some issue that it couldn't on its own due to pattern recognition. In other cases the 30 prompts might lead you nowhere even with the same model, so all of the people who constantly write that the tech jobs are cooked are just being delusional.
• 3 days ago
How did you manage to let Claude Code work with two codebases within the same context?
• 3 days ago
My natural project structure is /proj/src, and when I open VSCode I open to /proj. So it was simply a matter of copying an old version of the source to /proj/oldsrc so both were then under /proj, and I just had to tell Claude to look at it.
I also told it some files may have moved due to refactoring, which they had, but it had no problems finding anything.
• 3 days ago
How big was the prompt used? Just a few lines and the codebases attached or a long set of instructions?
• 3 days ago
Initial prompt was maybe 10 lines. I pointed it to the top-level codebase folder which is about a million lines. Well, two million if you count the old version of the project which was side by side to the new one underneath the parent folder.
The follow-up prompts range from 1 line to 1500 lines and contain logs that it wanted me to get after it added a whole bunch of printf’s to the codebase to understand the code flow.
The follow-up prompts that weren’t mostly logs had details like “you’re going down the wrong path - it doesn’t help restricting this conditional code <insert code> to only apply to a subset of the input dataset since <explain reason>, and this conditional <insert code> and that conditional <insert code> are not mutually exclusive in the case of <explain dataset scenario>”.
So I basically told it about previous paths that I went down when it wanted to also take those, but I knew would lead to dead ends.
• 3 days ago
So, how much did the fix cost?
• 3 days ago
I'm super excited to be using Claude Code in this way as well. Basically being able to do what I "couldn't" after all these years. Congrats OP!
• 3 days ago
Claude 4 and Opus 4 API cost is super expensive… has anyone checked? A $50 API credit will survive 10 or fewer prompts.
• 3 days ago
Would be even less than 10 prompts in my case since my prompts contained 1000+ line log files etc. and I pointed it to a million line codebase to start off. Two of them actually if you include the old version.
However I’m using Claude Max so it’s a fixed $100 per month when used from Claude Code.
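For a rough sense of how per-token billing adds up with large prompts, here is a minimal Python sketch of the arithmetic. Everything numeric in it is a placeholder for illustration: the token counts were not measured from this session, and the per-million-token prices are assumptions that should be replaced with the current published rates.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_mtok, output_price_per_mtok):
    """Estimate API cost in USD from token counts and per-million-token prices."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical single prompt: 300k input tokens (logs plus code context) and
# 5k output tokens, at assumed prices of $15 and $75 per million tokens.
per_prompt = estimate_cost_usd(300_000, 5_000, 15.0, 75.0)
print(f"~${per_prompt:.2f} per prompt, ~${30 * per_prompt:.2f} for 30 prompts")
```

Under those assumed numbers a 30-prompt session lands in the low hundreds of dollars, which is in line with the "hundreds, easily" estimate given elsewhere in this thread, but the real figure depends entirely on actual token usage and pricing.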
• 3 days ago
I code mainly for mobile (and sometimes web) using Flutter inside VSCode.
Currently I use Gemini 2.5 Pro from the website, and every time I need to ask something I create a new tab (I keep only one long-running tab that I use continuously to talk with Gemini about my application architecture).
Which Claude plan should I use? Currently I pay for the $20 plan. I want something similar; I don't want to spend $200+ monthly.
Is there a way to integrate AI inside my IDE so that it understands my requests better and creates files and makes improvements directly in my code?
• 3 days ago
Claude Code integrates great in VSCode (on Mac at least). And with Claude Code you can use the Claude Max plan which is $100 per month and I think currently the best bang for the buck.
Having said that I think RooCode is a better tool than Claude Code, but you’d have to use API tokens instead of being able to use a Claude Max subscription, which can easily run into the $1000s per month.
But of course with a combination of RooCode and OpenRouter you can use almost any AI model out there, so it’s really nice to be able to switch around quickly. But it’s all API-token based so it can get very expensive with some models.
• 3 days ago
Are you certain Claude Code is using Opus? I just did another post on how it is not using it.
• 3 days ago (edited 3 days ago)
I've asked it about 5 times "what model are you?" over the last week and it's returned "I'm Claude Opus 4, released on 2025-01-14" every time.
However there is no way to set that explicitly (I don't think), so it can probably return different results for different users.
• 3 days ago
30 prompts? How did you have confidence an answer would be found?
Any ideas if you could’ve shortened that prompting down given what you knew then?
• 3 days ago
I envy you for being able to use Claude Max. As a Chinese developer, Claude Max Plan is not available.
• 2 days ago
It’s not even trained on C++ specifically either which is amazing. Python is public and easily accessible but C++ is closed source
• 2 days ago
What happened to your statement of "it's the equivalent of a junior dev"?
• 2 days ago
If you have the time and don't mind sharing, I'd love to hear more about those ~30 prompts and the overall interaction flow. Things like:
How did you initially frame the problem to Claude?
What kind of code chunks or files did you share at each stage?
How did Claude approach the analysis - did it ask for specific files, or did you guide it?
What was the breakthrough moment when it identified the architectural issue?
How did the conversation evolve when you mentioned the restart?
I'm particularly interested in understanding how Claude handled comparing the old vs new architecture and identified that coincidental dependency that got lost in the refactor. That kind of architectural level reasoning across a 60k line codebase sounds incredibly impressive.
No worries if you don't have time for all the details but even a high level walkthrough of the debugging process would be really valuable for those of us trying to get better at leveraging these tools for complex problems.
• 2 days ago
Initial prompt was maybe 10 lines describing the problem. I pointed it to the top-level codebase folder which is about a million lines. Well, two million if you count the old version of the project which was side by side to the new one underneath the parent folder.
The follow-up prompts range from 1 line to 1500 lines and contain logs that it wanted me to get after it added a whole bunch of printf’s to the codebase to understand the code flow.
The follow-up prompts that weren’t mostly logs had details like “you’re going down the wrong path - it doesn’t help restricting this conditional code <insert code> to only apply to a subset of the input dataset since <explain reason>, and this conditional <insert code> and that conditional <insert code> are not mutually exclusive in the case of <explain dataset scenario> like you are assuming right now”.
So I basically told it about previous paths that I went down when it wanted to also take those, but I knew would lead to dead ends.
Claude Code automatically found the files it needed to look in using grep. I didn't have to guide it - not even the function names. I generally make sure I start with all files closed in VSCode otherwise it becomes overly fixated on what you have open rather than doing wide searches.
It tried a bunch of things before it had the breakthrough, but as you probably know it always said: "I found it! This is finally the root cause of the problem!" Every AI does that on almost every prompt, so it wasn't anything special. It was just another thing it did that I tested and noticed it worked without also regressing other stuff, and then I looked at it and compared it, and then realized what it did. Then I had to go and delete a bunch of other unnecessary changes that Opus also made, which it insisted were good to leave in and didn't want to remove, but which weren't actually pertinent to the issue.
When I restarted it was because it went on a side quest of "fixing" some matrix multiplication in the associated shader, and I didn't feel like spending the day doing linear algebra to figure out if it's correct or not. I didn't think that was on the correct track at all - the issue was that the shader wasn't getting executed, not that it behaved poorly. So I just restarted it and gave it back what it told me so far from one of the last results. I didn't specifically tell it to leave the glsl alone, but it did from then on.
I'm trying to distill the prompt to get it down to a single prompt that makes the change without giving away the fix, since it would be nice being able to use that to compare different models against each other. So far I haven't been successful. I can get it down to 3 prompts to make the change in the correct file though, but not the correct fix yet.
• 2 days ago
Thank you so much for such a detailed answer. I really appreciate it
• 2 days ago
Sounds like one of those concurrency bugs where the code used to work because of timing that just happened to avoid the problem in a way that was entirely coincidental. Then the refactored code changes that coincidental timing and boom.
I'm impressed that Claude could track that down with relatively little work.
• 2 days ago
How do you give it access to interact with the entire codebase?
• 2 days ago
You just cd to the top-level folder in the codebase and open Claude Code there. In my case I had two folders side by side under the same parent (proj/source and proj/oldsource), and then I opened proj itself in VSCode and ran Claude Code there.
• 2 days ago
Gemini just solved a week-long problem for me, in just a few hours. I'm pretty stoked to see what Claude Opus can do!
• 2 days ago
Is Opus 4 better than Sonnet 4? Asking because I use Sonnet 4 and never tried Opus 4.
• 2 days ago
I prefer Sonnet for scripting
• 2 days ago
I’ve not really tried Sonnet 4 yet. Opus 4 is definitely better than Sonnet 3.7.
Opus in general is a much bigger (and more expensive) model than Sonnet.
• 2 days ago
How do you give Claude Opus your codebase? Do you paste file by file?
• 2 days ago
You just point Claude Code (or Roo or Cline, or whatever you want to use) at the top level folder of your project and it finds the files it needs.
Typically it does so by running grep commands with various permutations of the problem, and then starts digging in from there.
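As a rough picture of that search step, here is a minimal Python sketch that scans a source tree for several spellings of a term at once, the way you might with a handful of grep runs. It is not how Claude Code works internally, just an approximation of the behavior described above; the function name `search_tree`, the file extensions, and the example search terms are all hypothetical.

```python
import re
from pathlib import Path


def search_tree(root, terms, suffixes=(".cpp", ".h", ".glsl")):
    """Return (relative path, line number, line) for every line that matches
    any of the given terms, ignoring case."""
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root_path).as_posix(), number, line.strip()))
    return hits
```

For example, `search_tree("/proj", ["bindShader", "bind_shader", "useProgram"])` would cover a few naming variants in one pass over both the old and new trees.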
• 2 days ago
I probably say "why did this ever work" to myself about once every 6 months
• 2 days ago
Sugoi.
• 2 days ago
Allow me to translate your post: I am a clever person that clevered his way through 60k lines of code. I've been outclevered by a tool that simulates a mediocre programmer. Great stuff.
• 2 days ago
Feels like the model's getting better at actually solving our real bugs.
• 2 days ago
> Turns out that the reason it worked in the old code was merely by coincidence of the old architecture, and when we changed the architecture that coincidence wasn't taken into account. So this wasn't merely an introduced logic bug, it found that the changed architecture design didn't accommodate this old edge case.
This explanation is deeply unsatisfying and frankly sounds like something Claude itself would confabulate.
Care to go into some real details of the bug and fix?
• 2 days ago
This is such an important post.
Less because of Claude and more because of how this particular class of error (not even constrained to code) can eat away at entire chunks of life.
Great reminder that even experts are human, and sometimes things work because you get “lucky”—you don’t always know what you have until you lose it.
Solid life lessons in this one, thank you.
• 2 days ago
In total, how many files and lines of code did Claude have to read through in the old and current source folders?
• 2 days ago
I know it opened 12 files, which constitute around 10,000 lines of code, in each of the old and new trees. I'm not sure how many lines it read through vs. how much it found using grep.
• 2 days ago
Impressive.
• 2 days ago
How do you feed it 60K lines of code? My limit maxed out at 15-20K!
• 2 days ago
Not sure what you mean by “feed”. I just point Claude Code at the top of the code base (which has a million lines, not just 60k) and it finds what it needs. In this case it was actually 2 million lines since it had both the old and new copy of the source.
• 2 days ago
What kind of software system did you build?
• 2 days ago
What was the total cost in Claude 4 API usage to find this bug?
• 2 days ago
I'm on Claude Max, so it's a fixed $100 monthly rate. However having done previous sessions like this in Roo with Sonnet 3.7 and considering Opus costs 5x more, this would have been hundreds, easily.
• 2 days ago
Whatever, can't even confirm this is real, probably another story written by AI.
• 2 days ago
I’m a technical Lead PM and have been toying with “vibe” coding for a long time. Since Opus, holy f. I haven’t been able to sleep properly. We have a huge BESPOKE legacy monolith CMS hybrid subscriptions, user management. It has rat f’d our company into submission for years. I have built essentially an interfacing layer on top of it, and using Playwright have automated interacting with it. NOW, I’m just continuing and I'm basically building out new very valuable IP for the company.
Aside from that I’ve been creating some billion dollar industry disruptive stuff. I’ve maxed out 2 accounts each on Cursor and Windsurf (hugely feel the diff between 3.7 and 4.0 — not the same AT ALL)
• 2 days ago
I read this on Xiaohongshu and purposely downloaded Reddit to read it again! Am thinking of purchasing Claude Plus as a start, or ChatGPT Plus....
• 2 days ago
I have been coding for more than 5 years, but now I can't code without Cursor.
• 1 day ago
It never worked to begin with. Classic.
• 1 day ago
Prove it. I'm tired of seeing posts like this with no supporting evidence. Otherwise, this is just more unfounded hype generation.
• 1 day ago
So, wait: 1) you knew how to reproduce it, 2) you've ALREADY spent 200 hours on it and 3) it was still not important enough to give it one good debugging/tracing session that would pinpoint exactly where the new architecture breaks it?
I don't know, but it looks to me like some part of the story is exaggerated.
• 1 day ago
You think I spent 200 hours without at least 190 of that spent debugging/tracing?
What exactly do you think I was doing?
• 1 day ago
I’ve been through something similar myself. While Claude 3.5 Sonnet didn’t directly solve the bug, it sparked new ideas for me, and I ended up finding the solution on my own.
• 1 day ago
Did you try formal verification?
• 22 hours ago
30 prompts? Nah, far more than what I can tolerate.
• 20 hours ago
I'm always intrigued to find AI giving us better solutions, especially for software engineering. Great share
• 15 hours ago
What IDE? Did you let it run in a loop? (Curious about the “2 hours”. Assuming a loop with “run tests to check if it worked or not”?)
• 9 hours ago
crazy. We can pack it up