Does anyone feel like the biggest selling point of LLMs so far is basically for programmers? It feels like most of the products that look like they could generate revenue are for programmers.
While you can see them as a productivity-enhancing tool, in times of tight budgets they can also be used to justify laying off more programmers, because a single one is now far more productive than pre-LLM.
I feel that LLMs will raise the barrier to entry for newcomers while also making it easier for companies to lay off more devs, as you don't need as many. All in all, I expect salaries for non-FAANG devs to decrease while salaries for FAANG devs increase slightly (given the increased value they can now create).
1. Developers are building these tools/applications because it's far faster and easier for them to build and iterate on something that they can use and provide feedback on directly without putting a marketer, designer, process engineer in the loop.
2. The level of 'finish' required to ship these kinds of tools to devs is lower. If you were shipping an early beta of something like 'Cursor for SEO Managers', the product would need to be much more user-friendly. Look at all the hacking people are doing to make MCP servers work with Cursor. Non-technical folks aren't going to make that work.
Once there is a convergence on 'how' to build this kind of stuff for devs, there will be a huge amount of work to smooth out the UX and spread equivalents across other industries. Claude releasing remote MCPs as 'integrations' in their web UI is the first step of this, IMO.
When this wave crashes across the broader SaaS/FAANG world I could imagine more demand for devs again, but you're unlikely to ever see anything like the early 2020s again.
The shift feels real. LLMs don't replace devs, but they do compress the value curve: the top 10% get even more leverage, and the bottom 50% become harder to justify.
What worries me isn't layoffs but that entry-level roles become rare, and juniors stop building real intuition because the LLM handles all the hard thinking.
You get surface-level productivity but long-term skill rot.
> juniors stop building real intuition because the LLM handles all the hard thinking. You get surface-level productivity but long-term skill rot.
This was a real problem pre-LLM anyway. A popular article from 2012, How Developers Stop Learning[0], coined the term "expert beginner" for developers who displayed moderate competency at typical workflows, e.g. getting a feature to work, without a deeper understanding of lower levels, or a wider high-level view.
Ultimately most developers don't care, they want to collect a paycheck and go home. LLMs don't change this; the dev who randomly adds StackOverflow snippets to "fix" a crash without understanding the root cause was never going to gain a deeper understanding, the same way the dev who blindly copy&pastes from an LLM won't either.
> Ultimately most developers don't care, they want to collect a paycheck and go home. LLMs don't change this; the dev who randomly adds StackOverflow snippets to "fix" a crash without understanding the root cause was never going to gain a deeper understanding, the same way the dev who blindly copy&pastes from an LLM won't either.
I read this appraisal of what "most devs" want/care about on HN frequently. Is there actually any evidence to back this up? e.g. broad surveys where most devs say they're just in it for the paycheck and don't care about the quality of their work?
To argue against myself: modern commercial software is largely a dumpster fire, so there could well be truth to the idea!
> I read this appraisal of what "most devs" want/care about on HN frequently. Is there actually any evidence to back this up? e.g. broad surveys where most devs say they're just in it for the paycheck and don't care about the quality of their work?
Almost every field I've ever seen is like that: most people don't know what they're doing and hate their jobs. We've managed to make even the most conceptually fulfilling jobs awful (teaching, medicine, etc.).
Complex technology --> moat --> barrier to entry --> regulatory capture --> monopoly == winner-take-all --> capital consolidation
A tale as old as time. It's a shame we can't seem to remember this lesson as it repeats itself over and over again every 20, 30, 50 years. Probably because the winners keep throwing billions at capitalist supply-side propaganda.
You could say the same sort of thing about compilers, or higher-level languages versus lower-level languages.
That's not to say that you're wrong. Most people who use those things don't have a very good idea of what's going on in the next layer down. But it's not new.
Developers (often juniors) use LLM code without taking the time to verify it. This leads to bugs, and they can't fix them because they don't understand the code. Some senior developers also trust the tool to generate a function and don't take the time to review it and catch the edge cases the tool missed.
They rely on ChatGPT to answer their questions instead of taking the time to read the documentation, or doing a simple web search to find discussions on Stack Overflow or blogs about the subject. This may give results in the short term, but they don't actually learn to solve problems themselves. I am afraid this will have huge negative effects on their careers unless the tools improve significantly.
Learning how to solve problems is an important skill. They also lose access to the deeper knowledge that enables you to see connections, complexities, and flows that the current generation of tools is unable to. By reading the documentation, blogs, or discussions you are often exposed to a wider view of the subject than the laser-focused answer of ChatGPT.
There will be less room for "vibe coders" in the future, as these tools increasingly solve the simple things without requiring as much management. Until we reach AGI (I doubt it will happen within the next 10 years), the tools will require experienced developers to guide them through the more complex issues. Older experienced developers, and younger developers who have learned how to solve problems and have deep knowledge, will be in demand.
> They rely on ChatGPT to answer their questions instead of taking the time to read the documentation, or doing a simple web search.
Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
Web search no longer provides useful information within the first few results. Instead, I get content farms who are worse than recipe pages - explaining why someone would want this information, but never providing it.
A junior isn’t going to learn from information that starts from the beginning (“if you want to make an apple pie from scratch, you must first invent the universe.”) 99.999% of them need a solution they can tweak as needed so they can begin to understand the thing.
LLMs are good at processing and restructuring information so I can ask for things the way I prefer to receive them.
Ultimately, the problem is actually all about verification.
> Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
I have an answer now, because I read the documentation last week.
As a real example, I needed to change my editor config last month. I do this about once every 5 years. I really didn't want to become an expert in the config system again, so I tried an LLM.
Sad to report, it told me where to look but all of the exact details were wrong. Maybe someday soon, though.
I used to make fun of (or deride) all the "RTFM" people when I was a junior too. Why can't you just tell me how to do whatever thing I'm trying to figure out? Or point me in the right direction instead of just saying "its in the docs lol"?
Sometime in the last few years, as I've started doing more individual stuff, I've also started reading documentation before running npm i. And honestly? All the "RTFM" people were 100% right.
Nobody here is writing code that's going to be used on a patient on the surgical table right now. You have time to read the docs and you'll be better if you do.
I'm also a hypocrite because I will often point an LLM at the root of a set of API docs and ask how to do a thing. But that's the next best thing to actually reading it yourself, I think.
I'm in total agreement, RTFM does wonders. Even if you don't remember all of it, you get a gist of what's going on and can find things (or read them) faster.
In Claude I put in a default prompt[1] that helps me gain context when I do resort to asking the LLM for a specific question.
[1] Your role is to provide technical advice in developing a Java application. Keep answers concise and note where there are options and where you are unsure on what direction that should be taken. Please cite any sources of information to help me deep dive on any topics that need my own analysis.
The same could be said for every language abstraction or systems-layer change. When we stopped programming kernel modules and found a workable interface, it opened the door to so many more developers. I'm sure at the time there was skepticism because people didn't understand the internals of the kernel. That's not the point. The point is to raise the level of abstraction to open the door, increase productivity and focus on new problems.
When you see 30-50 years of change, you realise this was inevitable, and in every generation there are new engineers entering with limited understanding of the layers beneath, even of the code produced. Do I understand the lexers and compilers that turn my code into machine code or instruction sets? Heck no. That doesn't mean I shouldn't use the tools available to me now.
No, but you can understand them if given time. And you can rely on them to be reliable to a degree approaching 100% (and when they fail, it will likely be in a consistent way you can understand with sufficient time, and likely fix).
LLMs don’t have these properties. Randomness makes for a poor abstraction layer. We invent tools because humans suffer from this issue too.
> it opened the door to so many more developers. [...] That's not the point. The point is to raise the level of abstraction to open the door, increase productivity and focus on new problems.
There are diminishing returns. At some point, quoting Cool Hand Luke, some men you just can't (r|)teach.
> Developers (often juniors) use LLM code without taking the time to verify it. This leads to bugs, and they can't fix them because they don't understand the code
Well... is this something new? Previously the trend was to copy and paste Stack Overflow answers without understanding what they did. Perhaps LLM code is an incremental change, but the concept is fairly familiar.
Aren't the insufficiencies of the LLMs a temporary condition?
And as with any automation, there will be a select few who understand its inner workings, and a vast majority who will enjoy/suffer the benefits.
So the scope of answers is a single function or a single class? I have people nearby who are attempting to generate whole projects; I really wonder how they will ensure anything about them beyond the happy paths. Or maybe they plan to have an army of agents fuzzing and creating hotfixes 24/7...
> Or maybe they plan to have an army of agents fuzzing and creating hotfixes 24/7
There are absolutely people who plan to do exactly this. Use AI to create a half-baked, AI-led solution, and continue to use AI to tweak it. For people with sufficient capital it might actually work out halfway decent.
I've had success with greenfield AI generation but only in a very specific manner:
1. Talk with the LLM about what you're building and have it generate a detailed technical specification. Iterate on this until you have a good, human-readable explanation of the entire application or feature.
2. Start a completely new chat/context. If you're using something like Gemini, turn temperature down and enable external search.
3. Have instructions¹ guiding the LLM; this might be the most important step, even more so than #1.
4. Create the base/blank project as its own step. Zero features or config.
5. Copy features one at a time from the spec to the chat context OR have them as separate documents and say things like "we're creating Feature 3A.1" or whatever.
6. Iterate on each feature until you're happy, then repeat. (A rough sketch of this loop follows below.)
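To make steps 2, 3, and 5 concrete, here's a minimal sketch of that loop, assuming the google-generativeai SDK, an instructions.md from step 3, and a spec split into per-feature files; the model name, paths, and prompt wording are illustrative, not a fixed recipe:

```python
# Minimal sketch of the per-feature loop (steps 2, 3, and 5).
# Assumptions: google-generativeai SDK, instructions.md from step 3,
# and the spec split into files like spec/feature-3a1.md.
# (External search/grounding from step 2 is omitted here.)
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=open("instructions.md").read(),  # step 3
)

# Step 2: a brand-new chat per feature, low temperature for steadier output.
chat = model.start_chat()
config = genai.types.GenerationConfig(temperature=0.2)

# Step 5: feed exactly one feature from the spec into the clean context.
feature_spec = open("spec/feature-3a1.md").read()
reply = chat.send_message(
    f"We're creating Feature 3A.1. Spec follows:\n\n{feature_spec}",
    generation_config=config,
)
print(reply.text)  # step 6: review, iterate, then start a fresh chat
```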
> All in all, I expect salaries for non-FAANG devs to decrease while salaries for FAANG devs increase slightly (given the increased value they can now create).
I find it interesting how these sort of things are often viewed as a function of technological advancement. I would think that AI development tools would have a marginal effect on wages as opposed to things like interest rates or the ability to raise capital.
Back to the topic at hand, however: assuming these tools do get better, they would seemingly greatly increase competition. A highly skilled team with such tools could prove to be formidable competition to longstanding companies. This would require all companies to up the ante to avoid being outcompeted, requiring even more software to be written.
A company could rest on its laurels, laying off a good portion of its employees and leaving the rest to maintain the same workload, but it would run the risk of being disrupted itself.
Alas, at the job I'm at now my team can't seem to release a rather basic feature, despite everyone being enhanced with AI: nobody seems to understand the code, all the changes seem to break something else, the code's a mess... maybe next year AI will be able to fix this.
The first problem they have gained traction on is programming autocomplete, and it is useful.
Generating summaries: pretty marginal benefit (personally I find it useless). Writing emails: quicker just to type "FYI" and press send than instruct the AI. More problems that need solving will emerge, but it will take time.
This is a bad take to have, because it blinds you to the reality that is happening. LLMs are autocomplete for pros, but full-on programmers for non-tech folk. Like when GUIs first came out, the pros laughed and balked because of how much more powerful the CLI was. But look where the world is today.
At my non-tech job, I can show you three programs written entirely by LLMs that have allowed us to forgo paid software solutions. There is still a moat (IDEs are not consumer-friendly), but that is pretty solvable. It will not be long before one of the big AI houses offers a direct prompt-to-offline-desktop-app IDE that your grandma could use.
I've been using LLMs as learning tools rather than simply answer generators. LLMs can teach you a lot by guiding your thinking, not replacing it.
It's been valuable to engage with the suggestions and understand how they work, much like using a search engine, but more efficient and interactive.
LLMs have also been helpful in deepening my understanding of math topics. For example, I've been wanting to build intuition around linear algebra, which for me is a slow process. By asking the LLM questions, I find the explanations make the underlying concepts more accessible.
For me it's about using these tools to learn more effectively.
> Does anyone feel like the biggest selling point of LLMs so far is basically for programmers? It feels like most of the products that look like they could generate revenue are for programmers.
No, you're in a tech bubble. I'm in healthcare, and you'd think that AI note-takers and summary generators were the reason LLMs were invented and the lion's share of their use. I get a new pitch every day: "this product will save your providers hours every day!" They're great products, and our providers love ours, but it's not saving hours.
There's also a huge push for LLMs in search and data-retrieval chatbots, and Mistral just released Le Chat Enterprise for that exact market.
LLMs for code are so common because they're really easy to create. It's Notepad plus ChatGPT. Sure, it's actually VS Code and Copilot, but you get the idea: it's not really more complicated than regular chatbots.
So many people benefit from basic things like sorting tables, searching and filtering data, etc.
For things where I might just use Excel or a small script, they can now use an LLM.
And for now, we are still in dire need of more developers, not fewer. But yes, I can imagine that after a golden phase of 5-15 years it will start to go downhill, once automation and AI get too good, i.e. better than the average Joe.
Nonetheless, the good news is that coding LLMs enable researchers too, people who often struggle with learning to code.
When a company lays off a chunk of its workforce because the increased productivity due to LLMs means it doesn't need as many people, how is that an enabler for the laid-off people?
What happens when most companies do this?
During the '10s, every dev out there was screaming "everyone should learn to code and get a job coding". During the '20s, many devs are being laid off.
For a field full of self-professed smart and logical people, devs do seem to be making tons of irrational choices.
Are we in need of more devs, or in need of more skilled devs? Do we necessarily need more software written? Look at npm: the world is flooded with poorly written software that is one null-reference exception away from crashing.
People get laid off when money is expensive. When money is expensive, running companies is harder. Starting a new company is even harder. Without capital, all you can offer is a broken demo of your v1 prototype and some sweet words. You can't start a company with just that when money is expensive.
Right now we don't have enough software developers, at least based on surveys.
So now LLMs help us with that.
In parallel, all the changes due to AI also need more effort for now. That's what I call the golden age.
After that, I can imagine fundamental change for us developers.
And at least where I live, a lot of small companies never got the chance to properly modernize, because the good developers were earning very good money somewhere else.
I like to think that AI is to code what digital electronics were to analog electronics: a step backward in terms of efficiency and ten steps forward in terms of flexibility.
Some of us will always maintain code, but most will move higher in the stack to focus on products and their real world application.
New technologies are commonly adopted first by the people who can use them for their own gain, and programmers are (for the most part) the only ones who can unlock LLMs to solve very specific personal problems. There are workflow automation tools that give non-programmers the ability to build workflows, but that is only one way to use LLMs, and it will always be constrained by the integrations already developed and the limits of the workflow platform.
In regards to jobs and job losses, I have no idea how this is going to impact individual salaries over time in different positions, but I honestly doubt it's going to do much. Language models are still pretty bad at working with large projects in a clean and effective way. Maybe that will get better, but I think this generational breakthrough in technology is slowing down a lot.
Even if they do get better, they still need direction and validation. Both of which still require some understanding of what is going on (even vibe coding works better with a skilled engineer).
I suspect there are going to be more "programmers" in the world as a result, but most of them will be producing small boutique single-webpage tools and designs of higher quality than the "made by my cousin's kid" sites a lot of small businesses have now. Companies over ~30 people with software engineers on staff seem to be using it as a performance enhancer rather than a work-replacement tool.
There will always be shitty managers and short-sighted executives looking to replace their human staff with some tool, and there will be layoffs, but I don't think the overall pool of jobs is going to shrink. For the same reason, I don't think there are going to be significant pay adjustments, but rather a dramatic increase in the long tail of cheap projects that don't make much money on their own.
I don't get why making engineers more productive would decrease their salaries. It should be the reverse.
You could argue that it lowers the bar to being productive, so the candidate pool is much larger, but you're arguing the opposite: that it increases the barrier to entry.
I'm open to arguments either way and I'm undecided, but you have to have a coherent economic model.
But they're more productive. Your assumption is that there is a fixed amount of engineering work to do, so you need to hire fewer programmers, which is untrue. Every organization I've worked at could have invested a lot more in engineering, be it infrastructure, analytics, automation, etc.
Even if there were a fixed amount of work to do and we were already near that maximum, salaries still wouldn't necessarily go down. Again, they're more productive. Farming employed around 40% of the US workforce in 1900; now farmers are about 2% of it. Do farmers today earn a lower salary, adjusted for inflation, than 100 years ago? Of course not, because they're much more productive with modern tools.
Generally wages track productivity. The more productive, the higher the wage.
Another example is bank tellers. With the advent of the ATM, somehow bank teller salaries didn't drop in real terms.
Show me an example of where this has played out: someone made much more productive through technology whose salary dropped considerably.
> Your assumption is that there is a fixed amount of engineering work to do, so you need to hire fewer programmers, which is untrue. Every organization I've worked at could have invested a lot more in engineering, be it infrastructure, analytics, automation, etc.
True. The problem is that investment is a long-term action (cost now, for gains later). Literally every company could benefit from investment. The key question is how valuable the gains are over a given time period relative to the cost you incur between now and the moment the gains are realised.
LLMs wouldn't have helped Meta/Microsoft/Google lay off fewer people in the last two years. In fact, you could argue they would have helped lay off MORE people, as with LLMs you need fewer people to run the company. Do you think Zuckerberg would have INCREASED expenses (and that's what productivity investments are) when the stock was in freefall?
Companies can't afford to spend indefinite amounts of money at any time. If your value has been going down or is going down, increasing your current expenses will get you fired. Big problems now, require solutions now. The vast majority of the tech companies in the world chose to apply a solution now.
Maybe you are right, but a look at the tech world over the last three years should tell you that your decision would have been deeply unpopular with the people who hold the moneybags. And at the end of the day, those are the people you don't want to anger, no matter how smart you believe yourself to be.
In the real world experiment we're living through you're being proven wrong. Tech companies have been laying off engineers continuously for several years now and wages are down.
Layoffs started before the rise of LLMs and all the LLM coding tooling; they were never used as a justification. What happened was that Musk bought Twitter and cut 80% of headcount, and it stayed up, which showed you can run leaner, and other tech CEOs took note. That, and stocks crashed as the post-COVID bubble deflated.
I learned to program as a child in the 1960s (thanks Dad!) so I have some biases:
Right now there seem to be two extremely valuable LLM use cases:
1. sidekick/assistant for software developers
2. a tool that lets people rapidly explore new knowledge and ideas; unlike an encyclopedia, you can ask questions, get suggested references and summaries, etc.
I suspect that the next $$$ valuable use case will be scientific research assistants.
EDIT: I would add that AI in K-12 education will be huge, freeing human teachers to spend more one-on-one time with kids while AIs patiently teach them, providing extra time and material as needed.
The most valuable LLM use case right now is allowing people who don't know how to program to get their computer to do what they want it to do.
They might not be aware of this, and they don't know how to use an IDE, but the hardest part, the code-writing part, is solved.
Every week Rachel in [small company] accounting is manually scanning the same column in the same structured excel documents for amounts that don't align with the master amount for that week. She then creates another excel document to properly structure these findings. Then she fills out a report to submit it.
Rachel is a paragraph prompt away from not ever having to do that again, she just hasn't been given the right nudge yet.
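The whole chore fits in a short script an LLM can produce from one paragraph. A hypothetical sketch (the file names, the "amount" column, and the master amount are invented, and it assumes pandas/openpyxl):

```python
# Hypothetical sketch of Rachel's weekly scan, automated.
# Assumptions: pandas + openpyxl installed, an "amount" column,
# and a known master amount; none of this is from the original story.
import pandas as pd

MASTER_AMOUNT = 12_500.00  # the week's master amount

df = pd.read_excel("weekly_amounts.xlsx")        # the structured input file
mismatches = df[df["amount"] != MASTER_AMOUNT]   # rows she scans for by hand

# The second document she builds manually, generated in one line:
mismatches.to_excel("mismatch_report.xlsx", index=False)
print(f"{len(mismatches)} of {len(df)} rows differ from the master amount")
```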
I'm non-FAANG and I'm so much more productive now. I am a full-stack dev; I use LLMs for help with emails to non-tech individuals, analyzing small datasets, code review, code examples... it is wild how much faster I can develop these days. My job is actually more secure because I can do more, and OWN more mission-critical software, vs. outsourcing it.
People forget that software engineers are already speculated to come in 10x and 100x variants, so the impact that one smart, dedicated person can make is almost certainly not the problem, and it's not changed at all by AI.
The fact is, you could be one of the most insanely valuable and productive engineers on the planet and only write a few lines of code most days, because you'll be writing them in a programming language, OS, or kernel. Value is created by understanding direction and by theory-building, and LLMs do neither.
I built a genuinely new product by working hard as a single human while all my competitors tried to be really productive with LLMs. I'm sure their metrics are great, but at the end of the day I, a human working with my hands and brain and sharpening my OWN intelligence, have created what productivity metrics cannot buy: real innovation.
Imagine the problem is picking a path across an unexplored, desolate desert wasteland. One guide says he's the fastest: he runs rather than walks, and at a fork in the way always picks a path within 5 seconds. He promises you he is the fastest guide out there by a factor of two.
You decide on a second opinion and find an old, wizened guide who says he always walks rather than runs and never picks a path in less than 5 minutes, and who promises that, no matter what sales pitch the other guide gives, he can get you across the desert in half the time and at half the risk to your life.
There is mental overhead in switching projects, meaning that even if a developer is more efficient per project, he won't get more money (usually less, actually) while taking on more mental load (more projects, more managers, more requirements, etc.).
> It feels like most of the products that look like they could generate revenue are for programmers.
Don't discount scamming and spreading misinformation. There's a lot of money to be made there, especially in mass manipulation to destroy trust in governments and journalists. LLMs and image generators are a treasure trove. Even if they're imperfect, the overwhelming majority of people can't distinguish a real image from a blatantly false one, let alone biased text.
In my 30 years of software development, maybe 5 of them were in places where getting people to provide a formal spec was ever an option.
It's also irrelevant whether LLMs can follow them. The way I use Claude Code is to have it get things roughly working, supply test cases showing where it fails, then review and clean up the code or go additional rounds with more test cases.
That's not much different from how I work with more junior engineers, who are slower and not all that much less error-prone, though the errors are different in character.
If you can't improve coding speed with LLMs, maybe your style of working just isn't amenable to it, or maybe you don't know the tooling well enough. For me it's sped things up significantly.
The fact that getting a formal spec is impossible is precisely why you need to hire a developer with a big salary and generous benefits.
The formal spec lives only in the developer's head. It's the only way.
Does an LLM coding agent provide any value here?
Hardly. It's just an excuse for the developer to waste time futzing around "coding" when what they're really paid to do is cram that ineffable but very important formal spec into their heads.
It works just fine to use an LLM coding agent in cases like this, but you need to be aware of what you're actually trying to do with them and be specific instead of assuming they'll magic up the spec from thin air.
I don't know. The other day I wanted to display an Active Directory object to the user. The dict had around 20 keys like "distinguishedname", and a "createdat" with timestamps like 144483738. I wanted friendly display names in a sensible order, and to have binary values converted to something human-readable.
Very easy to do, sure, but the LLM did this in one minute, recognized the context, and correctly converted the binary values, whereas this would have taken me maybe 30 minutes of looking up standards and docs and typing in friendly key names.
I also told it to create five color themes and apply them to the CSS. It worked on the first attempt and it looks good, much better than what I could have had produced by thinking of themes, picking colors and copying RGB codes back and forth. Also I'm not fluent in CSS.
Though I wasn't paid for this, it's a hobby project, which I wouldn't have started in the first place without an LLM performing the boring tedious tasks.
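To give a sense of what it produced, the shape was roughly this (a reconstruction, not the LLM's actual output; the key names, label map, and FILETIME handling are illustrative):

```python
# Rough shape of the generated display code (illustrative only).
# AD stores several timestamps as Windows FILETIME: 100-nanosecond
# intervals since 1601-01-01 UTC.
from datetime import datetime, timedelta, timezone

FRIENDLY = {  # raw AD key -> display label, in a sensible order
    "distinguishedname": "Distinguished name",
    "samaccountname": "Account name",
    "pwdlastset": "Password last set",
    "useraccountcontrol": "Account flags",
}

def filetime_to_dt(ft: int) -> datetime:
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ft // 10)

def display(entry: dict) -> None:
    for key, label in FRIENDLY.items():
        if key not in entry:
            continue
        value = entry[key]
        if key == "pwdlastset":  # convert the raw timestamp to something readable
            value = filetime_to_dt(int(value)).strftime("%Y-%m-%d %H:%M")
        print(f"{label}: {value}")

display({"distinguishedname": "CN=Jane,OU=Staff,DC=example,DC=com",
         "pwdlastset": 133600000000000000})
```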
For me it's mainly adding quick-and-dirty hooks to WordPress websites at the behest of nagging marketing C-suites, for sites that are going to disappear or stop being visited within a few months.
For that, whatever Claude spits out is more than enough. I'm reasonably confident I'm not going to write much better code in the less-than-30-minutes I'm allowed to spend to fix whatever issue comes up.
It's very marmite. I used to hate it when it was VS Code's crappy Copilot. Now, with Cursor and Windsurf, after some onboarding, I find it indispensable. I have used AI for coding in 3 separate roles:
- freelancer
- CTO
- employee
And in all 3 cases, AI has increased my productivity. I could ship things even when I'm really sleepy, or if I have very little time between things I can send a prompt to an agent, review the output, and then clean up some of the mess when I have more time.
Now my stance is really at "whoever doesn't take advantage of it is NGMI".
You're specifically very wrong about "LLMs cannot follow a formal spec in a particular problem domain". It does take skill to ensure that they will, though, for sure.
I prefer just paying for metered use on every request. I hope monthly fees don't carry over from the last era of tech. It's fine to charge consumers $10 per month, but once it's over $50, let's not pretend: you're hoping I under-utilize the service while wanting me to think I'm over-utilizing it. These premium subscriptions are too expensive for me to pretend that math doesn't exist.
Sort of, but in a good way: if I've spent $15 on a problem and it's not solved, it reminds me to stop wasting tokens and think of a better strategy. On net it makes me use fewer tokens, and more efficiently. I mostly love that I don't need to periodically do math on a subscription to see if I'm getting a good deal this month.
Yes, and that's why phone contracts migrated from "$0.0X per minute" to "$X for up to 500 minutes", and finally to "$X for unlimited calls".
When the service you provide has near zero marginal cost, you'd prefer the customer use it as much as possible, because then it'll provide more value to them and they'll be prepared to pay more.
Back when I used dial-up, I experienced a lot of stress when I was connected. I felt I had to be as effective as possible, because we had to pay for every minute spent.
When I switched to DSL the stress went away, and I found myself using the internet in different ways than before, because I could explore freely without time pressure.
I think this applies to Claude as well. I will probably feel more free to experiment if I don't have to worry about costs. I might do things I would never think of if I'm only focused on using it as little as possible to save money.
My first use of the internet was dial-up e-mail only exchange via UUCP to a local BBS that exchanged mail every 6 hours (might have been 4), and so to be as effective as possible, I'd prepare all my e-mails including mails to the e-mail<->web gateway at CERN so I could exchange a big batch right before the time slot. Often their exchange took long enough that if I sent the messages to the CERN bot first, I'd get the response included when I downloaded the replies after they'd exchanged with their upstream. Then I had a 6 hour window to figure out what to include in the next batch...
100% with you that how you access something can add constraints and stress. In my case, while we paid per minute, the big factor was the time windows: to maximise utility you wanted to include something useful in as many of the exchanges as possible.
With Claude Code as it is now, I clear context more often than is ideal, because keeping it around drives up cost. I could probably add a lot more detail to CLAUDE.md in my repos, but that drives up tokens as well.
Some of it I'll still do because it affects speed as well, but it'll be nice not to have to pay attention to it.
It's great that there's a choice, but for me the Max plan is likely to save me money already, and I suspect my usage would increase significantly if the top-up (I have intentionally not set it to auto-top-up) didn't regularly remind me not to go nuts.
The problem is that this is $100/mo with limits. At work I use Cursor, which is pretty good (especially tab completion), and at home I use Copilot in vscode insiders build, which is catching up to Cursor IMO.
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
I don't know why people expect unlimited usage for a limited cost. Copilot hasn't been good for a long time. They had the first-mover advantage but were too slow to improve the product. It still hasn't caught up to Cursor or Windsurf, and Cline leaves it so far in the dust it's like a decade behind in AI years. You get what you pay for.
Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I started using Claude Code once it became a fixed price with my Claude Max subscription. It's taken a little getting used to vs. Cline, but I think it's closer to Cline in performance than to Cursor (Cline being my personal gold standard). $100 is something most people on this forum could make back in one day of work.
$100 per month is nothing for the value, and for what it's worth, I have tried to hit the usage limit; the only thing that got me close was using their deep-research feature. I've maxed out Claude Code without hitting limits.
Sonnet in Copilot is crippled, Copilot agent mode is also very basic and failed every time I tried it. It would have been amazing 2 years ago, but now it's very meh.
GitHub is losing money on the subs, but they are definitely trying to reduce the bleed. One way to do that is to cut corners on LLM usage: not sending as much context, trimming the context window, capping output token limits. These are all things Cursor does too, btw, which is why Cline, with almost the same tech (in some ways even inferior tech), achieves better results. I have hit $20 in API usage within a single day with Cline; Cursor lets you have "unlimited" usage for $20 a month. So it's optimised for saving costs, not for giving you the best experience. At $10 per month for Copilot, they need to save costs even more. So you get a bad experience, and you think it's the AI that isn't capable, but the problem is with the companies burning VC money to corner the market, setting unrealistic expectations on pricing, etc.
It was just obviously worse than using the Anthropic website; that was the only explanation for how bad it was. They could afford to offer it free because it was dumbed down, even if it was the same version (maybe with fewer resources). Or maybe I was just unlucky, but that's how it seemed to me.
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don't match up to Claude.
Out of date, I think, in this fast-moving space.
Sonnet has long been the gold standard, but that position is looking very shaky at the moment; Gemini in particular has been working wonders for me and others where Sonnet has stumbled.
VS Code/Copilot has improved massively in Cursor's wake, but yes, still some way to go to catch up.
Absolutely though - the value we are getting is incredible.
In my experience, there are areas where Gemini did well and Claude didn't, and the same for o1 pro or o3, but for 90% of the work I find Claude way more trustworthy: better at following instructions, not making syntax mistakes, etc. Gemini 2.5 Pro is way better than all their prior models, but I don't get the hype about it being a coding superstar. It's not bad, but Sonnet is still the primary workhorse. Sonnet is more expensive, so if Gemini were at the same level I'd be happy to save the money, but unfortunately, after trying it with various approaches and playing with the temperature, in the vast majority of cases Claude does a better job.
But basically, you get ~300M input tokens and ~100M output tokens per month with Sonnet on the $100 plan. These are split across the 50 sessions you are allowed; each session lasts 5 hours, starting from the first time you send a message. During a session, you get ~6M input and ~2M output tokens for Sonnet. Claude Code seems to use a mix of Sonnet and Haiku, and Haiku has 2x the limits of Sonnet.
So if you absolutely maxed out your 50 sessions every month, that's roughly $2,400 worth of usage if you had instead used the API. So it's a great deal. You're not buying $100 worth of API credits, so they don't run out like that. You can exhaust the limits for a given session, which means at most a 5-hour wait for your next one, or you can run out of your 50 sessions; I don't know how strongly they enforce that limit, and I think it's BS, but all in all the value for money is great, way better than using the API.
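The $2,400 figure is easy to sanity-check, assuming Sonnet's API pricing at the time of roughly $3 per million input tokens and $15 per million output tokens:

```python
# Back-of-envelope check of the $2,400/month figure.
# Assumed API pricing: ~$3 per 1M input tokens, ~$15 per 1M output.
sessions = 50
input_tokens = sessions * 6_000_000    # ~6M input per session  -> 300M/month
output_tokens = sessions * 2_000_000   # ~2M output per session -> 100M/month

cost = input_tokens / 1e6 * 3 + output_tokens / 1e6 * 15
print(cost)  # 900 + 1500 = 2400 (dollars)
```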
Thanks for the link and explainer. My first experience with Claude Code left mixed feelings because of the pricing. I have a Pro subscription, but Claude Code could only use API mode. So I added $5 just to check it out, and exhausted $4.50 in the first 8-minute session. It left me wondering whether switching to the Max plan would exhaust at the same rate.
Nope, it can be even a dozen (because it's agentic). Claude usage limits are actually based on token usage, and Claude Code uses a mix of Haiku and Sonnet, so your limits are split between those two models. I gave an estimate of how much usage you can expect in another comment on this thread, but you will find it hard to max out the $100 plan unless you are using it very, very extensively.
I didn’t realize they were tuning cost optimization by switching models contextually. That’s very clever. I bet the whole industry of consumer LLM apps moves that way.
Gemini 2.5 Pro is better at coding than Claude, it’s just not as good at acting agentically, nor does Google have good tooling to support this use case. Given how quickly they’ve come from far behind and their advantage on context size (Claude’s biggest weakness), this could change just as fast, although I’m skeptical they can deliver a good end user dev tool.
I'd be careful stating things like these as fact. I asked Gemini for half an hour to write code that draws a graph the way I want, and it never got it right. Then I asked Claude 3.7, and it got it almost right on the first try, to the point that I thought it was completely right, and it fixed the bug I discovered right after I pointed it out.
Yup, I have had a similar experience. Not only for coding: just yesterday, I asked Gemini to compose an email with a list of attachments, which I had specified as a list of file paths in the prompt, and it wasn't able to count them correctly and report the number in the email text (the text went something like "there are <number_of_attachments> charts attached"). Claude 3.7 was able to do it correctly in one go.
Have I got bad news for you... Microsoft announced it is imposing limits on "premium" models from next week. You get 300 "free" requests a month. If you use agent mode, you easily consume about 3-4 requests per action; I estimate I'd burn through 300 in about 3-5 working days.
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
Doesn't resonate with me, because I've spent over $1,000 on Claude Code at this point and the return is worth it. The spend feels cheap compared to the output.
By contrast, I'm not interested in using cheaper, lesser services for my livelihood.
So just for work then or personal projects too? For work I can understand but for personal projects I haven't necessarily gotten more success out of AI than my own code, to be honest.
In terms of personal projects, I use my own custom Ruby X11 window manager, and when I moved and got office space for an extra monitor, Claude Code wrote the basics of the multi-monitor support by itself.
It's notable to me because there are, to my knowledge, no other Ruby WMs (there's at least one that allows scripting with Ruby, I believe, but not the whole codebase), the X11 bindings are custom (no Xlib or XCB), and there are few good examples that fit the structure of my WM. Yet it made it work. The code was ugly, and I haven't committed it yet because I want to clean it up (or get Claude to), but my priority was to be able to use the second monitor without spending more than a few hours on it, starting with no idea how multi-monitor support in X11 worked.
Since then, Claude Code has added Xinerama support to my X11 bindings, plus selection support to enable a systray for my pager, and written the systray implementation (which I also didn't have the faintest clue how worked, so I had Claude explain it to me before starting).
I use it for work too, but for these personal projects the priority has been rough working code over beauty, because I use them every day, rely on the features, and want to spend as little time as possible on them. So the work has been very different from how I use Claude for work projects, where I'll work in much smaller chunks, polish the result, etc.
Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having Claude operate within the meta context-management framework. We work hard to set the path for the actual coding, so the code output (even complex or highly integrated) usually ends up being fairly smooth and fast.
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase organized in an XML structure: https://github.com/backnotprop/prompt-tower
Ah yeah, sorry, that is an export error... I copied the prompts directly out of Claude Code, and when I do that it copies all of the ASCII/TUI parts that wrap the message. I used some random "strip special chars" site to remove those and was lazy about adding actual punctuation back in.
"Ensure all our crons publish to telegraf when they start and finish. Include the cron name and tenant id when applicable. For crons that query batch jobs, only publish and take a lock when there is work to do. look at <snip> as an example. Here is the complete list to migrate. Create a todo list and continue until done. <insert list of 40 file paths>"
I used it yesterday to convert a website from Tailwind v1 to v4. I gave it the files (HTML/SCSS/JS) plus links to the Tailwind docs, and it did the job. It needed some back and forth and some manual work, but overall it was painless.
It is not a challenging technical thing to do. I could have sat there for hours reading the migration notes from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with %s/x/x, so you need to do them manually, one by one, for hundreds of classes. I could have as easily shot myself in the head.
> Could you anonymize and share your last 5-10 prompts?
The prompt was a simple "convert this site from Tailwind v1 to v4". I use Neovim Copilot chat to inject context and load URLs. I have found that prompt wording has little value; it is either something the LLM can do or not.
I got $100 of credit at the start of the year and have been using about $1 more each month, starting at $2 in January using Aider at the time. I just switched to Claude Code this week, since it follows a similar UX. Agentic CLI code assist has really been growing in usefulness for me as I get faster at reviewing its output.
I use it for very targeted operations where it saves me several round trips to code examples, documentation, and Stack Overflow, rather than spamming it for every task I need to do. I spend about $1/day on focused feature development, and it feels like it saves me about 50% as many hours as I spend coding while using it.
What do you prefer, between Aider and CC? I use Aider for when I want to vibe code (I just give the LLM a high-level description and then don't check the output, because it's so long), and Cursor when I want to AI code (I tell the AI to do low-level stuff and check every one of the five lines it gives me).
AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.
I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.
I tested Aider a few times and gave up because, at the time, it was so bad. It might be time to try it again. Seeing how well Claude Code works for me while lots of other people struggle with it suggests that my style of working may just mesh better with Claude Code than with Aider.
Claude Code was the first assistant that gelled for me, and I use it daily. It wrote the first pass of multi-monitor support for my window manager. It's written the last several commits of my Ruby X11 bindings, including a working systray example, where it both suggested the whole approach and implemented it, and tested it with me just acting as a clicking monkey (because I haven't set up any tooling to let it interact with the GUI) when it ran test scripts.
I think you just need to test the two side by side and see what works for you.
I intend to give Aider a go at some point again, as I would love to use an open source tool for this, but ultimately I'll use the one that produces better results for me.
Makes sense, thanks. I've used Claude Code but it goes off on its own too much, whereas Aider is more focused. If you do give Aider another shot, use the architect/editor mode, with Gemini 2.5 Pro and Claude 3.7, respectively. It's produced the best results for me.
The two worst ways of burning API credits I've found with Claude Code are:
1. Getting argumentative/frustrated with the model if it goes off the rails and continuing to try to make something work when the model isn't getting anywhere.
If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail? If it's not making forward progress after a couple of prompts, it's not likely to unless you split up the task and/or provide more details. This is how you burn $10 instead of $0.60 on a task that "should" be simple. It's bad at telling you something is hard.
2. Not thinking about when to either /compact (trims the context but retains important details) or clear the context entirely. E.g. always clear when moving to another task unless they're closely related. Letting it retain a long context is a surefire way to burn through a lot (and it also slows you down a lot, not least because there's a bug affecting some of us, maybe related to TERM settings, where in some cases it will re-print the entire history to the terminal, so between tasks it's useful to quit and restart).
Also use /init, but also ask it to update CLAUDE.md with lessons learned regularly. It's pretty good at figuring things out, such as how my custom ORM for a very unusual app server I'm working on works, but it's a massive waste of tokens to have it re-read the ORM layer every time instead of updating CLAUDE.md.
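To illustrate the kind of note I mean, a lessons-learned section in CLAUDE.md might look like this (contents entirely invented, just to show the shape):

```
# CLAUDE.md (illustrative excerpt; all details invented)

## Lessons learned
- The custom ORM lives in lib/orm/; models declare fields declaratively.
  Do NOT re-read lib/orm/ to rediscover this; see the summary below.
- Run the full test suite with `make test`; a single file with `make test FILE=path`.
- /compact or /clear between unrelated tasks; long contexts burn tokens.
```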
> If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail?
This.
I was fighting with Claude for a good chunk of yesterday (usage limits seemed broken, so it didn't really time me out), and most of that was getting it to fix one small issue with three test cases. It would fix one test and break the others, round and round we go. After it broke unrelated tests I had to back out all the changes and, by then, I understood the problem well enough that I could direct it on how to fix it, with a little help from DeepSeek.
As there are a bunch of other sections of code which suffer from the same problem I can now tell it to "look at the fixed code and do it like that" so, hopefully, it doesn't flail around in the dark as much.
Admittedly, this is fairly complicated code, being an AST-to-bytecode compiler with a bunch of optimizations thrown in, and part of the problem was a later optimization pass undoing the 'fixes' Claude was applying, which took quite a while to figure out.
Now I just assume Claude is being intentionally daft and treat it as such with questions like "why would I possibly want a fix specifically designed to pass one test instead of a general fix for all the cases?" Oh, yeah, that's its new trick, rewriting the code to only pass the failing test and throwing everything else out because, why not?
Whoever is paying for your time should calculate how much time you’d save between the different products. The actual product price comparison isn’t as important as the impact on output quality and time taken. Could be $1000 a month and still pay for itself in a day, if it generated >$1000 extra value.
This might mean the $10/month is the best. Depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
Just today I had yet another conversation about how BigCo doesn't give a damn about cost.
Just to give you one example: the last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month in cloud costs for serving a single static HTML file.
At one point someone up top decided that Kubernetes was the way to go and scrambled together an impromptu schematic for new projects which could be simply described as a continental-class dreadnought of a Kubernetes cluster on AWS.
And it was signed off, and later followed like scripture.
A couple of stories lower, we were having a hard time arguing for a 50 EUR budget for weekly beers for the team, but the company was fine with paying 2k EUR for a landing page.
I've seen the books as to how much we spend on all the various AI shit. I can guarantee, that at least in our co, that AI is a massive waste of money.
But it doesn't really matter, because the C-level has been consumed by the hype like nothing I've ever seen. It could cost an arm and a leg and they'd still be pushing for it because the bubble is all-consuming, and anyone not touting AI use doesn't get funding from other similarly clueless and sucked-in VCs.
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day, 20-odd days a month, if there were nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
I've often run multiple Claude Code sessions in parallel to do different tasks. Burns money like crazy, but if you can handle wrangling them all there's much less sitting and waiting for output.
I'll add on to this: I don't really use agent modes a lot. In an existing codebase, they waste a lot of my time for mixed results. Maybe Claude Code is so much better at this that it enables a different paradigm of AI editing, but I'd need easy, cheap access to try it.
An AI agent should be treated like a human developer. If you bring a new human developer into your codebase and give them a task, it will take a lot of time for them to read and understand the codebase before producing a proper solution. If you want to use an AI agent regularly, it makes sense to have some sort of memory of the codebase.
And it seems the community realizes this and is inventing different solutions. RooCode has task orchestration built in already; there is a Claude task-manager that allows splitting and remembering tasks so an AI agent can pick them up quicker; there are various file-based solutions like memory banks. Windsurf and Cursor upgraded their rules functionality (.windsurf/rules etc.) to allow more solutions like that for instructing AI agents about the codebase and tasks. Some people even write their own scripts that feed every file to an LLM and store the summary in a separate file that the AI agent can use instead of searching the codebase.
I'm eager to see how some of these solutions become embedded into every AI agent product. It's one of the missing stones needed to make AI agents an order of magnitude more efficient and productive.
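As a sketch of that last approach (assuming the anthropic Python SDK; the model choice, source paths, and output file are invented):

```python
# Sketch of the "summarize every file into a code map" idea above.
# Assumptions: anthropic SDK installed, ANTHROPIC_API_KEY set, Python
# sources under src/, output to .codebase-map.md. All illustrative.
import pathlib
import anthropic

client = anthropic.Anthropic()

def summarize(path: pathlib.Path) -> str:
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # a cheap model is fine for summaries
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"Summarize this file in 3 lines for a code map:\n\n{path.read_text()}",
        }],
    )
    return msg.content[0].text

with open(".codebase-map.md", "w") as out:
    for f in sorted(pathlib.Path("src").rglob("*.py")):
        out.write(f"## {f}\n{summarize(f)}\n\n")
```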
You don't need a max subscription to use Claude Code. By default it uses your API credits, and I guess I'm not a heavy AI user yet (for my hobby projects), but I haven't spent more than $5/month on Claude Code the past few months.
The problem with it is that it uses a ~30k-token system prompt (albeit "cached"), and the usage very quickly climbs to a few million tokens. I can easily spend over $10 a day.
I always imagined that these $10/mo plans are essentially loss leaders and that in the long run, the price should be much higher. I'm not even sure if that $100/mo plan pays for its underlying costs.
I think their free tiers are by definition loss leaders, but I think you're right that all of their offerings are loss leaders. I know I can get more from my $20 using Claude Pro than I can using their API Workbench. It is such a competitive space that I think it's unrealistic for these companies to ever be cash-positive, because all the cash they have needs to be spent on competing in this space.
> I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful
I don't think this is the right way to look at it. If CoPilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
At $10-20 a month that calculation is trivial to make. At $100, I'm honestly not getting that much value out of AI, especially not every month, and especially not compared to the cheaper tiers.
I think this reasoning is flawed. First, it presupposes a linear value/cost relationship, which is not always true: a bag that costs 100x as much is not 100x more useful.
Additionally, when you're in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that marginal value tracks cost. I don't think most things, economically, match that pattern; I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value).
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.
Tangential, but I don't want to use LLMs for writing code because it's one of the things I enjoy the most in life. Still, it feels like I'm going to have to in order to get ready for the next years of my career. I've had some experiences with Claude that have seriously impressed me, but it takes away the fun that I've found in my jobs since I was in middle school writing small programs.
Does anyone have advice for maintaining this feeling but also going with the flow and using LLMs to be more productive (since it feels like it'll be required in the next few years at many jobs)? Do I just have to accept that work will become work and I'll have to get my fix through hobby projects?
I faced a related dilemma when I finished my CS degree: work as a full-stack dev, or work on more foundational technology (and actually use what I learned in my degree). My experience is that the "foundational technology" area is more research-oriented, which means you get to work on projects where LLMs don't help that much: writing code in languages that have little data in the LLM's training corpus, coming up with performance-benchmarking approaches unique to your application, improving a workload's throughput with insights derived from your benchmarking results and your ingenuity, etc. Had I gone down the full-stack path, I think I'd be worried now.
> I don't want to use LLMs for writing code because it's one of the things I enjoy the most in life
I think LLMs are really good for the "drudge work" of coding. I always say they're excellent for tasks where the actual work is easy but the bottleneck is how fast you can type.
As an example, I had a project where I had previously extracted all the strings in the UI into one object. For a number of reasons I wanted to move away from this, but the codebase is well over 50k LOC and I have probably 5k lines of strings. Doing this manually would have been very tedious and would have taken me quite some time, so I leveraged AI to help me and managed to refactor all the strings in my app in a little over an hour.
Are you using it for other things? I think you can write code without it, but it's so good for research and as a Stack Overflow replacement.
Last night I used it to look through a project in an open source codebase, in a language I'm not familiar with, to get a report on how that project works. I wanted to know what its capabilities and integrations with certain other specialized tools are, because the documentation is so limited. It saved me time and didn't help me write code. Beyond that, it's good for asking the really stupid questions about complex topics that you'd get roasted for on Stack Overflow.
How can you be sure that the report is accurate? Did you verify that the project actually has the capabilities & integrates with the other specialized tools? I've seen many instances where the model either left out important information or came up with totally new stuff that got buried in the rest (mostly true) of the answer.
I think there will always be jobs out there that don't demand you write code with an LLM, just the same that most jobs don't demand you use vim or emacs or LSP-based autocomplete as part of your workflow.
You don't have to go with the flow. I took a step back from AI tech because a lot of startups in that field come with extra cultural baggage that doesn't sit well with me.
Do you use compilers? Linker loaders? Web bundlers? Linters and formatters? Code gen for or from schema? Image editors? Memory safe or garbage collected languages?
Then you already use levers to build code.
LLMs are a new kind of tool. They’re weird and probabilistic and only sometimes useful. We don’t yet know quite how and when to use them. But they seem like one more lever for us to wield.
I think the probabilistic nature is a huge divider. It requires a completely different way of working, and it's understandable that people have trouble switching from one mode to the other (easier for experienced devs, in my experience, but it still makes you switch into code-reviewer mode too often).
Treat them as resources for remembering/exploring code libraries and documentation. For example, I needed to import some JSON files as structs into Unreal Engine. Gemini helped me to quickly identify the classes UE has for working with JSON.
> Does anyone have advice for maintaining this feeling but also going with the flow and using LLMs to be more productive
Coding with LLMs has brought me so much more joy. Not always, and sometimes it's quite frustrating, but it is getting better. When you have a good idea, explain it well, and get the model to generate the code the way you would have written it (or even better), and you can use it to build new things faster: that's magical. Many devs are having this experience, some earlier, some now, some later. But I certainly would not say that using LLMs to code has made it less enjoyable.
I'm curious whether anyone's actually using Claude Code successfully. I tried it on release and found it negative-value for tasks other than spinning up generic web projects. For existing codebases of even moderate size, it burns through cash writing code that is always slightly wrong and requires more tuning than writing it myself.
Absolutely stellar for 0-to-1-oriented frontend-related tasks, less so but still quite useful for isolated features in backends. For larger changes or smaller changes in large/more interconnected codebases, refactors, test-run-fix-loops, and similar, it has mostly provided negative value for me unfortunately. I keep wondering if it's a me problem. It would probably do much better if I wrote very lengthy prompts to micromanage little details, but I've found that to be a surprisingly draining activity, so I prefer to give it a shot with a more generic prompt and either let it run or give up, depending on which direction it takes.
You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of codebase to Gemini and rely on its strategy). I’ve even put entire library docsites in my repos for Claude code to use - but today they announced web search.
They also have todos built in which make the above even more powerful.
The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.
Is that final number really that crazy? With a well-defined goal, you can put out 5-8k lines per day writing code the old-fashioned way. Also, I'd love to see the code, since in my experience (I use Cursor as a daily driver) AI bloats code by 50% or more with unnecessary comments and whitespace, especially when generating full classes/files.
> I spend a lot of time setting Claude code up for success.
Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!
You'll never see the code. They will just say how amazingly awesome it is, how it will fundamentally alter how coding is done, etc... and then nothing. Then if you look into who posts it, they work in some AI related startup and aren't even a coder.
Not open source, but depending on the context I can show whoever; I'm not hard to find.
I've done just about everything across the full and distributed stack, so I'm happy to jam on my code/systems and how I instruct and (confidently) rely on AI to help build them.
5k lines of code a day is 10 lines of code a minute, solid, for 8 hours straight. Whichever way you cut that with whitespace and bracket alignment, that's a pretty serious amount of code to chunk out.
If I am writing Go, it is easy to generate that much in if/else branches and error checks alone. When working in Java, basic code can bloat to a big LoC count over several hours (a first draft, which is obviously cleaned up later before going to PR). React and other FE frameworks also tend to require a huge LoC count (mostly boilerplate, auto-completed rather than thoughtfully planned and written). It is not as serious an amount as you may think.
Nitpicking like this should at least be fair: if you look at typical AI code, the styles, extra newlines, comments, tests/fixtures, etc. are the same. And again, LoC isn't a good measurement in the first place.
Not all my 5k lines are hand-written, or even more than a character; a line can be a closing bracket, etc., which autocomplete has handled for the last 20 years. It's definitely an achievement, which is why it's important to get clarity when folks claim to reach peak developer productivity with some new tools. To quote the curl devs, "AI slop" isn't worth nearly the same as thoughtful code right now.
I'd be really interested in seeing the source for this, if it's an open-source project, along with the prompts and some examples. Or other source/prompt examples you know of.
A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level the marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.
For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and tracking the results.
Can you share more about what you mean by a meta context/tasking management system? I’m always curious when I see people who have happily spent large amounts on api tokens.
The framework is basically the instructions and my general guidance for updating context and ensuring the critical details get injected into it.
some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase organized in an XML structure: https://github.com/backnotprop/prompt-tower
So I use Roo, and you have its architect mode draft out the plan in as much detail as you want: plans, tech stack choices, todos, etc. Switch to orchestration mode to execute the plan, including verifying things are done correctly. It subtasks out the todos. Tell it not to bother you unless it has a question. Come back in thirty and see how it's doing. You can have it commit to a branch per task if you want. Etc., etc.
I use it, on a large Clojure/ClojureScript application. And it's good.
The interactions and results are roughly in line with what I'd expect from a junior intern. E.g. don't expect miracles, the answers will sometimes be wrong, the solutions will be naive, and you have to describe what you need done in detail.
The great thing about Claude code is that (as opposed to most other tools) you can start it in a large code base and it will be able to find its way, without me manually "attaching files to context". This is very important, and overlooked in competing solutions.
I tried using aider and plandex, and neither worked as well. After lots of fiddling I could get mediocre results. Claude Code just works: I can start it up and start DOING THINGS.
It does best with simple repetitive tasks: add another command line option similar to others, add an API interface to functions similar to other examples, etc.
In other words, I'd give it a serious thumbs up: I'd rather work with this than a junior intern, and I have hope for improvement in models in the future.
Here's a very small piece of code I generated quickly (i.e. in <5 min) for a small task (I had generated some data and wanted to check the best way to compress it):
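(The snippet itself didn't survive the thread; a stand-in of roughly that shape, using only the Python standard library, might look like the following. The sample payload is invented.)

```python
# Rough stand-in (not the original snippet): compare stdlib compressors on a
# sample payload, reporting compressed size and wall-clock duration.
import bz2
import gzip
import lzma
import time

# Invented sample payload; the original used real generated data.
data = "\n".join(f"record-{i},{i * 37 % 1000},some repeated text" for i in range(100_000)).encode()

for name, compress in [("gzip", gzip.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(out):>10,} bytes ({len(out) / len(data):.1%} of input) in {elapsed:.3f}s")
```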
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
> I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Where is the breakpoint here? At what number of lines of code, or tokens in a codebase, does it become not worth it?
30 to 40-ish in my experience. The current state of the art seems to struggle to think about programming tasks one layer of abstraction up, or to zoom out a little in terms of what might be required.
I feel like, as a programmer, I have a meta-design in my head of how something should work, and the code itself is a snapshot of that. The models currently struggle with this big-picture view, and it becomes apparent as they make changes. I'm entirely willing to believe that Just Add Moar Parameters could fix that (but also entirely willing to believe that there's some kind of technical dead-end there).
Claude Code is the first AI coding tool that actually worked for me on a small established Laravel codebase in production. It builds full stack features for me requiring only minor tweaks and guidance (and starting all over with new prompts). However, after a while I switched to Cursor Agent just because the IDE integration makes the workflow a little more convenient (especially the ability to roll back to previous checkpoints).
Just to throw my experience in, it's been _wildly_ effective.
Example:
I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting its purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, written into a CLAUDE.md file so whole files wouldn't need to be read.
This initial step, depending on the size of the codebase, could be "expensive". Since this is merely a PHP extension and not a huge codebase, I was fine letting it just rip through the whole thing however it saw fit; were this a larger codebase, I'd take a more measured approach to this initial "indexing".
This results in a file that Claude uses the way we use a README.
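As a rough idea of the shape (a hypothetical fragment I'm making up here, not the actual generated file):

```markdown
# CLAUDE.md (illustrative fragment only)

## Project
phpredis: a PHP extension, written in C, wrapping the Redis protocol.

## Conventions
- One handler per command, following the existing naming patterns.
- Argument parsing mirrors the surrounding code; don't invent new styles.

## Bootstrapping notes for future sessions
- Read this file first; do not re-scan the whole tree.
- Tests run inside docker containers, never against local Redis.
```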
Next I end that session, start a new one, and tell it to review the CLAUDE.md file (I specifically tell it to do this at every single new session start going forward), then generate a general overview/plan of what needs to be done to implement the new Vector Set related commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted to generate a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use docker containers for the testing rather than mess up my local dev environment.
$22 in API costs and ~6 hours spent, and I have the extension working in my local environment with support for all of the commands I want/need to use. (There are still 5 commands I don't intend to use that I haven't implemented.)
Not only would I certainly never have embarked on trying to extend a C PHP extension, I certainly wouldn't have done it over the course of an evening and a morning.
Another example:
Before this Redis vector sets thing, I used CC to build a Python image-and-text embedding pipeline backed by Redis streams and Celery. It consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero time with anything ML-related. Now I have a performant, portable Python service that I run from my MacBook (M2 Pro) or various GPU-having Windows machines in my home, generating embeddings on an as-available basis and pushing the results back to a Redis stream that my Laravel app consumes and processes.
The results of these embeddings, and the similarity features they've brought to the Laravel application, are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own, I wouldn't have; I don't have that much time for side-project curiosities.
Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.
Day to day, the effectiveness is a learned skill. You really need to learn how to work with it, the same way you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue", but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use for them.
If it's burning through cash, you're not being focused enough with it.
If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely need to be documented in a long-running document used similarly to a prompt, like the one I described above.
From my own experience: I watch the "/settings/logs" route on Anthropic's website while CC is working, once I know we're getting rather heavy with context. Once it gets into the 50-60,000 token range, I either aim to wrap up the current task, or I accept that things are going to start getting a little wonky in the 80k+ range. It'll keep working up into 120-140k tokens or more, but you're likely to see lots of "dumb" stuff happening; you really don't want to be there unless you're _sooooo close_ to getting done what you're trying to do. When the context gets too high and you need or want to reset mid-task, run /compact [add notes here about next steps] and it'll generate a summary that is used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context; just reset the session fully after the first /compact.)
If you're constantly running into huge contexts, you're not being focused enough. If you can't work on anything without reading files with thousands of lines, either break up those files somehow or be _really_ specific with the initial prompt and context, which I've done lots of. Say I have a model belonging to a 10+ year old project that is 6,000 lines long and I want to work on a specific method in it: I'll tell Claude in the initial prompt which line that method starts and ends on, and how many lines from the start of the file it should read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method or its usages rather than reading whole files into context.
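A concrete, made-up version of that kind of surgical initial prompt (the file name, method, and line numbers are all invented):

```text
In app/Models/Order.php (~6,000 lines), work only on calculateTotals(),
which starts at line 2140 and ends at line 2215. First read lines 1-40 for
the namespace, class name, and properties. Never read more than 50 lines of
this file at a time; if you need another method, stop and ask me to locate
it instead of scanning the file.
```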
So, again, if it's burning through money, focus your efforts. If you think you can just fire it up and give it a generic task, you're going to burn money and get either complete junk or something that might technically work but is hideous, at least to you. But if you're disciplined and set up boundaries and systems it can adhere to, it does, for the most part.
Been on this about a week at the $100/mo mark. I'm not hitting quota limits (I'd swap to the $200/mo plan in a heartbeat if I were), using Claude Code on multiple tasks simultaneously with abandon. Prior to the flat plan I was spending nearly $1k/mo on tokens. That figure was justifiable but painful. Paying a tenth of it is lovely.
To rescue a flailing project that I took over when a senior hire ghosted a customer mid-project, I got the $200 Pro package from OpenAI (which is much less usable than Claude for our purposes; there were other benefits related to my client's relationship with OpenAI).
In the end, I was able to rescue the code part, rebuilding a 3-month, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.
So to answer your question, I got my money's worth in that specific use case. That said, the previous failed effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have really been better off if we had started with everyone having an OpenAI Pro account.
* Those who work in enterprise know intuitively what happened next.
> That said, the previous failed effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have really been better off if we had started with everyone having an OpenAI Pro account.
The hardest part of enterprise backend development is understanding the requirements. "Understanding" is not about reading comprehension, and "requirements" are not the written requirements somebody gives you. It's about finding out which requirements are undocumented and which parts of the requirements document are misinformation. LLMs would just dutifully implement the written requirements, misinformation and missing edge cases included, not the actual requirements.
You dropped off the "non" part of that. It's the non-Unicorn software companies easily paying $120k for a seasoned software developer in the US.
Also, I noticed where our sources diverged. I was looking at household income. My bad.
> which is already way above even the EU for dev salaries
Maybe they're underpaid.
Either way, I was responding to the idea that only a FAANG salary would cost an employer $20k/mo. For US software developer jobs, it can easily hit that without being at a FAANG-tier or unicorn-startup-level company. Tons of mid-sized, low-key software companies you've never heard of pay $120k+ for software devs in the US.
The median software developer in Texas makes >$130k/yr. Think that's all just Facebook and Apple and Silicon Valley VC-funded startup devs? Similar story in Ohio; is that a place loaded with unicorn software startups? Those median salaries in those markets probably cost their employers around $20k/mo.
If you add up an employer match on 401k/HSA, the employer paying the full healthcare premium, employer-sponsored life insurance, unemployment insurance, employer-covered disability, payroll taxes, and all the other software costs, it wouldn't even take $200k in salary to cost $20k/mo. Someone could be making $150k and still cost the company that much.
Sure, but ~$150k isn't exactly FAANG US salaries for an experienced software dev. That's my point. Lots of people forget how much extra many employers pay for a salaried employee on top of just the take home salary. Labor is expensive in the US.
I imagine a lot of people saw $20k/mo and thought the salary clearly had to be $200k+.
I wish tools like Cursor, Windsurf, etc. provided a free option for working on open source projects; after all, they trained their models on open source code.
As someone happily on the Pro plan (I got a deal at $17 per month), I'm a bit confused seeing people pay $100+ per month... what benefits are you getting over the cheaper plan?
When coding with Claude I cherry-pick code context, examples, etc. to provide for tasks, so I'm curious to hear what others' workflows are like and what benefits you feel you get from Claude Code or the more expensive plans.
I also haven't run into limits for quite some time now.
Agent mode without rails is like a boat without a rudder.
What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.
These instructions end up being very explicit about the sequence of things it should do (write the tests first), how the code should be written, where to place it, and so on. So the output ended up being very similar regardless of the coding agent used; the sketch below gives a flavor of the format.
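A hypothetical example of one such mini milestone (all file and feature names invented):

```markdown
## Milestone 7: add `export --format csv`

1. Write failing tests first, in tests/export_csv_test.py, covering: empty
   dataset, quoting of embedded commas, and the header row.
2. Implement only in src/export/csv.py; do not touch other modules.
3. Follow the error-handling pattern used in src/export/json.py.
4. Run the full test suite; stop and report if anything unrelated breaks.
```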
I've tried every variation of this very thing. I even managed to build a quick and dirty ticketing system that I could assign to the LLM of my choosing, WITH context. We're talking graph-of-codebase diagrams, mappings, tree structures of every possibility, simple documentation, complex documentation, a bunch of OSS that does this very thing automatically, etc. etc. etc.
In the codebase I've tried modularity via monorepo, faux microservices with local APIs, monoliths filled with hooks, and all the other centralization tricks in the book. Down to the very, very simple. Whatever I could do to bring down the context window needed.
Eventually... your returns diminish, and any time you saved is gone.
And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.
Inevitably, the context window and the LLM's eagerness to touch shit it's not supposed to (the likelihood of which increases with context) always get in the way.
Anything with any kind of complexity ends up in a game of too much bloat, or of the LLM removing pieces that break other pieces it wasn't aware of.
So, relying on a large context can be tricky. Instead, I've tried to get to an ER model quickly, and from there build modules that don't have tight dependencies.
I cancelled my Claude subscription. I was happily using it for months - asking it the odd question or sometimes having longer discussions to talk through an idea.
Then one day I got nagged to upgrade or wait a few hours. I was pretty annoyed; I didn't regard my usage as high, and it felt like a squeeze.
I cancelled my Pro plan and am now happily using Gemini, which costs nothing. These AI companies are still finding their feet commercially!
Google is easily in the best position to hold competitive pricing with their LLMs. They can rely on their multi-billion-dollar ad business to prop up their AI advancements, compared to OpenAI or Anthropic, which only exist with heavy investment from VCs.
Google will probably put 2.5 Pro behind a Google One account once it's out of preview, but I don't see a compelling reason they wouldn't keep Gemini incredibly price-competitive with Claude or ChatGPT.
I wonder how successful this pricing model ($100-$200 a month with limits) is going to be. It is very hard to justify, when other tooling in the ~$20/month range offers unlimited usage, and comparable quality.
Is any of the ~$20/month with unlimited usage tooling actually profitable though? It goes without saying that if all else is equal then the product sold at a greater loss will be more popular, but that only works until the vendor runs out of money to light on fire.
Tbh, for these types of systems I do not like the rate limiting at all. I might go days without a need, then followed by a day of very intense usage.
Also, the 'reputation grind' some of these systems set up, where you have to climb 'usage tiers' before being 'allowed' to use more? Just let me pay and use. I can't compare your system to my current provider without weeks of being throttled at unusable rates? This makes potentially switching to you way harder than it should be for serious users. Is that really the outcome you want? And no, I am not willing to 'talk to sales' to run a quick feasibility eval.
It is kind of sad that information about how many tokens are included is not provided; it's hard to judge against pay-as-you-go API usage because of that.
The new Claude Code "max plan" would last me all of 5 minutes... I don't get why people are excited about this. High-powered tools aren't cheap and aren't for the consumer...
It's pretty simple: that usage in 5 min is probably at least $10 worth of API credits in that time (maybe $100).
A year has 2000 working hours, which is 24,000 5-minute intervals. That means the company would be spending at least $240,000/year on the Claude API (conservatively). So they'd be better off paying you $100-200k to do nothing and hiring someone competent with the $240k.
Haha, no, I'm not from Israel. Just a Star Trek fan :)
But thanks for the info. My issue with that is that I don't want to give away my phone number, and I certainly don't want to pay for a service that gives me a phone number.
I am sure this is worth every dime, but my workflow is so used to Cursor now (Cursor rules, model choice, tab complete, to be specific) that I can't be bothered to try this out.
If you're using Cursor with Claude it's gonna be pretty much the same thing. Personally I use Claude Code because I hate the Cursor interface but if you like it I don't think you're missing much.
> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
nah, there are low-powered tools and there are high-powered tools. If a business wants $20/month Happy Meal toys, that business will get left behind. Ignore the consumer market; make Bugattis instead - https://ghuntley.com/redlining
> I read this appraisal of what "most devs" want/care about on HN frequently. Is there actually any evidence to back this up? e.g. broad surveys where most devs say they're just in it for the paycheck and don't care about the quality of their work?
https://en.wikipedia.org/wiki/Sturgeon%27s_law
Almost every field I've ever seen is like that. Most people don't know what they're doing and hate their jobs in every field. We managed to make even the conceptually most fulfilling jobs awful (teaching, medicine, etc).
I think everything will shift more towards winner takes all.
Complex technology --> Moat --> Barrier to entry --> regulatory capture --> Monopoly == Winner take all --> capital consolidation
A tale as old as time. It's a shame we can't seem to remember this lesson as it repeats itself over and over again every 20, 30, 50 years. Probably because the winners keep throwing billions at capitalist supply-side propaganda.
You could say the same sort of thing about compilers, or higher-level languages versus lower-level languages.
That's not to say that you're wrong. Most people who use those things don't have a very good idea of what's going on in the next layer down. But it's not new.
I see worrying trends in my office.
Developers (often juniors) use LLM code without taking the time to verify it. This leads to bugs, and they can't fix them because they don't understand the code. Some senior developers also trust the tool to generate a function and don't take the time to review it and catch the edge cases the tool missed.
They rely on ChatGPT to answer their questions instead of taking the time to read the documentation or do a simple web search and see discussions on Stack Overflow or blog posts about the subject. This may give results in the short term, but they don't actually learn to solve problems themselves. I am afraid this will have huge negative effects on their careers if the tools improve significantly.
Learning how to solve problems is an important skill. They also lose access to the deeper knowledge that enables you to see connections, complexities, and flows in a way the current generation of tools cannot. By reading the documentation, blogs, or discussions, you are often exposed to a wider view of the subject than the laser-focused answer of ChatGPT provides.
There will be less room for "vibe coders" in the future, as these tools increasingly solve the simple things without requiring as much management. Until we reach AGI (I doubt it will happen within the next 10 years), the tools will require experienced developers to guide them through the more complex issues. Older experienced developers, and younger developers who have learned how to solve problems and built deep knowledge, will be in demand.
> They rely on ChatGPT to answer their questions instead of taking time to read the documentation or a simple web search.
Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
Web search no longer provides useful information within the first few results. Instead, I get content farms that are worse than recipe pages: explaining why someone would want this information, but never providing it.
A junior isn’t going to learn from information that starts from the beginning (“if you want to make an apple pie from scratch, you must first invent the universe.”) 99.999% of them need a solution they can tweak as needed so they can begin to understand the thing.
LLMs are good at processing and restructuring information so I can ask for things the way I prefer to receive them.
Ultimately, the problem is actually all about verification.
> Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
I have an answer now, because I read the documentation last week.
This is kind of dismissive.
As a real example, I needed to change my editor config last month. I do this about once every 5 years. I really didn’t want to become an expert in the config system again, so I tried LLM.
Sad to report, it told me where to look but all of the exact details were wrong. Maybe someday soon, though.
It can be dismissive but also true.
I used to make fun of (or deride) all the "RTFM" people when I was a junior too. Why can't you just tell me how to do whatever thing I'm trying to figure out? Or point me in the right direction instead of just saying "its in the docs lol"?
Sometime in the last few years I started doing more individual stuff, and I started reading documentation before running npm i. And honestly? All the "RTFM" people were 100% right.
Nobody here is writing code that's going to be used on a patient on the surgical table right now. You have time to read the docs and you'll be better if you do.
I'm also a hypocrite because I will often point an LLM at the root of a set of API docs and ask how to do a thing. But that's the next best thing to actually reading it yourself, I think.
I'm in total agreement; RTFM does wonders. Even if you don't remember all of it, you get a gist of what's going on and can find things (or read them) faster.
In Claude I put in a default prompt[1] that helps me gain context when I do resort to asking the LLM for a specific question.
[1] Your role is to provide technical advice in developing a Java application. Keep answers concise and note where there are options and where you are unsure on what direction that should be taken. Please cite any sources of information to help me deep dive on any topics that need my own analysis.
Ah yes, the LLM is very good at giving me information from documentation that went out of date 15 years ago instead of using the documentation from 2025.
Most LLMs, especially the paid tiers, will fetch updated information. This was a valid complaint perhaps 8-12 months ago.
Mostly made up information in my experience.
It's been enormously useful for my Qt3 work though; it really understands it well.
The same could be said for every language abstraction or systems-layer change. When we stopped programming kernel modules and found a workable interface, it opened the door to so many more developers. I'm sure at the time there was skepticism because people didn't understand the internals of the kernel. That's not the point. The point is to raise the level of abstraction to open the door, increase productivity, and focus on new problems.
When you see 30-50 years of change, you realise this was inevitable; every generation has new engineers entering with limited understanding of the layers beneath, even of the code produced. Do I understand the lexers and compilers that turn my code into machine code or instruction sets? Heck no. Doesn't mean I shouldn't use the tools available to me now.
No, but you can understand them if given time. And you can rely on them to be some degree of reliable approaching 100% (and when they fail it will likely be in a consistent way you can understand with sufficient time, and likely fix).
LLMs don’t have these properties. Randomness makes for a poor abstraction layer. We invent tools because humans suffer from this issue too.
> it opened the door to so many more developers. [...] That's not the point. The point is to raise the level of abstraction to open the door, increase productivity and focus on new problems.
There are diminishing returns. At some point, quoting Cool Hand Luke, some men you just can't (r|)teach.
> Developers (often juniors) use LLM code without taking time to verify it. This leads to bugs and they can't fix it because they don't understand the code
Well... is this something new? Previously the trend was to copy and paste Stack Overflow answers without understanding what they did. Perhaps LLM code is an incremental change, but the concept is fairly familiar.
Aren't the insufficiencies of LLMs a temporary condition?
And as with any automation, there will be a select few who understand its inner workings, and a vast majority who will enjoy/suffer the benefits.
So the scope of answers is a single function or a single class? I have people nearby who are attempting to generate whole projects; I really wonder how they will ensure anything about them beyond the happy paths. Or maybe they plan to have an army of agents fuzzing and creating hotfixes 24/7...
> Or maybe they plan to have an army of agents fuzzing and creating hotfixes 24/7
There are absolutely people who plan to do exactly this. Use AI to create a half-baked, AI-led solution, and continue to use AI to tweak it. For people with sufficient capital it might actually work out halfway decent.
I've had success with greenfield AI generation but only in a very specific manner:
¹ https://www.totaltypescript.com/cursor-rules-for-better-ai-d...

> All in all, I expect salaries for non FAANG devs to decrease while salaries for FAANG devs to increase slightly (given the increased value they can now make).
I find it interesting how these sorts of things are often viewed as a function of technological advancement. I would think that AI development tools would have a marginal effect on wages compared to things like interest rates or the ability to raise capital.
Back to the topic at hand, however: assuming these tools do get better, competition would seemingly increase greatly. A highly skilled team with such tools could prove to be formidable competition for longstanding companies. This would require all companies to up the ante to avoid being outcompeted, requiring even more software to be written.
A company could rest on its laurels, laying off a good portion of its employees and leaving the rest to maintain the same work, but it runs the risk of being disrupted itself.
Alas, at my current job my team can't seem to release a rather basic feature despite everyone being enhanced with AI: nobody seems to understand the code, every change seems to break something else, the code's a mess... maybe next year AI will be able to fix this.
LLMs are a solution in search of a problem.
The first problem they have gained traction on is programming autocomplete, and there they are useful.
Generating summaries: pretty marginal benefit (personally I find it useless). Writing emails: quicker to just type "FYI" and press send than to instruct the AI. More problems that need solving will emerge, but it will take time.
This is a bad take to have, because it blinds you to the reality that is happening. LLMs are autocomplete for pros, but full-on programmers for non-tech folk. Like when GUIs first came out: the pros laughed and balked at how much more powerful the CLI was, but look where the world is today.
At my non-tech job, I can show you three programs written entirely by LLMs that have allowed us to forgo paid software solutions. There is still a moat (IDEs are not consumer-friendly), but that is pretty solvable. It will not be long before one of the big AI houses ships a prompt-to-offline-desktop-app IDE that your grandma could use.
Deep research has saved me weeks worth of man hours in the last couple of months…
Out of curiosity, which vendor? The deep research is somewhat new to me but I am open minded.
OpenAI's o3 and Gemini 2.5. Like the user below, I use multiple providers.
I use two or three at a time and then have another LLM merge and synthesize the output
I've been using LLMs as learning tools rather than simply answer generators. LLMs can teach you a lot by guiding your thinking, not replacing it.
It's been valuable to engage with the suggestions and understand how they work—much like using a search engine, but more efficient and interactive.
LLMs have also been helpful in deepening my understanding of math topics. For example, I've been wanting to build intuition around linear algebra, which for me is a slow process. By asking the LLM questions, I find explanations that make the underlying concepts more accessible.
For me it's about using these tools to learn more effectively.
> Does anyone feel like the biggest selling point of LLMs so far is basically for programmers? Feels like most of the products that look like could generate revenue are for programmers.
No, you're in a tech bubble. I'm in healthcare, and you'd think AI note-takers and summary generators were the reason LLMs were invented, and the lion's share of their use. I get a new pitch every day: "this product will save your providers hours every day!" They're great products, and our providers love ours, but it's not saving hours.
There's also a huge push for LLMs in search and data-retrieval chatbots. The push there is huge, and Mistral just released Le Chat Enterprise for that exact market.
LLMs for code are so common because they're really easy to create. It's Notepad plus ChatGPT. Sure, it's actually VS Code and Copilot, but you get the idea: it's really not more complicated than regular chatbots.
I think it's an enabler for everyone.
So many people benefit from basic things like sorting tables or searching and filtering data.
For things where I might just use Excel or a small script, they can now use an LLM.
And for now, we are still in dire need of more developers, not fewer. But yes, I can imagine that after a golden phase of 5-15 years, things will start to go downhill once automation and AI get too good / better than the average Joe.
Nonetheless, the good news is that coding LLMs enable researchers too - people who often struggle with learning to code.
When a company lays off a chunk of its workforce because increased productivity from LLMs means it doesn't need as many people, how is that an enabler for the laid-off people?
What happens when most companies do this?
During the '10s, every dev out there was screaming "everyone should learn to code and get a coding job". During the '20s, many devs are being laid off.
For a field full of self-professed smart and logical people, devs do seem to be making tons of irrational choices.
Are we in need of more devs, or in need of more skilled devs? Do we necessarily need more software written? Look at npm: the world is flooded with poorly written software that is one null-reference exception away from crashing.
> What happens when most companies do this?
It also means it becomes easier to start a new company and solve problems for people.
People get laid off when money is expensive. When money is expensive, running a company is harder, and starting a new one is even harder. Without capital, all you can offer is a broken demo of your v1 prototype and some sweet words. You can't start a company with just that when money is expensive.
Right now we don't have enough software developers, at least based on surveys.
So now LLMs help us with that.
In parallel, all the changes due to AI also require more effort for now. That's what I called the golden age.
After that, I can imagine fundamental change for us developers.
And at least where I live, a lot of small companies never got the chance to properly modernize, because the good developers earn very good money somewhere else.
I like to think that AI is to code what digital electronics was to analog electronics: a step backward in terms of efficiency and 10 steps forward in terms of flexibility.
Some of us will always maintain code, but most will move higher in the stack to focus on products and their real world application.
You'll commonly see new technologies utilized by people who have the ability to turn that technology to their own gain. Programmers are (for the most part) the only ones who can unlock LLMs to solve very specific personal problems. There are workflow automation tools that give non-programmers the ability to build workflows, but that's only one way to utilize LLMs, and it will always be constrained by the already-developed integrations and the constraints of the workflow platform.
As for jobs and job losses, I have no idea how this is going to impact individual salaries over time in different positions, but I honestly doubt it's going to do much. Language models are still pretty bad at working with large projects in a clean and effective way. Maybe that will get better, but I think this generational breakthrough in technology is slowing down a lot.
Even if they do get better, they still need direction and validation, both of which require some understanding of what is going on (even vibe coding works better with a skilled engineer).
I suspect there are going to be more "programmers" in the world as a result, but most of them will be producing small boutique single-webpage tools and designs of higher quality than the "made by my cousin's kid" sites a lot of small businesses have now. Companies above ~30 people with software engineers on staff seem to be using it as a performance enhancer rather than a work-replacement tool.
There will always be shitty managers and short-sighted executives looking to replace their human staff with some tool, and there will be layoffs, but I don't think the overall pool of jobs is going to shrink. For the same reason, I don't think there will be significant pay adjustments, but rather a dramatic increase in the long tail of cheap projects that don't make much money on their own.
I don't get why making engineers more productive would decrease their salaries. It should be the reverse.
You could argue that it lowers the bar for being productive, so the candidate pool becomes much greater; but you're arguing the opposite, that it increases the barrier to entry.
I'm open to arguments either way and I'm undecided, but you have to have a coherent economic model.
> I don't get why making engineers more productive would decrease their salaries. It should be the reverse.
You need fewer engineers to do the same work: demand gets lower, supply remains as high.
But they're more productive. Your assumption is that there is a fixed amount of engineering work to do, so you need to hire fewer programmers, which is untrue. Every organization I've worked at could have invested a lot more in engineering, be it infrastructure, analytics, automation, etc.
Even if there were a fixed amount of work and we were already near that maximum, salaries still wouldn't necessarily go down. Again, they're more productive. Farming employed around 40% of the US workforce in 1900 (and closer to 90% a century earlier). Now farmers are far more productive, and they're only around 2% of the workforce. Do farmers today earn a lower salary, adjusted for inflation, than 100 years ago? Of course not, because they're much more productive with tools.
Generally, wages track productivity. The more productive, the higher the wage.
Another example is bank tellers. With the advent of the ATM, bank teller salaries somehow didn't drop in real terms.
Show me an example of where this played out: someone was made much more productive through technology and their salary dropped considerably.
> Your assumption is there is a fixed amount of engineering work to do so you need to hire fewer programmers, which is untrue. Every organization I worked at could have invested a lot more in engineering, be-it infrastructure, analytics, automation, etc.
True. The problem is that investment is a long-term action (cost now, for gains later). Literally every company can benefit from investment. The key question is how valuable the gains are over a given time period relative to the cost you incur between now and the moment the gains are actualised.
LLMs wouldn't have helped Meta/Microsoft/Google lay off fewer people over the last 2 years. In fact, you could argue they would have helped lay off MORE people, as with LLMs you need fewer people to run the company. Do you think Zuckerberg would have INCREASED expenses (and that's what productivity investments are) while their stock was in freefall?
Companies can't spend indefinite amounts of money at any given time. If your value has been going down, or is going down, increasing your current expenses will get you fired. Big problems now require solutions now. The vast majority of tech companies in the world chose to apply a solution now.
Maybe you are right, but a look at the tech world over the last 3 years should tell you that your decision would have been deeply unpopular with the people who hold the moneybags. And at the end of the day, those are the people you don't want to anger, no matter how smart you believe yourself to be.
In the real-world experiment we're living through, you're being proven wrong. Tech companies have been laying off engineers continuously for several years now, and wages are down.
Layoffs started before the rise of LLMs and all the tooling around coding with them; they were never used as a justification. What happened was that Musk bought Twitter, cut 80% of the headcount, and it was still up, which showed you can run leaner; other tech CEOs took note. That, and the stock market crashed as the post-COVID bubble deflated.
I learned to program as a child in the 1960s (thanks Dad!) so I have some biases:
Right now there seem to be two extremely valuable LLM use cases:
1. sidekick/assistant for software developers
2. a tool to let people rapidly explore new knowledge and new ideas; unlike an encyclopedia, you can ask questions, get suggested references, get summaries, etc.
I suspect that the next $$$ valuable use case will be scientific research assistants.
EDIT: I would add that AI in k-12 education will be huge, freeing human teachers to spend more 1 on 1 time with kids while AIs will be patient teaching kids, providing extra time and material as needed.
The most valuable LLM use case right now is allowing people who don't know how to program to get their computer to do what they want it to do.
They might not be aware of this, and they don't know how to use an IDE, but the hardest part - the code-writing part - is solved.
Every week, Rachel in [small company] accounting manually scans the same column in the same structured Excel documents for amounts that don't align with that week's master amount. She then creates another Excel document to properly structure these findings. Then she fills out a report and submits it.
Rachel is a paragraph-long prompt away from never having to do that again; she just hasn't been given the right nudge yet.
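The script that prompt would produce is genuinely small. A hypothetical sketch (file names, the column name, and the master amount are all invented):

```python
# Hypothetical sketch of the kind of script an LLM could write for Rachel.
import pandas as pd  # pip install pandas openpyxl

MASTER_AMOUNT = 1250.00  # the week's master amount (invented)

df = pd.read_excel("weekly.xlsx")  # the structured source document

# Rows whose Amount doesn't align with the master amount. Real code would
# compare money as decimals or within a tolerance, not with float equality.
mismatches = df[df["Amount"] != MASTER_AMOUNT]

# Write the findings to a new, properly structured workbook.
mismatches.to_excel("mismatches.xlsx", index=False)
print(f"{len(mismatches)} of {len(df)} rows differ from the master amount")
```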
It is a bit like how incandescent light was an early selling point of electricity.
Stable, odourless, on-demand light was in short supply, so it helped jump-start a new industry and network.
The real range of possible uses is nearly endless for the tech available today. It is just a coincidence that coding is what's in short supply today.
> I expect salaries for non FAANG devs to decrease while salaries for FAANG devs to increase slightly (given the increased value they can now make).
Are you implying that non-FAANG devs aren't able to do more with LLMs?
I'm non-FAANG and I'm so much more productive now. I'm a full-stack dev; I use them for help with emails to non-tech individuals, analyzing small datasets, code review, code examples... it is wild how much faster I can develop these days. My job is actually more secure because I can do more, and OWN more mission-critical software, vs. outsourcing it.
People forget that software engineers are already speculated to come in 10x and 100x variants, so the impact one smart, dedicated person can make is almost certainly not the bottleneck, and it isn't changed at all by AI.
The fact is, one of the most insanely valuable and productive engineers on the planet might only write a few lines of code most days, but they'll be writing them in a programming language, an OS, or a kernel. Value is created by understanding direction and by theory-building, and LLMs do neither.
I built a genuinely new product by working hard as a single human while all my competitors tried to be really productive with LLMs. I'm sure their metrics are great, but at the end of the day I, a human working with my hands and brain and sharpening my OWN intelligence, have created what productivity metrics cannot buy: real innovation.
Imagine the problem is picking a path across an unexplored, desolate desert wasteland. One guide says he's the fastest: he runs, not walks, and at a fork in the way always picks a path within 5 seconds. He promises you he is the fastest guide out there by a factor of two.
You decide to get a second opinion and find an old, wizened guide who says he always walks, never runs, never picks a path in less than 5 minutes, and promises that no matter what sales pitch the other guide gives, he can get you across the desert in half the time and at half the risk to your life.
Both can't be true. Who do you believe, and why?
It can backfire though.
There is mental overhead in switching projects. So even if a developer is more efficient per project, he won't get more money (usually less, actually) while taking on more mental load (more projects, more managers, more requirements, etc.).
Will be interesting to watch
> Feels like most of the products that look like could generate revenue are for programmers.
Don’t discount scamming and spreading misinformation. There’s a lot of money to be made there, especially in mass manipulation to destroy trust in governments and journalists. LLMs and image generators are a treasure trove. Even if they’re imperfect, the overwhelming majority of people can’t distinguish a real image from a blatantly false one, let alone biased text.
LLMs don't increase programmer productivity. In fact, they actively harm it.
Programmers aren't paid for coding; they're paid for following a formal spec in a particular problem domain. (Something that LLMs can't do at all.)
Improving coding speed is a red herring and a scam.
In my 30 years of software development, maybe 5 of them were in places where getting people to provide a formal spec was ever an option.
It's also irrelevant whether LLMs can follow one - the way I use Claude Code is to have it get things roughly working, supply test cases showing where it fails, then review and clean up the code or go additional rounds with more test cases.
That's not much different from how I work with more junior engineers, who are slower and not all that much less error-prone, though the errors are different in character.
If you can't improve coding speed with LLMs, maybe your style of working just isn't amenable to it, or maybe you don't know the tooling well enough - for me it's sped things up significantly.
You don't understand.
The fact that getting a formal spec is impossible is precisely why you need to hire a developer with a big salary and generous benefits.
The formal spec lives only in the developer's head. It's the only way.
Does an LLM coding agent provide any value here?
Hardly. It's just an excuse for the developer to waste time futzing around "coding" when what they're really paid to do is cram that ineffable but all-important formal spec into their heads.
> The formal spec lives only in the developer's head.
You and I have different ideas of what a formal spec is.
https://en.wikipedia.org/wiki/Formal_specification
Nonsense.
It works just fine to use an LLM coding agent in cases like this, but you need to be aware of what you're actually trying to do with them and be specific instead of assuming they'll magic up the spec from thin air.
I don't know. The other day I wanted to display an Active Directory object to the user. The dict had around 20 keys like "distinguishedname", and fields like "createdat" held timestamps like 144483738. I wanted friendly display names in a sensible order, and binary values converted to human-readable ones.
Very easy to do, sure, but the LLM did this in one minute, recognized the context, and correctly converted the binary values, whereas this would have taken me maybe 30 minutes of looking up standards and docs and typing in friendly key names.
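Roughly the kind of thing it produced (a sketch with hypothetical labels; "createdat" is treated as epoch seconds purely for illustration, since real AD attributes use several encodings the model recognized):

    from datetime import datetime, timezone

    FRIENDLY = {  # insertion order doubles as display order
        "distinguishedname": "Directory Path",
        "createdat": "Created",
    }

    def display(entry: dict) -> None:
        for key, label in FRIENDLY.items():
            value = entry.get(key)
            if key == "createdat":  # convert raw timestamp to a readable date
                value = datetime.fromtimestamp(int(value), tz=timezone.utc).strftime("%Y-%m-%d")
            print(f"{label}: {value}")

    display({"distinguishedname": "CN=jdoe,OU=Staff,DC=example,DC=com",
             "createdat": 144483738})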
I also told it to create five color themes and apply them to the CSS. It worked on the first attempt and looks good, much better than what I could have produced by thinking up themes, picking colors, and copying RGB codes back and forth. Also, I'm not fluent in CSS.
Granted, I wasn't paid for this (it's a hobby project), and I wouldn't have started it in the first place without an LLM performing the boring, tedious tasks.
Yes, these sorts of tasks (classification, summarizing, and generally naming things) are where LLMs are exceedingly useful.
But I was talking specifically about coding agents.
(A.k.a. spend four hours micromanaging prompts and contexts to do what can be done in 15 minutes manually.)
It depends on what you consider "coding".
For me it's mainly adding quick-and-dirty hooks to WordPress websites at the demand of berating marketing C-suites, for sites that are going to disappear or stop being visited within a few months.
For that, whatever Claude spits out is more than enough. I'm reasonably confident I'm not going to write much better code in the less-than-30-minutes I'm allowed to spend to fix whatever issue comes up.
It's very marmite. I used to hate it when it was VS Code's crappy Copilot. Now with Cursor and Windsurf, after some onboarding, I find it indispensable. I have used AI for coding in 3 separate roles: freelancer, CTO, and employee.
And in all 3 cases, AI has increased my productivity. I can ship things even when I'm really sleepy, or if I have very little time between things I can send a prompt to an agent, review the output, and then clean up some of the mess when I have more time.
Now my stance is really at "Whoever doesn't take advantage of it is NGMI."
You're specifically very wrong that LLMs cannot "follow a formal spec in a particular problem domain". It does take skill to ensure that they will, though, for sure.
TLDR: Skill issue
I prefer just paying for metered use on every request. I hope monthly fees don’t carry over from the last era of tech. It’s fine to charge consumers $10 per month. But once it’s over $50, let’s not pretend: you’re hoping I under-utilize the service while wanting me to think I’m over-utilizing it. These premium subscriptions are too expensive for me to pretend that the math doesn’t exist.
Doesn't per-call pricing reduce your usage? When I see the price of a session go above $3 for a handful of interactions, I self-limit my usage.
I'd love to have an all-you-can-eat plan, but $100 p/m isn't compelling enough compared to copy/paste for $20 p/m via chat.
That's not to say the value doesn't exceed $100; I just don't want to pay it.
Sort of, but in a good way: if I’ve spent $15 on a problem and it’s not solved, that reminds me to stop wasting tokens and think of a better strategy. On net it makes me use fewer tokens, but more for efficiency. I mostly love that I don’t need to periodically do math on a subscription to see if I’m getting a good deal this month.
> Doesn't per-call pricing reduce your usage?
Yes, and that's why phone contracts migrated from "$0.0X per minute" to "$X for up to 500 minutes", and finally "$X for unlimited calls".
When the service you provide has near zero marginal cost, you'd prefer the customer use it as much as possible, because then it'll provide more value to them and they'll be prepared to pay more.
Back when I used dial-up, I experienced a lot of stress when I was connected. I felt I had to be as effective as possible, because we had to pay for every minute spent.
When I switched to DSL the stress went away, and I found myself using internet in different ways than before, because I could explore freely without time pressure.
I think this applies to Claude as well. I will probably feel more free to experiment if I don't have to worry about costs. I might do things I would never think of if I'm only focused on using it as little as possible to save money.
My first use of the internet was dial-up e-mail only exchange via UUCP to a local BBS that exchanged mail every 6 hours (might have been 4), and so to be as effective as possible, I'd prepare all my e-mails including mails to the e-mail<->web gateway at CERN so I could exchange a big batch right before the time slot. Often their exchange took long enough that if I sent the messages to the CERN bot first, I'd get the response included when I downloaded the replies after they'd exchanged with their upstream. Then I had a 6 hour window to figure out what to include in the next batch...
100% with you that how you access something can add constraints and stress. In my case, while we paid per minute, the big factor was the time windows: to maximise utility you wanted to include something useful in as many of the exchanges as possible.
With Claude Code as it is now, I clear context more often than is ideal because keeping it drives up cost. I could probably add a lot more detail to CLAUDE.md in my repos, but that drives up tokens as well.
Some of it I'll still do because it affects speed as well, but it'll be nice not to have to pay attention to it.
It's great that there's a choice, but for me the Max plan is likely to save me money already, and I suspect my usage would increase significantly if the top-up (I have intentionally not set it to auto-top-up) didn't regularly remind me not to go nuts.
The problem is that this is $100/mo with limits. At work I use Cursor, which is pretty good (especially tab completion), and at home I use Copilot in vscode insiders build, which is catching up to Cursor IMO.
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
I don’t know why people expect unlimited usage for limited cost. Copilot hasn’t been good for a long time. They had the first mover advantage but they were too slow to improve the product. It’s still not caught up to cursor or windsurf. Cline leaves it so far in the dust it’s like a decade behind in AI years. So you get what you pay for.
Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I started using Claude Code once it became a fixed price with my Claude Max subscription. It's taken a little getting used to vs Cline, but I think it's closer to Cline in performance than to Cursor (Cline being my personal gold standard). $100 is something most people on this forum could make back in 1 day of work.
$100 per month for the value is nothing, and for what it's worth, I have tried to hit the usage limit and the only thing that got me close was their deep research feature. I've pushed Claude Code hard without hitting limits.
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I might be missing something, but you can use Claude 3.7 in Copilot Chat:
https://docs.github.com/en/copilot/using-github-copilot/ai-m...
VS Code with your favorite model in Copilot is rapidly catching up with Cursor, etc. It's not there yet, but the trajectory is good.
(Maybe you meant code completion? But even smaller, local models do pretty well in code completion.)
Sonnet in Copilot is crippled, Copilot agent mode is also very basic and failed every time I tried it. It would have been amazing 2 years ago, but now it's very meh.
GitHub is losing money on the subs, but they are definitely trying to reduce the bleed. One way to do that is to cut corners on LLM usage: not sending as much context, trimming the context window, capping output token limits. These are all things Cursor also does, btw, which is why Cline, with almost the same tech (in some ways even inferior tech), achieves better results. I have hit $20 in API usage within a single day with Cline; Cursor gives you "unlimited" usage for $20 a month. So it's optimised for saving costs, not for giving you the best experience.
At $10 per month for Copilot, they need to save costs even more. So you get a bad experience, and you think it's the AI that isn't capable, but the problem is with the companies burning VC money to corner the market, setting unrealistic expectations on pricing, etc.
When I tried Claude in copilot it was so obviously crippled as to be useless. I deleted copilot and never went back.
Care to explain why? Isn't the Claude version in Copilot exactly the same as in Claude Code?
It was just obviously worse than using the Anthropic website; that was the only explanation for why it was so bad. They could offer it for free because it was crippled, even if it was the same model version (maybe with fewer resources). Or maybe I was just unlucky, but that's how it seemed to me.
I agree with much of your post, but:
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
Out of date, I think, in this fast-moving space.
Sonnet has long been the gold standard, but that position is looking very shaky at the moment; Gemini in particular has been working wonders for me and others where Sonnet has stumbled.
VS Code/Copilot has improved massively in Cursor's wake, but yes, still some way to go to catch up.
Absolutely though - the value we are getting is incredible.
In my experience there are areas where Gemini did well and Claude didn't, same for o1 pro or o3, but for 90% of the work I find Claude way more trustworthy: better at following instructions, not making syntax mistakes, etc. Gemini 2.5 Pro is way better than all their prior models, but I don't get the hype about it being a coding superstar. It's not bad, but Sonnet is still the primary workhorse. Sonnet is more expensive, so if Gemini were at the same level I'd be happy to save the money, but I've tried it with various approaches and played with the temperature, and in the vast majority of cases Claude does a better job.
> $100 is something most people on this forum could make back in 1 day of work.
I expect so. The question is "How many days does the limit last for?"
Maybe they have a per-day limit, maybe it's per-month (I'm not sure), but paying $100/m and hitting the limit in the first day is not economical.
I wrote about this on my blog: https://www.asad.pw/llm-subscriptions-vs-apis-value-for-mone...
But basically you get ~300M input tokens and ~100M output tokens per month with Sonnet on the $100 plan. These are split across the 50 sessions you are allowed; each session is a 5-hour window starting from your first message. During each session, you get ~6M input and ~2M output tokens for Sonnet. Claude Code uses a mix of Sonnet and Haiku, and Haiku has 2x the limits of Sonnet.
So if you absolutely maxed out your 50 sessions every month, that's $2,400 worth of usage if you had used the API instead. So it's a great deal. You're not buying $100 worth of API credits, so they don't run out like that. You can exhaust the limits for a given session, which means at most a 5-hour wait for your next one, or you can run out of your 50 sessions. I don't know how strongly they enforce that limit, and I think it is BS, but all in all the value for money is great, way better than using the API.
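The back-of-envelope math, assuming Sonnet's API list prices at the time ($3 per million input tokens, $15 per million output tokens):

    # 50 sessions/month, each good for ~6M input and ~2M output Sonnet tokens
    input_m = 50 * 6    # 300M input tokens
    output_m = 50 * 2   # 100M output tokens

    cost = input_m * 3 + output_m * 15
    print(cost)  # 900 + 1500 = 2400 dollars of equivalent API usage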
Thanks for the link and explainer. My first experience with Claude Code left mixed feelings because of the pricing. I have a Pro subscription, but for Claude Code I can only use API mode. So I added $5 just to check it out, and exhausted $4.50 in the first 8-minute session. It left me wondering whether switching to the Max plan would burn through usage at the same rate.
Right in the announcement, further down, they even explain how the limits work:
How Rate Limits Work: With the Max plan, your usage limits are shared across both Claude and Claude Code:
- Shared rate limits: All activity in both Claude and Claude Code counts against the same usage limits.
- Message variations: The number of messages you can send on Claude varies based on message length, conversation length, and file attachments.
- Coding usage variations: Expected usage for Claude Code will vary based on project complexity, codebase size, and auto-accept settings.
On the Max plan (5x Pro/$100), average users:
- Send approximately 225 messages with Claude every 5 hours, OR
- Send approximately 50-200 prompts with Claude Code every 5 hours
On the Max plan (20x Pro/$200), average users:
- Send approximately 900 messages with Claude every 5 hours, OR
- Send approximately 200-800 prompts with Claude Code every 5 hours
How many prompts does Claude code send per user prompt? Is it 1:1?
Nope, it can be even a dozen (because it's agentic). Claude usage limits are actually based on token usage, and Claude Code uses a mix of Haiku and Sonnet, so your limits are split between those two models. I gave an estimate of how much usage you can expect in another comment on this thread, but you will find it hard to max out the $100 plan unless you are using it very, very extensively.
I didn’t realize they were tuning cost optimization by switching models contextually. That’s very clever. I bet the whole industry of consumer LLM apps moves that way.
Exactly, $100 per month is nothing for professional usage. For hobby projects, it is a lot.
From the internet we got used to getting everything for nothing, so people beg for a lower price even when it doesn't make sense.
It makes perfect sense if the market is cheaper.
Gemini 2.5 Pro is better at coding than Claude, it’s just not as good at acting agentically, nor does Google have good tooling to support this use case. Given how quickly they’ve come from far behind and their advantage on context size (Claude’s biggest weakness), this could change just as fast, although I’m skeptical they can deliver a good end user dev tool.
> Gemini 2.5 Pro is better at coding than Claude
I'd be careful with stating things like these as fact. I asked Gemini for half an hour to write code that draws a graph the way I want, and it never got it right. Then I asked Claude 3.7 and it got it almost right on the first try, to the point that I thought it was completely right, and it fixed the bug I discovered right after I pointed it out.
Yup, I have had a similar experience. Not only for coding: just yesterday I asked Gemini to compose an email with a list of attachments, which I had specified as a list of file paths in the prompt, and it wasn't able to count them correctly and report the number in the email text (the text went something like, "there are <number_of_attachments> charts attached"). Claude 3.7 was able to do that correctly in one go.
How much do you pay for Gemini 2.5 Pro?
Something like $20/month, first 2 months $10. Depends on the country.
What about Sourcegraph? How do they compare?
Have I got bad news for you... Microsoft announced imposing limits on "premium" models from next week. You get 300 "free" requests a month. If you use agent mode, you easily consume about 3-4 requests per action; I estimate I'd burn through 300 in about 3-5 working days.
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
Enforcement of Copilot premium request limits moved to June 4, 2025 https://github.blog/changelog/2025-05-07-enforcement-of-copi...
The default unlimited model is now gpt 4.1 https://github.blog/changelog/2025-05-08-openai-gpt-4-1-is-n...
> I find gpt4o near useless compared to Claude and Gemini in copilot.
It's hit and miss IMO.
I like it for C#/dotnet, but it's completely useless for the rest of the stuff I do (mostly web frontend).
I'm not sure about my usage but if I hit those premium limits I'm probably going to cancel Copilot.
Doesn’t resonate with me, because I’ve spent over $1,000 on Claude Code at this point and the return is worth it. The spend feels cheap compared to the output.
In contrast - I’m not interested in using cheaper, less-than, services for my livelihood.
> the return is worth it
I'm curious, what was the return? What did you do with the 1k?
Produce working code faster => ship faster => get paid faster? That's the value prop, right? So naturally the $JOB will cover the bill.
something like that. Think "paid more" as well
So just for work then, or personal projects too? For work I can understand, but for personal projects I haven't necessarily gotten more success out of AI than my own code, to be honest.
In terms of personal projects, I use my own custom Ruby X11 window manager, and when I moved and got the office space for an extra monitor, Claude Code wrote the basics of the multi-monitor support by itself.
It's notable to me because there are, to my knowledge, no other Ruby WMs (there's at least one that allows scripting with Ruby, I believe, but not the whole codebase), the X11 bindings are custom (no Xlib or XCB), and there are few great examples that fit the structure of my WM. Yet it made it work. The code was ugly, and I haven't committed it yet as I want to clean it up (or get Claude to), but my priority was to be able to use the second monitor without spending more than a few hours on it, starting with no idea how multi-monitor support in X11 worked.
Since then, Claude Code has added Xinerama support to my X11 bindings, plus selection support to enable a systray for my pager, and it has written the systray implementation (which I also didn't have the faintest clue how worked, so I had Claude explain it to me before starting).
I use it for work too, but for these personal projects the priority has been rough working code over beauty, because I use them every day, rely on the features, and want to spend as little time as possible on them. So the work has been very different from my work projects, where I'll work in much smaller chunks, polish the result, etc.
so you didn’t spend a penny? :)
Could you anonymize and share your last 5-10 prompts? Just wanna understand how people are using Claude Code.
These aren't that fun but sure.
- https://gist.github.com/backnotprop/ca49f356bdd2ab7bb7a366ef...
- https://gist.github.com/backnotprop/d9f1d9f9b4379d6551ba967c...
- https://gist.github.com/backnotprop/e74b5b0f714e0429750ef6b0...
- https://gist.github.com/backnotprop/91f1a08d9c27698310d63e06...
- https://gist.github.com/backnotprop/7f7cb63aceb7560e51c02a9d...
- https://gist.github.com/backnotprop/94080dde34bfca3dd9c48f14...
- https://gist.github.com/backnotprop/ea3a5c3a31799236115abc76...
Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having Claude operate within the meta context-management framework. We work hard to set the path for actual coding; thus code output (even complex or highly integrated) usually ends up being fairly smooth and fast.
When I "wake" CC up I usually use a prompt like this to preface any complex work: https://gist.github.com/backnotprop/d2e4547fc4546eea071b9b68... (the goal is to get all relevant context in-memory).
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only that parts that matter for the task in large codebases. I built a tool to help me build these prompts, keep the codebase organized well in xml structure. https://github.com/backnotprop/prompt-tower
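For anyone curious what that packing looks like, here's a rough sketch of the idea (hypothetical tags and file filtering; the linked prompt-tower tool does this properly):

    from pathlib import Path

    def pack(root: str, exts=(".py", ".js", ".go")) -> str:
        # Wrap each source file in a tagged block so the model can
        # navigate the whole codebase inside a single planning prompt.
        parts = ["<codebase>"]
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix in exts:
                parts.append(f'<file path="{path}">')
                parts.append(path.read_text(errors="ignore"))
                parts.append("</file>")
        parts.append("</codebase>")
        return "\n".join(parts)

    # Paste the result (if it fits in ~200k tokens) into the planning prompt
    print(pack("src")[:500])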
worth noting that some of the prompts are related to the project context management system i use: (obfuscated business details) https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...
Interesting. Thanks.
Could you explain why there is no punctuation?
Ah yeah, sorry, that is an export error... I copied prompts directly out of Claude Code, and when I do that it copies all of the ASCII/TUI parts that wrap the message. I used some random "strip special chars" site to remove those and was lazy about adding actual punctuation back in.
"Ensure all our crons publish to telegraf when they start and finish. Include the cron name and tenant id when applicable. For crons that query batch jobs, only publish and take a lock when there is work to do. look at <snip> as an example. Here is the complete list to migrate. Create a todo list and continue until done. <insert list of 40 file paths>"
(updated for better example)
The thing I forgot is the command for it to get the next set of files to process. Otherwise it will migrate 30% of them and say "look dad, I'm done!"
I used it yesterday to convert a website from tailwind v1 to v4. Gave it the files (html/scss/js), links to tailwind and it did the job. Needed some back and forth and some manual stuff but overall it was painless.
It is not a challenging technical thing to do. I could have sat there for hours reading the conversion guides from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with %s/x/x, so you need to do them manually. One by one. For hundreds of classes. I could as easily have shot myself in the head.
> Could you anonymize and share your last 5-10 prompts?
The prompt was a simple "convert this site from tailwind v1 to v4". I use neovim copilot chat to inject context and load URLs. I have found that prompts have no value, it is either something the LLM can do or not.
hey, I'm open to that possibility. Maybe I'll grab $5 in API credit and give it a shot (for 5 minutes or a week depending on who you ask)
I got $100 of credit at the start of the year, and my usage has grown by about $1 each month, starting at $2 in January using Aider at the time. I just switched to Claude Code this week, since it follows a similar UX. Agentic CLI code assist has really been growing in usefulness for me as I get faster at reviewing its output.
I use it for very targeted operations where it saves me several round trips to code examples, documentation, and Stack Overflow, not spamming it for every task I need to do. I spend about $1/day of focused feature development, and it feels like it saves me about 50% as many hours as I spend coding while using it.
What do you prefer, between Aider and CC? I use Aider for when I want to vibe code (I just give the LLM a high-level description and then don't check the output, because it's so long), and Cursor when I want to AI code (I tell the AI to do low-level stuff and check every one of the five lines it gives me).
AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.
I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.
I tested Aider a few times and gave up because at the time it was so bad; it might be time to try it again. I'll add that seeing how Claude Code works for me while lots of other people struggle with it suggests my style of working may just mesh better with Claude Code than with Aider.
Claude Code was the first assistant that gelled for me, and I use it daily. It wrote the first pass of multi-monitor support for my window manager. It's written the last several commits of my Ruby X11 bindings, including a working systray example, where it both suggested the whole approach and implemented it, and tested it with me just acting as a clicking monkey (because I haven't set up any tooling to let it interact with the GUI) when it ran test scripts.
I think you just need to test the two side by side and see what works for you.
I intend to give Aider a go at some point again, as I would love to use an open source tool for this, but ultimately I'll use the one that produces better results for me.
Makes sense, thanks. I've used Claude Code but it goes off on its own too much, whereas Aider is more focused. If you do give Aider another shot, use the architect/editor mode, with Gemini 2.5 Pro and Claude 3.7, respectively. It's produced the best results for me.
A couple of tips if you're just starting with it:
The two worst ways of burning API credits I've found with Claude Code are:
1. Getting argumentative/frustrated with the model if it goes off the rails and continuing to try to make something work when the model isn't getting anywhere.
If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail? If it's not making forward progress after a couple of prompts, it's not likely to unless you split up the task and/or provide more details. This is how you burn $10 instead of $0.60 for a task that "should" be simple. It's bad at telling you something is hard.
2. Think about when you either /compact (trims the context but retains important details) or clear the context entirely. E.g. always clear when moving to another task unless they're closely related. Letting it retain a long context is a surefire way to burn through a lot. It also slows you down a lot, not least because there's a bug that affects some of us (maybe related to TERM settings? no idea) where in some cases it will re-print the entire history to the terminal, so between tasks it's useful to quit and restart.
Also use /init, but also ask it to update CLAUDE.md with lessons learned regularly. It's pretty good at figuring things out, such as how my custom ORM for a very unusual app server I'm working on works, but it's a massive waste of tokens to have it re-read the ORM layer every time instead of updating CLAUDE.md.
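For example, a "lessons learned" fragment of CLAUDE.md might look something like this (hypothetical project details, just to show the shape):

    # CLAUDE.md (excerpt - hypothetical project)
    ## ORM layer
    - Models live in app/models/; each maps 1:1 to a table.
    - Use Repo.fetch(id) / Repo.store(record); never raw SQL in handlers.
    - Lesson learned: migrations must be idempotent; see db/migrate/README.

A few lines like that cost almost nothing in tokens compared to having it re-read the whole ORM layer every session.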
> If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that it's broken approach would fail?
This.
I was fighting with Claude for a good chunk of yesterday (usage limits seemed broken, so it didn't really time me out), and most of that was getting it to fix one small issue with three test cases. It would fix one test and break the others, round and round we go. After it broke unrelated tests I had to back out all the changes, and by then I understood the problem well enough to direct it how to fix it, with a little help from Deepseek.
As there are a bunch of other sections of code which suffer from the same problem I can now tell it to "look at the fixed code and do it like that" so, hopefully, it doesn't flail around in the dark as much.
Admittedly, this is fairly complicated code, being an AST to bytecode compiler with a bunch of optimizations thrown in, and part of the problem was a later optimization pass undoing the 'fixes' Claude was applying which took quite a while to figure out.
Now I just assume Claude is being intentionally daft and treat it as such, with questions like "why would I possibly want a fix specifically designed to pass one test instead of a general fix for all the cases?" Oh yeah, that's its new trick: rewriting the code to only pass the failing test and throwing everything else out, because why not?
Whoever is paying for your time should calculate how much time you’d save between the different products. The actual product price comparison isn’t as important as the impact on output quality and time taken. Could be $1000 a month and still pay for itself in a day, if it generated >$1000 extra value.
This might mean the $10/month is the best. Depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
Just today I had yet another conversation about how BigCo doesn't give a damn about cost.
Just to give you one example - last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month cloud cost for serving a single static html file.
At one point someone up top decided that kubes is the way to go and scrambled an impromptu schematic for new projects which could be simply described as a continental class dreadnought of a kubernetes cluster on AWS.
And it was signed off, and later followed like a scripture.
A couple of floors down we're having a hard time arguing for a 50 EUR budget for a weekly beer for the team, but the company is totally fine with paying 2k EUR for a landing page.
I've seen the books as to how much we spend on all the various AI shit. I can guarantee, that at least in our co, that AI is a massive waste of money.
But it doesn't really matter, because the C-level has been consumed by the hype like nothing I've ever seen. It could cost an arm and a leg and they'd still be pushing for it because the bubble is all-consuming, and anyone not touting AI use doesn't get funding from other similarly clueless and sucked-in VCs.
C-suites love to pretend they can do and indeed do these calculations.
They don't. They toss a coin.
> The problem is that this is $100/mo with limits
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day for 20 days a week if there was nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
> 8 hours a day for 20 days a week
Your vibe coders are on a different dimension than mine.
I've often run multiple Claude Code sessions in parallel to do different tasks. Burns money like crazy, but if you can handle wrangling them all there's much less sitting and waiting for output.
Wow, and I thought the Beatles were exaggerating how long a week was!
https://www.youtube.com/watch?v=kle2xHhRHg4
What's the problem if you aren't selling the service as a loss leader and that 'vibe coder' is paying into their account upfront?
The only reason I can see is that you're lacking aggregate capacity and are unwilling or unable to build out faster. Is that the case?
I'll add on to this: I don't really use agent modes a lot. In an existing codebase, they waste a lot of my time for mixed results. Maybe Claude Code is so much better at this that it enables a different paradigm of AI editing—but I'd need easy, cheap access to try it.
An AI agent should be treated like a human developer. If you bring a new human developer into your codebase and give them a task, it will take them a lot of time to read and understand the codebase before producing a proper solution. If you want to use an AI agent regularly, it makes sense to give it some sort of memory of the codebase.
And it seems like the community realizes this and is inventing different solutions. RooCode has task orchestration built in already; there is a Claude task-manager that allows splitting and remembering tasks so the AI agent can pick them up quicker; there are file-based solutions like memory banks. Windsurf and Cursor upgraded their rules functionality (.windsurf/rules and the like) to allow more solutions like that for instructing AI agents about the codebase/tasks. Some people even write their own scripts that feed every file to an LLM and store the summary description in a separate file that the AI agent tool can use instead of searching the codebase.
I'm eager to see how some of these solutions become embedded into every AI agent product. It's one of the missing stones needed to make AI agents an order of magnitude more efficient and productive.
You don't need a max subscription to use Claude Code. By default it uses your API credits, and I guess I'm not a heavy AI user yet (for my hobby projects), but I haven't spent more than $5/month on Claude Code the past few months.
I burned $30 in Claude Code in just under an hour. I was equally frustrated and impressed. So much so I ended up a $200 MAX subscriber.
The money starts adding up fast as your context fills up, since it resends the whole accumulated context through the API every time.
They're good about telling you how full your context is, and you can use /compact to shrink it down to the essentials.
But for those of us who aren't Mr. MoneyBags like you all, keeping an eye on context size is key to keeping costs low.
I’ve been wanting to try Claude Code. What makes it such a difference maker compared to existing AI tools?
Can I assume you are still running into rate limits?
The problem with it is that it uses a ~30k-token system prompt (albeit "cached"), and very quickly the usage goes up to a few million tokens. I can easily spend over $10 a day.
I spent $5 in 10 minutes when I tried it.
For me, it was $10 in 2 hours. That’s super cheap if it saves me significant time. Jury’s out on that, though.
> but I'd need easy, cheap access to try it.
You can try it for cheap with the normal pay-as-you-go way.
I always imagined that these $10/mo plans are essentially loss leaders and that in the long run the price should be much higher. I'm not even sure the $100/mo plan pays for its underlying costs.
I think their free tiers are by definition loss leaders, but I think you're right: all of their offerings are loss leaders. I know I can get more from my $20 using Claude Pro than I can using their API Workbench. It is such a competitive space that I think it's unrealistic for these companies to ever be cash positive, because all the cash they have needs to be spent on competing.
> I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful
I don't think this is the right way to look at it. If CoPilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
At $10-20 a month that calculation is trivial to make. At $100, I'm honestly not getting that much value out of AI, especially not every month, and especially not compared to cheaper versions.
I think this thinking is flawed. First, it presupposes a linear value/cost relationship. That is not always true: a bag that costs 100x as much is not 100x more useful.
Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that marginal value tracks cost. I don’t think most things, economically, match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value).
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.
Tangential, but I don't want to use LLMs for writing code because it's one of the things I enjoy the most in life, but it's feeling that I'm going to need to have to to get ready for the next years of my career. I've had some experiences with Claude that have seriously impressed me, but it takes away the fun that I've found in my jobs since I was in middle school writing small programs.
Does anyone have advice for maintaining this feeling but also going with the flow and using LLMs to be more productive (since it feels like it'll be required in the next few years at many jobs)? Do I just have to accept that work will become work and I'll have to get my fix through hobby projects?
I faced a related dilemma when I finished my CS degree: to work as a full-stack dev or to work on more foundational technology (and actually use what I learned in my degree). My experience is that the "foundational technology" area is more "research-oriented", which means you get to work on projects where LLM's don't help that much: writing code in languages that have little data in the LLM's training corpus, coming up with performance benchmarking approaches unique for your application, improving a workload's throughput with insights derived from your benchmarking results and your ingenuity, etc. Had I gone down the full-stack path, I think I'd be worried now.
> I don't want to use LLMs for writing code because it's one of the things I enjoy the most in life
I think LLMs are really good for the "drudge work" when you're coding. I always say they're excellent for things where the actual task is easy but the bottleneck is how fast you can type.
As an example I had a project where I was previously extracting all strings in the UI into an object. For a number of reasons I wanted to move away from this but this codebase is well over 50k LOC and I have probably 5k lines of strings. Doing this manually would have been very tedious and would have taken me quite some time so I leveraged AI to help me and managed to refactor all the strings in my app in a little over an hour.
Exactly. This is the use case that companies should be improving. The AGI marketing hype is a distraction.
Are you using it for other things? I think you can write code without it but it’s so good for research and stack overflow replacement.
Last night I used it to look through some project in an open source code base in a language I’m not familiar with to get a report on how that project works. I wanted to know what are its capabilities and integrations with these other specialized tools, because the documentation is so limited. It saved me time and didn’t help me write code. Beyond that it’s good for asking really stupid questions about complex topics that you’d get roasted on for stack overflow.
How can you be sure that the report is accurate? Did you verify that the project actually has the capabilities & integrates with the other specialized tools? I've seen many instances where the model either left out important information or came up with totally new stuff that got buried in the rest (mostly true) of the answer.
I think there will always be jobs out there that don't demand you write code with an LLM, just the same that most jobs don't demand you use vim or emacs or LSP-based autocomplete as part of your workflow.
You don't have to go with the flow. I took a step back from AI tech because a lot of startups in that field come with extra cultural baggage that doesn't sit well with me.
Do you use compilers? Linkers and loaders? Web bundlers? Linters and formatters? Code gen for or from schemas? Image editors? Memory-safe or garbage-collected languages?
Then you already use levers to build code.
LLMs are a new kind of tool. They’re weird and probabilistic and only sometimes useful. We don’t yet know quite how and when to use them. But they seem like one more lever for us to wield.
I think the probabilistic nature is a huge divider. It requires a complete different way of working and it's understandable that people have trouble switching from one to the other (easier for experienced devs, in my experience, but still makes you switch to code reviewer mode too often).
Have a side project where you control the entire experience and let your main job bend to the beast.
Treat them as resources for remembering/exploring code libraries and documentation. For example, I needed to import some JSON files as structs into Unreal Engine. Gemini helped me to quickly identify the classes UE has for working with JSON.
> Does anyone have advice for maintaining this feeling but also going with the flow and using LLMs to be more productive
Coding with LLMs has brought me so much more joy. Not always, and sometimes it's quite frustrating, but it is getting better. When you have a good idea, explain it well, and get the model to generate the code the way you would have written it or even better, and you can use it to build new things faster, that's magical. Many devs are having this experience, some earlier, some now, some later. But I certainly would not say that using LLMs to code has made it less enjoyable.
I'm curious whether anyone's actually using Claude code successfully. I tried it on release and found it negative value for tasks other than spinning up generic web projects. For existing codebases of even a moderate size, it burns through cash to write code that is always slightly wrong and requires more tuning than writing it myself.
Absolutely stellar for 0-to-1-oriented frontend-related tasks, less so but still quite useful for isolated features in backends. For larger changes or smaller changes in large/more interconnected codebases, refactors, test-run-fix-loops, and similar, it has mostly provided negative value for me unfortunately. I keep wondering if it's a me problem. It would probably do much better if I wrote very lengthy prompts to micromanage little details, but I've found that to be a surprisingly draining activity, so I prefer to give it a shot with a more generic prompt and either let it run or give up, depending on which direction it takes.
Yes. For small apps, as well as distributed systems.
You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of codebase to Gemini and rely on its strategy). I’ve even put entire library docsites in my repos for Claude code to use - but today they announced web search.
They also have todos built in which make the above even more powerful.
The end result is insane productivity. The only metric I have is something like 15-20k lines of code for a recent distributed processing system built from scratch over 5 days.
Is that final number really that crazy? With a well-defined goal, you can put out 5-8k lines per day writing code the old-fashioned way. I'd also love to see the code, since in my experience (I use Cursor as a daily driver), AI bloats code by 50% or more with unnecessary comments and whitespace, especially when generating full classes/files.
> I spend a lot of time setting Claude code up for success.
Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!
You'll never see the code. They will just say how amazingly awesome it is, how it will fundamentally alter how coding is done, etc... and then nothing. Then if you look into who posts it, they work in some AI related startup and aren't even a coder.
Not open source, but depending on context I can show whoever's interested; I'm not hard to find.
I've done just about everything across the full and distributed stack, so I'm happy to jam on my code/systems and on how I instruct and (confidently) rely on AI to help build them.
5k lines of code a day is 10 lines of code a minute, solidly, for 8 hours straight. Whichever way you cut that with whitespace and bracket alignment, that's a pretty serious amount of code to chunk out.
If I am writing Go, it is easy to generate that much in if/else and error checks. When working in Java, basic code can bloat to a big LoC count over several hours (a first draft, which is obviously cleaned up later before going to PR). React and other FE frameworks also tend to require huge LoC counts (mostly boilerplate and autocompleted rather than thoughtfully planned and written). It is not as serious an amount as you may think.
Nitpicking like this has to be fair: if you look at typical AI code (style, extra newlines, comments, tests/fixtures, etc.) it is the same. And again, LoC isn't a good measurement in the first place.
Not all my 5k lines are hand-written or even more than a character; a line can be a closing bracket, etc., which autocomplete has handled for the last 20 years. It's definitely an achievement, which is why it's important to get clarity when folks claim to reach peak developer productivity with some new tool. To quote the curl devs, "AI slop" isn't worth nearly the same as thoughtful code right now.
are people really committing 5k lines a day without AI assistance even once a month?
I don't think I've ever done this or worked with anyone who had this type of output.
Maybe if you are copy-pasting some HTML templates, but then it is not "writing code". Handwriting complex logic at 5k SLOC per day, no way.
It depends upon how well mapped out the problem is in your head. If it's an unfamiliar domain, no way.
Nobody is writing 5k consistently on a daily basis. Sure, if it's a bunch of boilerplate scaffolding, maybe.
I daily drive cursor and I have rules to limit comments. I get comments on complex lines and that’s it.
I'd be really interested in seeing the source for this, if it's an open-source project, along with the prompts and some examples. Or other source/prompt examples you know of.
A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.
While not directly what you're asking for, I find this link extremely fascinating - https://aider.chat/HISTORY.html
For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and tracking the results.
Not open source, but depending on context I can show you; I'm not hard to find.
Aider writes 70-80% of its own code: https://aider.chat/HISTORY.html
Can you share more about what you mean by a meta context/tasking management system? I’m always curious when I see people who have happily spent large amounts on api tokens.
Here is some insight... I had Gemini obfuscate my business context, so if something sounds weird, that's probably why.
https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...
The framework is basically the instructions and my general guidance for updating and ensuring the details of critical information get injected into context. some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only that parts that matter for the task in large codebases. I built a tool to help me build these prompts, keep the codebase organized well in xml structure. https://github.com/backnotprop/prompt-tower
So I use Roo, and you have the architect mode draft out in as much detail as you want: plans, tech stack choices, todos, etc. Switch to orchestration mode to execute the plan, including verifying things are done correctly. It subtasks out the todos. Tell it not to bother you unless it has a question. Come back in thirty and see how it's doing. You can have it commit to a branch per task if you want. Etc., etc.
I use it, on a large Clojure/ClojureScript application. And it's good.
The interactions and results are roughly in line with what I'd expect from a junior intern. E.g. don't expect miracles, the answers will sometimes be wrong, the solutions will be naive, and you have to describe what you need done in detail.
The great thing about Claude code is that (as opposed to most other tools) you can start it in a large code base and it will be able to find its way, without me manually "attaching files to context". This is very important, and overlooked in competing solutions.
I tried using aider and plandex, and none of them worked as well. After lots of fiddling I could get mediocre results. Claude Code just works, I can start it up and start DOING THINGS.
It does best with simple repetitive tasks: add another command line option similar to others, add an API interface to functions similar to other examples, etc.
In other words, I'd give it a serious thumbs up: I'd rather work with this than a junior intern, and I have hope for improvement in models in the future.
Here's a very small piece of I code I generated quickly (i.e. <5 min) for a small task (I generated some data and wanted to check the best way to compress it):
https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
> I don't use it in large codebases (all agentic tools for me choke quickly)
Claude code, too?
I found that it is the only one that does a good job in a large codebase. It seems to be very different from others I've tested (aider, plandex).
> I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Where is the breakpoint here? What number of lines of code or tokens in a codebase when it becomes not worth it?
30 to 40ish in my experience. Current state of the art seems to lack thinking well about programming tasks with a layer of abstraction or zooming out a little bit in terms of what might be required.
I feel like as a programmer I have a meta-design in my head of how something should work, and the code itself is a snapshot of that, and the models currently struggle with this big picture view, and that becomes apparent as they make changes. Entirely willing to believe that Just Add Moar Parameters could fix that (but also entirely willing to believe that there's some kind of current technical dead-end there)
Claude Code is the first AI coding tool that actually worked for me on a small established Laravel codebase in production. It builds full stack features for me requiring only minor tweaks and guidance (and starting all over with new prompts). However, after a while I switched to Cursor Agent just because the IDE integration makes the workflow a little more convenient (especially the ability to roll back to previous checkpoints).
Yes. It costs me a few bucks per feature, which is an absolute no-brainer.
If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.
Just to throw my experience in, it's been _wildly_ effective.
Example;
I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting, in a CLAUDE.md file, the purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, so that whole files wouldn't need to be re-read.
This initial step, depending on the size of the codebase, can be "expensive". Since this is merely a PHP extension and not a huge codebase, I was fine letting it rip through the whole thing however it saw fit; were this a larger codebase, I'd take a more measured approach to this initial "indexing".
This results in a file that claude uses like we do a readme.
Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at every single new session start, moving forward) and then generate a general overview/plan of what needs to be done to implement the new Vector Set commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted to generate a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.
$22 in API costs and ~6 hours spent and I have the extension, working, in my local environment with support for all of the commands I want/need to use. (there's still 5 commands that I don't intend to use that I haven't implemented)
Not only would I certainly never have embarked on extending a C PHP extension on my own, I definitely wouldn't have done it over the course of an evening and a morning.
Another example:
Before this Redis vector sets thing, I used CC to build a Python image and text embedding pipeline backed by Redis streams and Celery. It consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything ML-related. Now I have a portable, performant Python service that I run from my MacBook (M2 Pro) or the various GPU-having Windows machines in my home, generating embeddings on an as-available basis and pushing the results back to a Redis stream that my Laravel app then consumes and processes.
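The shape of the Python side, stripped way down (a hypothetical sketch using redis-py directly rather than Celery; the real pipeline has batching, retries, and GPU scheduling, and all the stream/field names here are made up):

```
# Hypothetical sketch: consume embedding tasks pushed onto a Redis
# stream by the Laravel app, embed the text, push results back.
import json
import redis
from sentence_transformers import SentenceTransformer  # stand-in embedding model

r = redis.Redis(decode_responses=True)
model = SentenceTransformer("all-MiniLM-L6-v2")

IN, OUT, GROUP = "embed:tasks", "embed:results", "workers"

try:
    r.xgroup_create(IN, GROUP, id="0", mkstream=True)
except redis.ResponseError:
    pass  # consumer group already exists

while True:
    # Block up to 5s waiting for new tasks from the Laravel side.
    for _stream, entries in r.xreadgroup(GROUP, "worker-1", {IN: ">"}, count=10, block=5000) or []:
        for entry_id, task in entries:
            vec = model.encode(task["text"]).tolist()
            r.xadd(OUT, {"task_id": task["task_id"], "vector": json.dumps(vec)})
            r.xack(IN, GROUP, entry_id)
```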
The results of these embeddings, and the similarity-related features they've brought to the Laravel application, are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own, I wouldn't have; I don't have that much time for side-project curiosities.
Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.
On a day-to-day basis, the effectiveness is a learned skill. You really need to learn how to work with it, the same way you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue", but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use for them.
If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely need to be documented in something like I described above: a long-running document used similarly to a prompt.
From my own experience: I watch the "/settings/logs" route on Anthropic's website while CC is working, once I know we're getting rather heavy with the context. Once it gets into the 50-60,000 token range I either aim to wrap up the current task, or I accept that things are going to start getting a little wonky in the 80k+ range. It'll keep working up into the 120-140k token range or more, but you're likely going to end up with lots of "dumb" stuff happening. You really don't want to be there unless you're _sooooo close_ to finishing what you're trying to do. When the context gets too high and you need/want to reset mid-task, run /compact [add notes here about next steps] and it'll generate a summary that's used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context; just reset the session fully after the first /compact.)
If you're constantly running into huge contexts, you're not being focused enough. If you can't work on anything without reading files that are thousands of lines long, either break up those files somehow or be _really_ specific with the initial prompt and context, which I've done lots of. Say I have a model in a 10+ year old project that is 6,000 lines long and I want to work on one specific method in it: I'll tell Claude in the initial message/prompt which line that method starts on, which line it ends on, and how many lines from the top of the file to read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method or its usages rather than reading whole files into context.
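For instance, an initial message might look something like this (file and method names invented for illustration):

```
Read CLAUDE.md first. We're working in app/Models/Listing.php (~6,000 lines).
The method rebuildSimilarityIndex() starts at line 4120 and ends at line 4185.
Read lines 1-40 (namespace, class name, properties), then 4120-4185, nothing else.
Never read more than 50 lines of this file at a time. If you need another
method or its usages, stop and ask me to locate it for you.
```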
So, again, if it's burning through money - focus your efforts. If you think you can just fire it up and give it a generic task - you're going to burn money and get either complete junk, or something that might technically work but is hideous, at least to you. But, if you're disciplined and try to set or create boundaries and systems that it can adhere to - it does, for the most part.
Most interesting! Would you mind sharing the prompt and the resulting CLAUDE.md file?
Thx!
How do you know what it built is correct if you don’t know C?
They'll find out by the CVEs 1 year later.
This is a great report. Using a claude.md like that is honestly genius.
Been on this about a week at the $100/mo mark. I'm not hitting quota limits (I'd swap to the $200/mo in a heartbeat if I were), using Claude Code on multiple tasks simultaneously with abandon. Prior to the flat plan I was spending nearly $1k/mo on tokens. That figure was justifiable but painful. Paying a tenth of it is lovely.
$200/month?
Do people really get that much value from these tools?
I use Github's Copilot for $10 and I'm somewhat happy for what I get... but paying 10x or 20x that just seems insane.
To rescue a flailing project that I took over when a senior hire ghosted a customer mid-project, I got the $200 Pro package from OpenAI (which is much less usable than Claude for our purposes; there were other benefits related to my client's relationship with OpenAI).
In the end, I was able to rescue the code part, rebuilding a 3-month, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.
So to answer your question, I got my money's worth in that specific use case. That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have really had a better situation if we'd started off with everyone having an OpenAI Pro account.
* Those who work in enterprise know intuitively what happened next.
> That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have really had a better situation if we'd started off with everyone having an OpenAI Pro account.
The hardest part about enterprise backend development is understanding the requirements. "Understanding" is not about reading comprehension, and "requirements" are not the written requirements somebody gives you. It's about finding out which requirements are undocumented and which parts of the requirements document are misinformation. LLMs would just dutifully implement the written requirements, misinformation and missing edge cases included, not the actual requirements.
If your employer spends $20k a month on you (salary + everything else), $200 a month breaks even at around a 1% boost in productivity ($200 / $20,000 = 1%).
Maybe if you're working in FAANG...
Lots of jobs where employers pay that much per head, not just FAANG. Honestly FAANG is probably spending double that for senior+ level engineers.
Average US salary for a SWE is $10-12k/month. Fully-loaded cost (what the employer actually spends) is 1.5-2x salary, so that's not an unrealistic number.
So you're arguing about the top 10-20% of earners in the US?
Also the world is much bigger than the US.
The point is you don't have to have FAANG salaries to hit $20k/mo in cost to your employer.
Tons of software developer jobs in the US for non-FAANG tier or unicorn startup companies are >$100k and easily hit $120-150k.
Also the fourth quintile mean was like $120k in the US in 2022. So you'd be in the top 30% of earners making that kind of money, not the top 10%.
https://taxpolicycenter.org/statistics/household-income-quin...
> unicorn startup companies are >$100k and easily hit $120-150k.
So still way below $240k, no?
> So you'd be in the top 30% of earners making that kind of money, not the top 10%.
Maybe you missed it but I actually wrote "10-20%".
Also in 2024 earning $100k puts you in the top 20% of the US population.
https://dqydj.com/salary-percentile-calculator/
(which is already way above even the EU for dev salaries)
>So still way below $240k, no?
No, fully loaded cost of an employee is 1.5-2x salary
You dropped off the "non" part of that. It's the non-Unicorn software companies easily paying $120k for a seasoned software developer in the US.
Also, I noticed where our sources diverged. I was looking at household income. My bad.
> which is already way above even the EU for dev salaries
Maybe they're underpaid.
Either way, I was responding to the idea that only a FAANG salary would cost an employer $20k/mo. For US software developer jobs, it can easily hit that without being in FAANG-tier or unicorn startup level companies. Tons of mid-sized low-key software companies you've never heard of pay $120k+ for software devs in the US.
The median software developer in Texas makes >$130k/yr. Think that's all just Facebook and Apple and Silicon Valley VC-funded startup software devs? Similar story in Ohio; is that a place loaded with unicorn software startups? The median salaries in those markets probably cost their employers around $20k/mo.
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
Yes, this product mostly only targets the top 20% of US earners. That's a lot of people, and a lot of HN readers especially.
If you add in an employer match on 401k/HSA, fully employer-paid healthcare premiums, employer-sponsored life insurance, unemployment insurance, employer-covered disability, payroll taxes, and all the other software costs, it wouldn't even take $200k in salary to cost $20k/mo. Someone could be making around $150k and still cost the company that much.
gentle reminder that the majority of developers do not live in the united states.
median salary for a japanese dev is ~$60k. same range for europe (swiss at ~100k, italy at ~30k for the extremes). then you go down.
Russia ~$37,000, Brazil ~$31,500, Nigeria ~$6,000, Morocco ~$11,800, Indonesia ~$13,500, and India ~$30,000 USD.
(I asked ChatGPT for that last set of numbers; the JP and EU numbers are mostly correct though, as I have first-hand experience.)
According to Wikipedia, general average wages in Italy in 2023 were 48K, and SWE jobs are usually above average.
It would be interesting to know where ChatGPT sourced those figures from, as some of them look very sketchy.
Sure, but ~$150k isn't exactly FAANG US salaries for an experienced software dev. That's my point. Lots of people forget how much extra many employers pay for a salaried employee on top of just the take home salary. Labor is expensive in the US.
I imagine a lot of people saw $20k/mo and thought the salary clearly had to be $200k+.
You have to set your cost delta against your margin, not against your cost. Why do devs keep repeating this faulty reasoning? Where did it come from?
If you cost $20K a month at a 5% average margin, the required "break even" for a $200 cost increase is a 20% productivity increase, not 1%.
And it gets worse: that assumes the increased "productivity" is converted 100% back into extra margin, which is not obvious at all.
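A quick sketch of the two framings, using the thread's numbers (the 100% productivity-to-margin conversion is the contested assumption):

```
cost = 20_000        # dev's fully loaded monthly cost ($)
tool = 200           # monthly tool cost ($)
margin = 0.05        # assumed average margin

# Framing 1: tool cost vs. total cost -> "a 1% productivity boost pays for it"
print(tool / cost)             # 0.01

# Framing 2: tool cost vs. the margin the dev's work throws off
revenue = cost / (1 - margin)  # ~21,053 implied monthly revenue
profit = revenue - cost        # ~1,053 of margin per month
print(tool / profit)           # ~0.19 -> need ~20% more margin to break even
```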
I wish tools like Cursor, Windsurf, etc. provided a free option for working on open source projects; after all, they trained their models on open source code.
It would really be helpful if Anthropic let users know the usage limits, what has been used, and what is left, instead of these vague 5x/20x-vs-Pro multipliers.
As someone that's happily on the Pro plan (I got a deal at $17 per month) I'm a bit confused seeing people pay $100+ per month ... like what benefits are you getting over the cheaper plan?
When coding with Claude I cherry-pick code context, examples, etc. to provide for tasks, so I'm curious to hear what others' workflows are like and what benefits you feel you get from Claude Code or the more expensive plans.
I also haven't run into limits for quite some time now.
Agent mode without rails is like a boat without a rudder.
What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.
These instructions end up being very explicit about the sequence of things it should do (write the tests first), how the code should be written, where to place it, etc. So the output ended up being very similar regardless of the coding agent being used; see the sketch below.
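A single mini milestone might read something like this (a condensed, hypothetical example; paths and names invented):

```
Milestone 7: add POST /api/v1/tags
1. Write failing tests in tests/api/tags.test.ts covering create and
   duplicate-name rejection. Run them; confirm they fail.
2. Implement the route in src/routes/tags.ts. Validation goes in
   src/validation/tags.ts, not in the handler.
3. Run the full suite. Do not touch any file outside the three above.
```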
I've tried every variation of this very thing. I even managed to build a quick-and-dirty ticketing system whose tickets I could assign to the LLM of my choosing, WITH context. I'm talking codebase graph diagrams, mappings, tree structures of every possibility, simple documentation, complex documentation, a bunch of OSS tools that do this very thing automatically, etc etc etc.
In the codebase I've tried modularity via monorepo, faux microservices with local APIs, monoliths filled with hooks, and all the other centralization tricks in the book. Down to the very, very simple. Whatever I could do to bring down the context window needed.
Eventually... your returns diminish. And any time you saved is gone.
And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.
Inevitably the context window, and the LLM's eagerness to touch shit it's not supposed to (the likelihood of which increases with context), always get in the way.
Anything with any kind of complexity ends up as a game of too much bloat, or of the LLM removing pieces that break other pieces it wasn't aware of.
/VENT
So, relying on a large context can be tricky. Instead I've tried to get to an ER model quickly, and from there build modules that don't have tight dependencies.
Using Gemini 2.5 for generating instructions
This is the guide I use
https://github.com/bluedevilx/ai-driven-development/blob/mai...
How many tokens (across whole codebase) did it take for diminishing returns to kick in? What does the productivity vs token plot look like?
Aider using the web interface of Gemini Pro seems like the cheapest way to get flat pricing. All it needs is a bookmarklet to automate it.
Claude's limits are so vague that it's not clear whether buying Claude Max is cheaper than just using the API. Has anyone benchmarked this?
Sounds like they are great fans of Numberwang.
I cancelled my Claude subscription. I was happily using it for months - asking it the odd question or sometimes having longer discussions to talk through an idea.
Then one day I got nagged to upgrade or wait a few hours. I was pretty annoyed; I didn't regard my usage as high, and it felt like a squeeze.
I cancelled my pro plan and now happily using Gemini which costs nothing. These AI companies are still finding their feet commercially!
> now happily using Gemini which costs nothing
…and you think this is going to last? :-)
Google is easily in the best position to hold competitive pricing with their LLMs. They can rely on their multi-billion-dollar ad business to prop up their AI advancements, compared to OpenAI or Anthropic, which only exist thanks to heavy VC investment.
Google will probably put 2.5 Pro behind a Google One account once it is out of preview, but I don't see a compelling reason they wouldn't keep Gemini incredibly price competitive with Claude or ChatGPT.
I wonder how successful this pricing model ($100-$200 a month, with limits) is going to be. It's very hard to justify when other tooling in the ~$20/month range offers unlimited usage and comparable quality.
Is any of the ~$20/month with unlimited usage tooling actually profitable though? It goes without saying that if all else is equal then the product sold at a greater loss will be more popular, but that only works until the vendor runs out of money to light on fire.
Cursor keeps raising money… I for one personally enjoy burning all those VC dollars. Consider it a very tiny version of wealth redistribution.
Has anyone tried Claude Code with Vertex/Bedrock instead? How does it compare in terms of pricing?
Isn't it exactly the same on Bedrock as in the API?
Tbh, for these types of systems I don't like the rate limiting at all. I might go days without any need, followed by a day of very intense usage.
Also, the 'reputation grind' some of these systems set up, where you have to climb 'usage tiers' before being 'allowed' to use more? Just let me pay and use. I can't compare your system to my current provider without weeks of being throttled at unusable rates? That makes switching to you way harder than it should be for serious users. Is that really the outcome you want? And no, I am not willing to 'talk to sales' to run a quick feasibility eval.
It's kinda sad that the information about how many tokens are included is not provided; it's hard to judge against pay-as-you-go API usage because of that.
They could charge me anything. But unless they knock off the message rate limiting, I won't touch them.
Anthropic's pricing is crazy IMO. Still haven't tried the Code product because of it.
The new Claude Code "Max plan" would last me all of [1] 5 minutes… I don't get why people are excited about this. High-powered tools aren't cheap and aren't for the consumer…
[1] https://www.youtube.com/live/khr-cIc7zjc?si=oI9Fj33JBeDlQEYG
If that's the case you should stop using it, because there's no way you see any ROI when you spend that much to just do some coding stuff.
It would be cheaper for your company to literally pay your salary while you do nothing.
I'd love to see your math.
It's pretty simple: that kind of usage is probably at least $10 worth of API credits per 5 minutes (maybe $100).
A year has 2000 working hours, which is 24,000 5-minute intervals. That means the company would be spending at least $240,000 on the Claude API (conservatively). So they would be better off paying your $100-200k salary for you to do nothing and hiring someone competent with that $240k.
Claude Max is less than half a percentage point of a Jr. Dev's average salary. If you can't make that work, then...
Both Anthropic and OpenAI don't have Linux desktop clients (to use MCP), so yea I'll skip.
Claude Code runs in the terminal
They could make a $1000-a-month version that runs on tape.
But at least Claude Desktop runs on my Intel Mac.
I'm sorry
- Apple
This isn't flat pricing. It's exactly the same API credits but you prepay for the month and lose anything you don't use.
Whether it turns out to be cheaper depends on your usage.
I thought Claude Code was absurdly expensive and not at all more capable than something like ChatGPT combined with Copilot.
My first language is not English, but is this changing the meaning of 'flat'? To me this is not flat.
It's flat as in flat rate, similar to fixed rate, meaning a set cost per month, instead of pay-as-you-go. Hope that helps.
It's flat if you graph your spend over multiple months :)
Yes, let’s all agree not to compete with something that competes with us. Galaxy brain
Worth it, but I’m chilling until the next major model release.
It still doubles down on non-working solutions.
Little tip: you can get pretty close to this with a $20 Claude subscription using the Desktop Commander MCP.
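If you want to try it, the wiring goes in Claude Desktop's claude_desktop_config.json, something like the following (the exact package name is from memory and may differ; check the Desktop Commander docs):

```
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["-y", "@wonderwhy-er/desktop-commander"]
    }
  }
}
```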
Do you still need a phone number to register with claude?
Judging by your username you're in Israel?
If so just get yourself an Israeli mobile virtual number (which can receive SMS)
https://www.flynumber.com/cities/israel/mobile/
Haha. No, I'm not from Israel, just a Star Trek fan :). But thanks for the info. My issue with that is that I don't want to give away my phone number, and I certainly don't want to pay for a service that gives me a phone number.
I am sure this is worth every dime, but my workflow is so used to Cursor now (Cursor rules, model choice, tab complete, to be specific) that I can't be bothered to try this out.
If you're using Cursor with Claude it's gonna be pretty much the same thing. Personally I use Claude Code because I hate the Cursor interface but if you like it I don't think you're missing much.
I don't enjoy the interface as such, rather the workflows that it enables.
[flagged]
Agree with the other commenter. I have seen at least 2 other posts today where you plug your project.
> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
https://news.ycombinator.com/newsguidelines.html
[flagged]
nah, there are low-powered tools and there are high-powered tools. If people want $20/month happy-meal toys in business, that business will get left behind. Ignore the consumer market, make Bugattis instead - https://ghuntley.com/redlining
ps - catchup for social zoom beers?
you're right about the context limits
i pinged what i think is the right ghuntley on linkedin, rizzler looks like the next feature i'm building for brokk :)
This is me - speak soon. https://www.linkedin.com/in/geoffreyhuntley