LLMs as the new high level language
Summary
This post suggests that large language models (LLMs) and their agents are becoming a new high-level programming language, much as high-level languages were a step up from assembly. It argues that LLM agents working autonomously can greatly improve developer productivity. While acknowledging potential challenges such as code quality and understandability, it proposes that LLMs could redefine web and software development by enabling a new paradigm of screenless development in which specs and scenarios drive agent workflows.
Comments (341)
The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work, people are just catching up with that fact, and the people who just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?
But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.
We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.
Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.
Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.
a.k.a. Being a programmer.
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.
What lol. Translation? Graphic design?
I think it's doubtful you'll be even that; certainly not with the salary and status that normally entails.
> I'm very convinced non-technical people will be able to use these tools
This suggests that the skill ceiling of "Vibe Coding" is actually quite low, calling into question the sense of urgency with which certain AI influencers present it, as if it were a skill you need to invest major time and effort to hone now (with their help, of course), lest you get left behind and have to "catch up" later. Yet one could easily see it being akin to Googling, which was also a skill (when Google was usable), one that did indeed increase your efficiency and employability, but with a low ceiling, such that "Googler" was never a job by itself, the way some suggest "prompt engineer" will be. The Google analogy is apt, in that you're typing keywords into a black box until it spits out what you want; quite akin to how people describe "prompt engineering."
Also the Vibe Coding skillset--a bag of tricks and book of incantations you're told can cajole the model--has a high churn rate. Once, narrow context windows meant restarting a session de novo was advisable if you hit a roadblock, but now it's usually the opposite.
If this is all true, then wouldn't the correct takeaway, rather than embracing and mastering "Vibe Coding" (as influencers suggest), be to "pivot" to a new career, like welding?
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work, people are just catching up with that fact
What's funny is artists immediately, correctly perceived the threat of AI. You didn't see cope about it being "just another tool, like Photoshop."
I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)
I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
These models are blowing my mind. It's like I have an army of juniors I can actually trust.
where's the catch? SQL is an old technology, surely an LLM is good with it
In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. Example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:
x = X.new
assert x.kind_of?(X)
This is of course an entirely meaningless check. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security.
This is not an appropriate analogy, at least not right now.
Code agents generate code from prompts; in that sense the metaphor is correct. However, agents then read that code back, it becomes input, and they generate more code. This was never the case for compilers: compilation is strictly one-directional and acyclic, so an LLM used in this sense is not a compiler.
"Generate a Frontend End for me now please so I don't need to think"
LLM starts outputting tokens
Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use
Then out of a shadowy alleyway a man in a trenchcoat approaches
"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"
"How much?"
"Oh that'll be $200.... and your muscle memory for running maven commands"
"Shut up and take my money"
----- 5 months later, washed up and disconnected from cloud LLMs ------
"Anyone got any spare tokens I could use?"
Here's $1000. Please do that. Don't bother with the LLM.
My dopamine rush comes from solving a problem, learning something new, producing a particularly elegant and performant piece of code, etc. There's an aspect of hubris involved, to be sure.
Using a tool to produce the end result gives me no such satisfaction. It's akin to outsourcing my work to someone who can do it faster than me. If anything, I get cortisol hits when the tool doesn't follow my directions and produces garbage output, which I have to troubleshoot and fix myself.
"I prompted it like this"
"I gave it the same prompt, and it came out different"
It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.
Well I think the article would say that you can diff the documentation, and it's the documentation that is feeding the AI in this new paradigm (which isn't direct prompting).
If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.
Functions like:
updatesUsername(string) returns bool
...can be turned into a generic functional euphemism:
takeStringRtnBool(string) returns bool
...same thing. Context can be established by the data passed in and by the external system interactions (updating user values, an inventory of widgets).
As workers, SWEs are just obfuscating how repetitive their effort is to people who don't know better.
The era of pure data-driven systems has arrived. In line with the push to dump OOP, we're dumping irrelevant context from the code altogether: https://en.wikipedia.org/wiki/Data-driven_programming
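A minimal Python sketch of that data-driven style, purely for illustration (the rule table and event names here are invented, not from any real system): the "program" is a table of data, and one generic interpreter walks it.
    # Data-driven dispatch: behavior lives in a table of rules; one generic
    # interpreter applies them. Adding behavior means adding data, not code.
    RULES = [
        {"event": "user.renamed", "action": "update_field", "field": "username"},
        {"event": "widget.added", "action": "increment",    "field": "inventory"},
    ]

    def apply_event(state: dict, event: str, value) -> dict:
        for rule in RULES:
            if rule["event"] != event:
                continue
            if rule["action"] == "update_field":
                state[rule["field"]] = value
            elif rule["action"] == "increment":
                state[rule["field"]] = state.get(rule["field"], 0) + value
        return state

    state = apply_event({}, "user.renamed", "alice")  # {'username': 'alice'}
    state = apply_event(state, "widget.added", 3)     # inventory becomes 3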
I wrote a program in C and and gave it to gcc. Then I gave the same program to clang and I got a different result.
I guess C code isn't programming.
>"I gave it the same prompt, and it came out different"
1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
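For what it's worth, a rough sketch of what local reproducibility can look like, assuming the Hugging Face transformers library and greedy decoding (no sampling); the prompt and model choice are arbitrary. Hosted APIs layer batching and hardware nondeterminism on top of this, which is the part providers don't guarantee.
    # Greedy decoding: take the argmax token at each step, so the same prompt
    # on the same weights produces the same text on every run.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Add a cache-invalidation hook that", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=20)
    print(tok.decode(out[0]))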
I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally", or "you have a 'mode' parameter here that determines which branch of the code executes, which of these modes are actually used? and I get a bunch of babble, because he has no idea how any of it works.
Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.
The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.
The implementations that come out are buggy or just plain broken
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Give me a whiteboard and a fellow engineer any day.
The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what the colour red feels like to see.
It's easy to claim this and just walk away. But it's better for the overall discussion to provide the example.
Also much of the really annoying, time consuming stuff, like frontend code. Writing UIs is not rocket science, but hard in a bad way and LLMs are not helping much there.
Plus, while they are _very_ good at finding common issues and gotchas quickly that are documented online (say you use some kind of library that you're not familiar with in a slightly wrong way, or you have a version conflict that causes an issue), they are near useless when debugging slightly deeper issues and just waste a ton of time.
Right now LLMs are taking languages meant for humans to understand better via abstraction, what if the next language is designed for optimal LLM/world model understanding?
Or instead of an entirely new language, there's some form of compiling/transpiling from the model language to a human-centric one, like WASM for LLMs
I'm more worried about the opposite: the next popular programming paradigm will be something that's hard to read for humans but not-so-hard for LLM. For example, English -> assembly.
Just like AlphaZero ditched human Go matches and trained on synthetic ones, and got better this way
I can take some C or Fortran code from 10 years ago, build it and get identical results.
You certainly can get identical results, but it's equally certainly not going to be that simple a path frequently.
https://www.observationalhazard.com/2025/12/c-java-java-llm....
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)
LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
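A toy Python illustration of that "RNG draw" (the vocabulary and logits are made up): greedy decoding picks the argmax deterministically, while sampling from the softmax is a dice roll on every call.
    import math, random

    vocab  = ["foo", "bar", "baz"]   # made-up next-token candidates
    logits = [2.0, 1.5, 0.3]         # made-up model scores

    exps  = [math.exp(l) for l in logits]
    probs = [e / sum(exps) for e in exps]  # softmax -> probability distribution

    greedy  = vocab[probs.index(max(probs))]            # always "foo"
    sampled = random.choices(vocab, weights=probs)[0]   # varies per call
    print(greedy, sampled)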
I'll take the gloriously reliable classical compute world to compile my stuff any day.
I have already watched integrations between SaaS products being deployed with agents instead of classical middleware.
For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
So I'm guessing they just rise because they spark a debate?
The optimists upvote and praise this type of content, then the pessimists come to comment why this field is going to the dogs. Rinse and repeat.
Precisely. Attention economy. It rules.
Which is why I only quickly scan through the comments to see if there are new insights I haven't seen in the past few months. Surprise, almost never.
Code in general is also local, in the sense that small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected by the output. No theory (that we know of) can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
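For concreteness, a quick Python check of that analogy: a seeded generator is perfectly reproducible, yet a minimally different seed gives an unrelated value, so reproducibility alone says little about what the output encodes.
    import random

    # Same seed, same single draw: reproducible by construction.
    assert random.Random(1).random() == random.Random(1).random()

    # A minimally different input yields an entirely unrelated output.
    print(random.Random(1).random(), random.Random(2).random())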
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires that the robot (and likely the human) attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect, their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few 100kb of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects where quality, correctness, interoperability, novelty, etc don't matter, it might be. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[1]: loosely in the statistical sense of containing all the information the programmer was able to encode https://en.wikipedia.org/wiki/Sufficient_statistic
[2]: there're of course many tiny exceptions to this. we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary timing of something that causes async tasks to schedule in a different order etc etc. I believe the point stands regardless.
I know someone who has been spending $7k / month on Cursor tokens for the past six months, managing such a team of agents… But curiously the results seem to be endless PDF-ware, and every month there’s a new reason why the project is not yet quite ready to be used on real data.
LLMs are very good at making you think they’re giving you what you hoped to get.
I want my C-suite and V-suite LLMs to feel like they earned their positions through hard work, values, and commitment to their company.
* = (Not to be confused with a famous poem by Dante Alighieri)
Has this been true since the 90s?
I pretty much only hear people saying modern compilers are unbeatable.
Everything else is secondary.
Critical distinction: unless you're getting paid in proportion to your output (literally zero traditional 9-5 software jobs I know of, unfortunately), this is in fact the opposite: a subscription to any of these services reduces your overall salary, it doesn't make it higher...
Then there is the case I know the dishonest are doing: firing off Claude or whatever and going for a walk.
The hottest new programming language is English
For instance: Here is an email from my manager at 1pm today. Open the policy document he is referring to, create a new version, and add the changes he wants. Refer to the entire codebase (our company OneDrive/Google Drive/Dropbox, whatever) to make sure it is contextually correct.
>Sure, here is the document for your review
Great, reply back to manager with attachment linked to OneDrive
The user still has to be in the loop to instruct the LLM every time, and the tiniest nuances in each execution of that work still matters. If it didn't we'd have replaced people with bash scripts many decades ago. When we've tried to do that, the maintenance of those scripts became a game of whack-a-mole that never ended and they're eventually abandoned. I think sometimes people forget how ineffective most software is even when written as good as it can be. LLMs don't unlock any new abilities there.
What this actually does is make people more available for meeting time. Productivity doesn't budge at all. :)
In other words, the "busy work" has always been faster to do than the decision making, and if someone has meetings to attend they don't strictly do busy work.
Maybe the more interesting outcome is that with the increased meeting time comes much deeper drinks of the kool-aid, and businesses become more cultish than they already are. That to me sounds far more sinister than kicking people out onto the curb. Employees become "agents of change" through the money and influence they're given. They might actually hire more :D
We don't commit compiled blobs in source control. Why can't the same be done for LLMs?
Imagine a machine that does the job sometimes but fails on some other times. Wonderful isn't it?
The generation step changed. The maintenance step didn't. And most codebases spend 90% of their life in maintenance mode.
The real test of whether prompts become a "language" is whether they become versioned, reviewed artifacts that teams commit to repos. Right now they're closer to Slack messages than source files. Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold.
1. OK, let's run 100 instances of the prompt under the hood: 1-2 will hallucinate, 3-5 will produce something different from the rest, and it can compile based on the 90% of answers that agree.
2. Computer memory is also not 100% reliable, but we live with it somehow without a man-in-the-middle manual check layer?
Aren't you telling Claude/Codex to debug it for you?
Ask yourself "Computer memory and disk are also not 100% reliable , but we live with it somehow without man-in-the-middle manual check layer, yes?" Answer about LLM will be the same, if good enough level of similarity/same asnwers is achieved.
Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!
Which is exactly one of the examples the AOT-only, no-GC crowd uses for why theirs is better.
Give a spec to a designer or developer. Do you get the same result every time?
I’m going to guess no. The results can vary wildly depending on the person.
The code generated by LLMs will still be deterministic. What is different is the tooling the product team uses to create that product.
At a high level, does using LLMs to do all or most of the coding ultimately help the business?
Genuine question, but why not set the temperature to 0? I do this for non-code related inference when I want the same response to a prompt each time.
There’s a related issue that gives me deep concern: if LLMs are the new programming languages we don’t even own the compilers. They can be taken from us at any time.
New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?
(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)
Gosh, LLMs have been a thing for only a few years, but people have become stupid already.
> what Javascript/Python/Perl did to Java
FFS... What did python do to java?
Because that's pretty much what "agentic" LLM coding systems are an automation of, skimming through forums or repos and cribbing the stuff that looks OK.
What did Javascript/Python do to Java? They are not interchangeable nor comparable. I don't think Federico's opinion is worth reading further.
Last I checked with every other high level language, you save the source and then rerun the compiler to generate the artifact.
With LLMs you throw away the 'source' and save the artifact.
There are now approaches where the prompt itself is structured in a certain way (sort of like a spec) so you get to a similar result quicker. Not sure how well those work (I actually assume they suck, but I have not tried them).
Also some frameworks, templates and so on, provide a bunch of structured markdown files that nudges LLM assistance to avoid common issues and do things in a certain way.
In essence: we're witnessing a paradigm shift. And for moments like these—I invite you—it's invaluable to have studied Popper and Kuhn in those courses.
An even more provocative hypothesis: the 'Vienna Circle' has morphed into the 'Circle of Big Tech,' gatekeepers of the data. What's the role of academia here? What happened to professional researchers? The way we learn has been hijacked by these brilliant companies, which—at least this time—have a clear horizon: maximizing profits. What clear horizon did the stewards of the scientific method have before? Wasn't it tainted by the enunciator's position? The personal trajectory of the scientist, the institution (university) funding them? Ideology, politics?
This time, it seems, we know exactly where we're headed.
(This comment was translated from Spanish, please excuse the rough edges)
A practical way to get better results is to stop prompting with prose and start providing explicit models of what we want. In that sense, UML-like notations can act as a bridge between human intent and machine output. Instead of:
“Write a function to do X…”
we give:
“Here’s a class diagram + state machine; generate safe C/C++/Rust code that implements it.”
UML is already a formal, standardized DSL for software structure. LLMs have no trouble consuming textual forms (PlantUML, Mermaid, etc.) and generating disciplined code from them. The value isn’t diagrams for humans but constraining the model’s degrees of freedom.
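As a rough illustration of that shape of prompt (the state machine and the wording are invented for this example, not from any real project), the model is handed a machine-readable spec instead of prose:
    # A hypothetical prompt that constrains the model with a PlantUML state
    # machine rather than a prose description.
    SPEC = """
    @startuml
    [*] --> Idle
    Idle --> Connecting : connect()
    Connecting --> Connected : handshake_ok
    Connecting --> Idle : timeout
    Connected --> Idle : disconnect()
    @enduml
    """

    PROMPT = (
        "Here is a state machine in PlantUML. Generate Rust code that implements "
        "it as an enum-based state type, rejecting any transition not in the "
        "diagram.\n" + SPEC
    )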
Beginner programmers want: "make this feature"
Experienced devs want: control over memory, data flow, timing, failure modes
That is why abstractions feel magical at first and suffocating later which sparks this whole debate.
> LLMs are far more nondeterministic than previous higher level languages. They also can help you figure out things at the high level (descriptions) in a way that no previous layer could help you dealing with itself. […] What about quality and understandability? If instead of a big stack, we use a good substrate, the line count of the LLM output will be much less, and more understandable. If this is the case, we can vastly increase the quality and performance of the systems we build.
How does this even work? There is no universe I can imagine where a natural language can be universal, self descriptive, non ambiguous, and have a smaller footprint than any purpose specific language that came before it.
Whether this is doable through orchestration or through carefully guided HITL by various specialists in their fields - or maybe not at all! - I suspect will depend on which domain you're operating in.
There's minimal opportunity with lifetime annotations. I'm sure there are similarly small opportunities elsewhere, too.
The idea of replacing Rust with natural language seems insane. Maybe I'm being naive, but I can't see why or how it could possibly be useful.
Rust is simply Chinese unless you understand what it's doing. If you translate it to natural language, it's still gibberish, unless you understand what it does and why first. In which case, the syntax is nearly infinitely more expressive than natural language.
That's literally the point of the language, and it wasn't built by morons!
The issue is that if you fire off 10 agents to work autonomously for an extended period of time at least 9 of them will build the WRONG THING.
The problem is context management and decision making based on that context. LLMs will always make assumptions about what you want, and the more assumptions they make the higher the likelihood that one or more of them is wrong.
I mean, we only have them because it is strictly necessary. If we could make architectures friendly to programming directly, we would have.
In that sense, high level languages are not a marvelous thing but a burden we have to carry because of the strict requirements of low level ones. The less burdens like those we have, the better.
Well, nobody could figure out how to program them. Except the few outcasts like us who went on to suffer for the rest of our lives for it :')
With phones & LLMs this is the closest we have come to that original promise of a computer in every home and everyone being able to do anything with it, that isn't pre-dictated by corporations and their apps:
Ideally ChatGPT etc should be able to create interactive apps on the fly on iPhone etc. Imagine having a specific need and just being able to say it and get an app right away just for you on your device.
I discovered that it is not trivial to conceptualize an app to the degree of clarity required for deterministic LLM output. It's way easier to say than to actually implement yourself (that's why examples are so interesting to see).
The backwards dynamic, where you get a spec/doc based on the source code, does not work well enough.