SectorC: A C Compiler in 512 bytes (2023)

347 points by valyala | 2026-02-07 | 72 comments
View original (xorvoid.com)

Summary

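SectorC is a C compiler written in x86-16 assembly that fits entirely within a 512-byte boot sector. It supports a subset of C large enough to write interesting programs, and it is quite possibly the smallest C compiler ever written. To get under the size limit, the author relied on tricks such as space-delimited "mega-tokens" and using `atoi()` as a (bad) hash function, and ended up with a functional C compiler in an extremely small space.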

Comments (77)

riedel 22 hours ago
Beautiful, but make sure to quickly add 2023 to the title.

Discussed at the time: https://news.ycombinator.com/item?id=36064971

dang 15 hours ago
Thanks! Macroexpanded:

SectorC: A C Compiler in 512 bytes - https://news.ycombinator.com/item?id=36064971 - May 2023 (80 comments)

gjvc 4 hours ago
why? and why "quickly"?
sanufar 22 hours ago
The way hashing is used for tokens and for making a pseudo symbol table is such an elegant idea.
fix4fun 21 hours ago
I think the same. Really nice project and good trick with hashing tokens.

PS. There are 21 bytes left (21 * 0x00 - from 0x01e0 to 0x01fd). Maybe something can be packed there ;)

avadodin 8 hours ago
I actually "shipped" a parser using the symbols' hash (as the only identifier) for a test tool once. Hopefully, the users never used enough symbols to collide in 32 bits.
benj111 13 hours ago
I've had the idea before. Was never quite brave enough to do it. It's elegant until it isn't....
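
The "elegant until it isn't" point can be made concrete with a minimal sketch. It assumes a 16-bit token hash that folds characters the way a digit-only `atoi` would, and a table with one slot per hash value. The names `tok_hash`, `slots`, `var_set`, and `var_get` are hypothetical and are not taken from SectorC's source.

```c
#include <stdint.h>

/* Hypothetical 16-bit token hash: the usual atoi fold (n = n*10 + digit),
   applied to arbitrary characters and wrapping modulo 65536. */
static uint16_t tok_hash(const char *s) {
    uint16_t h = 0;
    while (*s) {
        h = (uint16_t)(h * 10u + (uint16_t)(*s++ - '0'));
    }
    return h;
}

/* Pseudo symbol table: one slot per possible hash value.
   No names are stored, so nothing can detect a collision. */
static int16_t slots[65536];

static void var_set(const char *name, int16_t value) {
    slots[tok_hash(name)] = value;
}

static int16_t var_get(const char *name) {
    return slots[tok_hash(name)];
}
```

With this fold, `ba` and `ak` both hash to 549, so `var_set("ba", 7)` followed by `var_get("ak")` returns 7: two distinct names silently share one variable.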
xorvoid 22 hours ago
I may be the author.. enjoy! It was an absolute blast making this!
veltas 22 hours ago
This is very nice. I'm currently writing a minimalist C compiler although my goal isn't fitting in a boot sector, it's more targeted at 8-bit systems with a lot more room than that.

This is a great demonstration of how simple the bare bones of C are, which I think is one reason I and many others find it so appealing despite how Spartan it is. C really evolved from B which was a demake of Fortran, if Ken Thompson is to be trusted.

JamesTRexx 21 hours ago
Would it shrink, and by how much, if `if`, `while`, and `for` were replaced by a simple goto routine? (After all, in assembly there is only jmp and no other fancy jump instruction, I assume.)

And PS, it's "chose your own adventure". :-) I love minimalism.
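
As a rough illustration of the goto question, here is the same loop written once with `while` and once with labels and `goto`. This is only a sketch of the general lowering, not SectorC's actual code generation, and the function names are made up for the example. Note that x86 does have conditional jumps (the Jcc family, such as `jz` and `jnz`) in addition to `jmp`, so the `if (...) goto` line stands in for a compare followed by a conditional jump.

```c
/* Structured version: sum of 1..n. */
static int sum_while(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {
        total = total + i;
        i = i + 1;
    }
    return total;
}

/* Same loop with explicit labels: one conditional jump out of the loop
   and one unconditional jump back to the top. */
static int sum_goto(int n) {
    int total = 0;
    int i = 1;
loop_top:
    if (!(i <= n)) goto loop_end;
    total = total + i;
    i = i + 1;
    goto loop_top;
loop_end:
    return total;
}
```

Both functions return 55 for `n = 10`. The goto form still needs a conditional test, so dropping `while` from a language saves the parsing of the keyword rather than the jump logic itself.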

einpoklum 21 hours ago
An interesting use case - for the compiler as-is or for the essential idea of barely-C - might be in bootstrapping chains, i.e. starting from tiny platform-specific binaries whose disassembly one could verify, and gradually building more complex tools, interpreters, and compilers, so that eventually you get to something like a version of GCC and can then build an entire OS distribution.

Examples:

https://github.com/cosinusoidally/mishmashvm/

and https://github.com/cosinusoidally/tcc_bootstrap_alt/

NooneAtAll3 21 hours ago
> I wrote a fairly straight-forward and minimalist lexer and it took >150 lines of C code

was it supposed to be "<150"?

owalt 21 hours ago
They're saying the naive implementation was more than 150 lines of C code (300-450 bytes), i.e. too big.
mojuba 21 hours ago
Compare that to the C compiler in 100,000 lines written by Claude in two weeks for $20,000 (I think it was posted on HN just yesterday)
vidarh 20 hours ago
It's a fun comparison, but with the notable difference that that one can compile the Linux kernel and generate code for multiple different architectures, while this one can only compile a small proportion of valid C. It's a great project, but it's not so much a C compiler, as a compiler for a subset of C that allows all programs this compiler can compile to also be compiled by an actual C compiler, but not vice versa.
[deleted comment]
mati365 21 hours ago
Oh, it looks like my X86-16 boot sector C compiler that I made recently [1]. Writing boot sector games has a nostalgic magic to it, when programming was actually fun and showed off your skills. It's a shame that the AI era has terribly devalued these projects.

[1] https://github.com/Mati365/ts-c-compiler

w4yai 12 hours ago
> when programming was actually fun and showed off your skills

Oh no. Now more people are able to do what I do. I'm not special anymore.

guenthert 6 hours ago
Er, what? The article describes a compiler for a not-quite-C programming language which fits entirely in 512B. Your project, if I see this correctly, can optionally produce code meant to execute as boot sector.

Both interesting projects, but other than the words 'boot sector', 'C' and 'compiler', I don't see a similarity.

SeanSullivan86 21 hours ago
Why is it called a C Compiler if it's a subset of C?
[deleted comment]
gonzus 20 hours ago
Lacking support for structs, I think this is too minimalistic to be called "a C compiler".
pilord314 18 hours ago
you bootstrap it into a library you can include optionally, duh
[deleted comment]
benj111 13 hours ago
Weren't structs a fairly late addition to C?

And anyway, isn't that kind of missing the point? 512 bytes isn't much. Your comment is nearly a 5th of that budget.

kayo_20211030 20 hours ago
Nice. Very K&R-ish. Not a bad thing.
EGreg 18 hours ago
Reminds me of Allegro SizeHack where we made games in 10KB - but we were using C and Allegro library!

https://www.oocities.org/trentgamblin/sizehack/entries.html#...

layer8 18 hours ago
If this implementation had existed in the 1980s, the C standard would have a rule that different tokens hashing to the same 16-bit value invoke undefined behavior, and optimizing compilers in the 2000s would simply optimize such tokens away to a no-op. ;)
xorvoid 17 hours ago
Too real! LMAO
RodgerTheGreat 14 hours ago
"you don't have -wTokenHashCollision enabled! it's your own foolish ignorance that triggered UB; the spec is perfectly clear!"
fooker 15 hours ago
This is so cool!

Fun fact, Tiny C Compiler was derived from such a C compiler submitted to the International Obfuscated C Code Contest.

https://www.ioccc.org/2001/bellard/index.html

xorvoid 14 hours ago
Further Fun fact, that submission was called OTCC. I reverse engineered it and that provided inspiration for SectorC.

https://xorvoid.com/otcc_deobfuscated.html https://github.com/xorvoid/otcc_deobfuscated

pseudohadamard 12 hours ago
Meh, I did an entire awk interpreter in two lines:

  #!/bin/sh
  echo "awk: bailing out" >&2
userbinator 14 hours ago
C-subset, to be precise; but microcomputer C compilers were in the tens of KB range, for one that can actually compile real C.
kreelman 13 hours ago
There seems to be a good amount of interest for a boot sector compiler!!

If you're running on Linux, adjust the qemu call to use alsa rather than coreaudio.

I opened a pull request for this on GitHub. If the author is happy enough with my verbose shell scripting style :-) it might get included.

wbsun 13 hours ago
Nice, now you can dd it to your boot sector and ... Wait, it is 2026, there are 1000 ways of booting and memory mapping on so-called unified ARM architecture @,@
DeathArrow 10 hours ago
For me it's not interesting because it fits in 512 bytes, it's interesting because it's very simple. I think it would be a great introduction to learning about compilers.
shikaan 9 hours ago
Such a great read! Reminds me of the bootsector OS I made some time ago [1]

Maybe it's time to equip it with a C compiler...

[1]: https://github.com/shikaan/osle

zahlman 8 hours ago
> Big Insight #2 is that atoi() behaves as a (bad) hash function on ordinary text. It consumes characters and updates a 16-bit integer.

I could have sworn I remembered atoi() being defined to return 0 for invalid input (i.e. text not representing an integer in base ten).

MobiusHorizons 1 hour ago
That would be true of one using a libc, but in a boot sector you only have the BIOS, so the atoi being referenced is the one defined in C near the beginning of the article.
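
To make the distinction concrete, here is a minimal sketch that contrasts the standard library's `atoi` with a non-validating one. It assumes the article's version simply folds every byte in as if it were a digit. The name `fold_atoi` and the use of `uint16_t` for the 16-bit wraparound are choices made for this example only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Non-validating atoi sketch: every byte is treated as a "digit",
   so ordinary text produces a (poorly distributed) 16-bit value. */
static uint16_t fold_atoi(const char *s) {
    uint16_t n = 0;
    for (; *s; s++) {
        n = (uint16_t)(n * 10u + (uint16_t)(*s - '0'));
    }
    return n;
}

/* The libc atoi stops at the first non-digit and yields 0
   when no digits were converted. */
static int libc_atoi(const char *s) {
    return atoi(s);
}
```

For real digits the two agree: `fold_atoi("42")` and `libc_atoi("42")` both give 42. For text they differ: `libc_atoi("int")` is 0, while `fold_atoi("int")` is 6388, which is what lets the same routine double as a token hash.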
wzbtoolbox 7 hours ago
This is the kind of project that reminds you how far removed modern development is from the actual machine. We pile abstractions on abstractions until "Hello World" needs 200MB of node_modules, and then someone fits a C compiler in 512 bytes.

Not saying we should all write boot sector code, but reading through projects like this is genuinely humbling. Great educational resource too.

lock1 59 minutes ago
This kind of comment reminds me of how broad "software development" is.

On other HN posts, they're stating something like "software development is dead", "LLM as a compiler", "Do you read compiled assembly?", and so on.

While some other posts like this contain huge mechanical sympathy and literally r/w the assembly directly.

[deleted comment]
[deleted comment]