*bart simpson at the blackboard voice*: we don't need better hardware, we need more efficient software, we don't need better hardware, we need more efficient software, we...

*does a kickflip and is suddenly lisa rippin a sax solo whilst exiting stage right*



i will not introduce recursive algorithms into the school attendance routines

i will not introduce recursive algorithms into the school attendance routines

i will not introduce recursive algorithms into the school attendance routines

@djsundog gotta love the episode where Bart gets detention for typos in whatever he's writing on the board, then gets caught in a time vortex


Bart and Morty o.O

(please note that this is for demonstration purposes only. do not attempt to reunite Bart and Morty, please. ~ management)

@djsundog long ago: "i write the software and you buy the hardware, deal?"

@djsundog @cypnk *rewrites everything in very error-prone C* heeeeeere ya goooooo

@djsundog Probably related: Even C isn’t as “close to the metal” as it used to be because no one understands the actual hardware. It’s all too complex, too proprietary, and everyone has too little time before the release cycle ends

@cypnk @djsundog *hackers (1995) voice* risc is the future

@cypnk @djsundog no but seriously risc computers much more closely match the machine architecture that C was implemented for, but can also be put on massively multi-core chips for parallel applications. imagine a raspberry pi, but it's 128 risc-v cpus on a single board

@cypnk @chr @djsundog i heard if you open up a pentium 2 there are just a bunch of pentium ones inside

@cypnk @djsundog it's so weird how our current hardware is basically risc, but with a complex microcode-assisted operation transformation frontend so it can pretend to be cisc, plus crazy instruction scheduling stuff to work around the fact that you can't compile targeting the cpu's real pipeline (there are so many different implementations that targeting one would be counter-productive anyways).

@kepstin @djsundog Not to mention a whole heap of additional structure and software hidden beneath. I read somewhere Intel’s Management Engine is basically an embedded MINIX 3

On a CPU. That’s... nuts

@cypnk @djsundog the management engine isn't really in the path of normal computation, tho. it's more or less taking the separate management chip that servers have (typically an arm or mips core running linux!) and putting it on the same die as the cpu.

(it does do weird stuff in the boot path to handle loading signed firmware, etc, but once the cpu's booted it's mostly independent)

remember, a "computer" is just a network of various big and small processors talking to each-other.

@cypnk @djsundog I've always thought the transmeta crusoe was a really cool piece of tech - instead of making an x86 instruction decoder, do software emulation of an x86 processor, and throw in jit transpiling to the cpu's native instruction set so it performs reasonably well.

@kepstin @cypnk I was convinced Crusoe was going to lead to revolutionary new processor designs. Welp,

@djsundog @cypnk my favourite bit of the crusoe was the fact that the native instruction set had no mmu or memory protection - that's all implemented in the emulation layer.

@djsundog @cypnk i'm kinda sad that nvidia never added x86 support to the implementation they made after buying transmeta's IP

(also they kinda cheated a bit by having a hardware arm decoder in addition to the software dynamic recompilation layer)

@djsundog @cypnk the thing about it that improves efficiency, in theory, is that the conversion to cpu native opcodes can be cached for long periods of time in a large ram buffer (modern x86 cpus have smaller on-die micro-op caches), which means that it's a win to spend more time up-front re-optimizing for the cpu, instead of running scheduler & prediction tricks every time an instruction is decoded.
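
A minimal sketch of the caching idea described above, not Transmeta's actual design. The toy guest instruction set (`add`/`mul` on one accumulator), the `translate` function, and the `TranslationCache` class are all hypothetical names made up for illustration. The point is only that translation and optimization happen once per block, and later executions reuse the cached result.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA: each instruction is (opcode, operand)
# and operates on a single accumulator.
GuestBlock = List[Tuple[str, int]]


def translate(block: GuestBlock) -> Callable[[int], int]:
    """Translate a guest block once, folding adjacent adds up front."""
    folded: List[Tuple[str, int]] = []
    for op, arg in block:
        if folded and folded[-1][0] == "add" and op == "add":
            # Merge consecutive adds into a single add.
            folded[-1] = ("add", folded[-1][1] + arg)
        else:
            folded.append((op, arg))

    def native(acc: int) -> int:
        for op, arg in folded:
            acc = acc + arg if op == "add" else acc * arg
        return acc

    return native


class TranslationCache:
    """Caches translated blocks by guest address (assumed never to change)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Callable[[int], int]] = {}
        self.misses = 0

    def run(self, addr: int, block: GuestBlock, acc: int) -> int:
        native = self.blocks.get(addr)
        if native is None:
            # Pay the translation cost only on the first execution.
            self.misses += 1
            native = translate(block)
            self.blocks[addr] = native
        return native(acc)


cache = TranslationCache()
block = [("add", 1), ("add", 2), ("mul", 3)]
first = cache.run(0x1000, block, 1)
second = cache.run(0x1000, block, 1)
```

Running the same block twice translates it only once; a real system would also need to invalidate cached blocks when the guest code is modified, which this sketch ignores.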

@djsundog @kepstin @cypnk Same. I honestly never expected the response to something better to be "we don't want something better."

@kepstin @cypnk @djsundog One of my professors once characterized x86 thus:

Imagine a field with a couple of trees. Cut them all down and build a cabin. Then demolish the cabin and use the remains to build a bigger one. Then demolish that one and use it to construct a small town. Then a small neighborhood. Then a large town. Then suburbs...

That's x86 right there.

The only way out would be to scrap the whole mess and start over. But then all the work would go into emulating the old shit.

@eldaking Um... yeah, kinda, now that I read about it. Somehow, I think Factorio has less technical debt in that regard.

@drwho @kepstin @cypnk @djsundog I have been saying this for years: this is a deliberate military strategy.

It is our fault for accepting these overly complex microprocessors. Such complexity is absolutely not needed.

@kepstin @cypnk @djsundog
> can't compile targeting the cpu's real pipeline
is that what Mill is trying to do?

@grainloom @cypnk @djsundog you'll need to give me some context, I don't know what "mill" is supposed to be here, and google searches are inconclusive.

@grainloom @cypnk @djsundog huh, that's interesting: "compilers are required to emit a specification which is then recompiled into an executable binary by a recompiler supplied by the Mill Computing company"

so code has to go through a machine-specific optimizer/translator before it can run at all.

@grainloom @cypnk @djsundog
An example of a cpu arch where compilers had to optimize for a specific pipeline was intel's Itanium - a fairly simple in-order vliw style core. it turned out that the compiler optimizers didn't get to the point where it was competitive with x86 until well after everyone had given up on it.

@kepstin @cypnk @djsundog they are aware of the Itanium's issues. AFAIK code generation for the Mill is pretty straightforward. the belt is basically SSA in hardware.
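
A toy illustration of the "SSA in hardware" remark, assuming the commonly described belt model: every operation pushes its result onto the front of a fixed-length queue, operands are named by position (how many results ago), and old values fall off the end instead of being overwritten. The `Belt` class, the three opcodes, and the belt length of 8 are assumptions for this sketch, not the real Mill encoding.

```python
from collections import deque


class Belt:
    """Fixed-length queue of results; position 0 is the newest value."""

    def __init__(self, length: int = 8) -> None:
        self.slots = deque(maxlen=length)

    def push(self, value: int) -> None:
        # Each result is written exactly once; the oldest one drops off.
        self.slots.appendleft(value)

    def __getitem__(self, pos: int) -> int:
        return self.slots[pos]


def run(program, length: int = 8) -> int:
    belt = Belt(length)
    for op, *args in program:
        if op == "const":
            belt.push(args[0])
        elif op == "add":
            belt.push(belt[args[0]] + belt[args[1]])
        elif op == "mul":
            belt.push(belt[args[0]] * belt[args[1]])
        else:
            raise ValueError(f"unknown op: {op}")
    return belt[0]


# (2 + 3) * 2, with operands named by belt position rather than register.
program = [
    ("const", 2),    # belt: [2]
    ("const", 3),    # belt: [3, 2]
    ("add", 0, 1),   # belt: [5, 3, 2]
    ("mul", 0, 2),   # belt: [10, 5, 3, 2]
]
result = run(program)
```

Because no slot is ever written twice, each value has a single definition, which is the property SSA form gives a compiler.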

@cypnk @djsundog Close to the metal is kind of a joke with modern architectures trying desperately to make the thing look like an 8086 to the assembler and compiler, when what's going on under the hood is a completely different beast.

@cypnk @djsundog How long do you think it'll be before the easiest to get compilers are those sold by the manufacturers, because they can generate the code-generating code straight from the VHDL?

@drwho @djsundog I think this is already happening somewhat, but for GPUs. I think this is what Nvidia CUDA is

@cypnk @djsundog ...I don't know. Have to look into it. And, perhaps, start weeping.

@drwho @cypnk @djsundog interestingly, processor manufacturers have been either forking or contributing directly to llvm so they only have to do the final code generation rather than the whole compiler.

AMD's ROCm platform for developing GPU applications is an example of that.

@kepstin @cypnk @djsundog I'm... not sure how I feel about that. Potentially sad?

@cypnk @djsundog assembly isn't even "close to metal" due to all the nonsense intel is doing

@ben @djsundog Yeah, pretty much. All manner of gymnastics under the hood that we'll never get to see. I'm not entirely convinced there's a single person at Intel itself that knows all of it at this point

@cypnk @ben might be able to prove mathematically that it's impossible for such a person to exist.

@ben @cypnk and at the opposite end, you can't even start the cpu on a raspberry pi without asking the gpu to start it up for you. it's ludicrous in its overcomplexity through and through, no matter where you look.

@cypnk Choose simpler microprocessors. I agree the programming model has been voluntarily complexified to the maximum extent, meaning, the Empire doesn't want you to code in assembly language anymore. They prefer that you be dependent on compilers, leaving more opportunities for the Empire to plant backdoors.



@djsundog i still want OCAP CPUs tho
moving some checks to hardware could make software simpler and more efficient

@grainloom oh, definitely - I am not preaching that we should throw away learning earned, I'm just advocating for a return to innovation outside of performance boost and various "fixes" - gimme more stuff like Chuck Moore's cpu designs and Transmeta and DEC Alpha again, rather than chasing market performance for cloudscale.

@djsundog @grainloom tbf, I’ve heard Alpha’s implementations were incredibly sloppy in a lot of areas, and people inadvertently triggered Meltdown-class bugs on them back in the day

while there were timing bugs in older designs (see the 1995 NSA paper that specifically called out the 80486), Alpha was arguably patient zero for the “performance above correctness” syndrome that we see today - especially because nearly all of Intel’s modern processors that are vulnerable to Spectre/Meltdown-class vulns are at least in part descended from Intel’s Alpha-killer, the Pentium Pro

@djsundog @grainloom I believe high-performance CPUs can be done securely, but it requires extreme attention to detail, as well as thinking of security with every element of the design

Low-performance (in-order, no branch prediction - think 80486, 68040, ARM9) CPU designs can be done wrong, but it’s harder. It’s even better if your memory subsystem doesn’t have a cache (because that’s really where a lot of timing issues come in), but modern DRAMs are designed entirely around large caches - I suspect reserving enough cache to be able to reverse the effects of a rolled back operation quickly would be sufficient (with a slight performance penalty…)

Of course, then, we can talk about Rowhammer, and DRAM itself being broken, too…
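
A toy model of why the cache matters here, and of the "reverse the effects of a rolled back operation" idea. This is a simulation with made-up latencies, not a real attack or a real mitigation; `ToyCache`, `speculate_and_roll_back`, and `attacker_probe` are hypothetical names. The architectural result of the speculative load is thrown away in both cases, and the only difference is whether the cache fill is undone as well.

```python
HIT_CYCLES = 4      # made-up latency for a cached line
MISS_CYCLES = 200   # made-up latency for an uncached line


class ToyCache:
    def __init__(self) -> None:
        self.lines = set()

    def load(self, addr: int) -> int:
        """Return the simulated latency and fill the line on a miss."""
        if addr in self.lines:
            return HIT_CYCLES
        self.lines.add(addr)
        return MISS_CYCLES


def speculate_and_roll_back(cache: ToyCache, secret: int, undo_fills: bool) -> None:
    snapshot = set(cache.lines)
    cache.load(secret)            # transient access indexed by the secret
    if undo_fills:
        cache.lines = snapshot    # rollback also restores the cache state
    # The loaded value itself is discarded in both cases.


def attacker_probe(cache: ToyCache, candidates) -> list:
    """Return the addresses that load fast, i.e. were already cached."""
    return [addr for addr in candidates if cache.load(addr) == HIT_CYCLES]


leaky = ToyCache()
speculate_and_roll_back(leaky, secret=3, undo_fills=False)
leaked = attacker_probe(leaky, range(8))

careful = ToyCache()
speculate_and_roll_back(careful, secret=3, undo_fills=True)
hidden = attacker_probe(careful, range(8))
```

In the first case the timing probe recovers the secret index even though the operation was rolled back; in the second it finds nothing. A real design would have to undo the fill fast enough that the undo itself is not observable, which this sketch does not model.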

@bhtooefr @grainloom this is a fair analysis afaik. I'm more interested in the conditions that allowed it to come to market at all, flawed as it was, than the implementation itself.

@djsundog @grainloom cheaper to buy faster silicon than pay for the dev time to make the code faster

that’s been the case for decades

…but if you can skimp on the dev time for the faster silicon, your faster silicon gets to market faster and cheaper than the other company’s faster silicon

and that’s how you get shitty, buggy hardware running shitty, buggy software

@djsundog @grainloom and, given the complexity of some of these bugs and the conditions necessary to trigger them, it’s often the case that this stuff is in the field for years before the bugs are treated properly

sometimes, as was the case with the bugs being inadvertently triggered on Alpha (and on Xenon (the Xbox 360 CPU, where a new instruction on that CPU made it trivial to accidentally trigger it)) it’s been “known”, but only as a “don’t do this or your shit will crash”

and there’s been decades of computers just being unreliable, and “your shit crashing” not being thought of as a potential vulnerability, but instead something to be worked around to meet the shipping deadline

(note that this isn’t just a hardware problem, it’s of course a huge software problem too)
