
Tangential question about FPGAs: Is there any work on compiling code to a combination of hardware and software? I'm imagining that the "outer loop" of a program is still fairly standard ARM instructions, or similar, but the compiler turns some subroutines into specialised circuits. Even more ambitiously you could JIT-compile hot loops from machine instructions to hardware.
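To make the idea concrete, here's a purely hypothetical sketch (in Python, just for illustration) of what the programmer-facing side of such a mixed compiler might look like: a `to_hardware` marker hands a hot subroutine to the FPGA toolchain while the outer loop stays ordinary CPU code. Here the decorator is a no-op that falls back to the software version; a real system would synthesize a circuit instead.

```python
# Hypothetical interface sketch: @to_hardware marks a subroutine as a
# candidate for hardware synthesis. In this sketch it's a no-op -- the
# function just runs in software.
def to_hardware(fn):
    return fn  # a real system would invoke the FPGA toolchain here

@to_hardware
def dot(xs, ys):
    # Hot inner kernel: an obvious candidate for a multiply-accumulate
    # pipeline on the fabric.
    return sum(x * y for x, y in zip(xs, ys))

# Outer loop: ordinary sequential CPU code calling into the "circuit".
total = sum(dot([1, 2, 3], [4, 5, 6]) for _ in range(10))
print(total)  # 320
```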

We already kind of do this manually over the long term (e.g. bfloat16, TF32 and hardware support for them in ML, or specialised video decoders). With mixed compilation you could do things like specify a floating-point format on the fly, or mix and match formats, in software and still get high performance.
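For instance, bfloat16 is just float32 with the low 16 mantissa bits dropped, which a few lines of Python make concrete (a sketch of the format itself, not of any particular hardware's implementation):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    # Reinterpret a float32 as its 32 raw bits, then keep the top 16:
    # bfloat16 keeps float32's sign and 8-bit exponent, truncating the
    # mantissa from 23 bits to 7.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    # Re-expand to float32 by padding the mantissa with zeros.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# Powers of two survive exactly; everything else keeps ~2-3 decimal digits.
print(from_bfloat16_bits(to_bfloat16_bits(1.0)))      # 1.0
print(from_bfloat16_bits(to_bfloat16_bits(3.14159)))  # 3.140625
```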



There was, but for MIPS: Microsoft's eMIPS project, which used NetBSD.

https://www.microsoft.com/en-us/research/project/emips/

https://www.microsoft.com/en-us/research/publication/multico...

http://blog.netbsd.org/tnf/entry/support_for_microsoft_emips...

The thing is, this is not just another step up in complexity as another poster wrote here, but several.

Because it requires partial dynamic reconfiguration, which works only with RAM-based FPGAs (the ones which load their bitstream, containing their initial configuration, from somewhere on startup), not with flash-based ones, which are "instant on" in their fixed configuration.

Regardless of that, partial dynamic reconfiguration takes time: the larger the reconfigured region, the longer it takes.
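A back-of-envelope sketch of that cost, assuming a 7-series-style ICAP configuration port (32 bits wide at 100 MHz, so roughly 400 MB/s peak; the numbers are illustrative, not measured):

```python
# Illustrative reconfiguration latency estimate. Assumes an ICAP-style
# port: 4 bytes per cycle at 100 MHz = 4e8 bytes/s peak throughput.
ICAP_BYTES_PER_SEC = 4 * 100e6

def reconfig_time_ms(partial_bitstream_bytes: int) -> float:
    # Time to stream the partial bitstream into the device, in ms.
    return partial_bitstream_bytes / ICAP_BYTES_PER_SEC * 1e3

# Even a modest 1 MiB partial bitstream costs milliseconds -- an
# eternity next to a hot loop that runs for microseconds.
print(reconfig_time_ms(1 << 20))  # ~2.6 ms
```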

This is all made even more annoying by vendor lock-in: proprietary tools, IP protection, and so much more.

The few FPGAs which have open-source toolchains are unsuitable because they are all flash-based, AFAIK, and partial reconfiguration doesn't seem to be on the radar of the people developing those toolchains; why would it be, if the parts are flash-based anyway?


> The few fpgas which have open source tool chains are unsuitable because they are all flash based AFAIK...

Not true at all. The flagship open-source FPGAs are the Lattice iCE40 series, which are SRAM-based. There's also been significant work towards open-source toolchains for Xilinx FPGAs, which are also SRAM-based.

The real limitation is in capabilities. The iCE40 series is composed of relatively small FPGAs which wouldn't be particularly useful for this type of application.


Lattice ECP5 is an SRAM-based FPGA which has up to 84K LUTs (vs ~5K for iCE40) and is supported by an open-source toolchain. E.g. see https://www.crowdsupply.com/radiona/ulx3s.


OK? I didn't follow the efforts for Lattice because those parts have insufficient resources for my needs. I'm aware of efforts for Xilinx, but they aren't covering the SKUs/models I'm working with. Is there anything for Altera/Intel now?


I'm not aware of any significant reverse-engineering efforts for Intel FPGAs. QUIP [1] might be an interesting starting point, but there may be significant IP licensing restrictions surrounding that data.

Out of curiosity, which Xilinx models are you hoping to see support for?

[1]: https://www.intel.com/content/www/us/en/programmable/support...


Here is a project to reverse engineer the Xilinx series 7 FPGAs to be able to target them with open source tools:

https://github.com/SymbiFlow/prjxray


The challenge is that reformulating problems to parallel computation steps is something we're in general still really bad at.

We're struggling with taking full advantage of GPUs and many-core CPUs as it is.

FPGAs are one step up in complexity.

I'd expect JIT'ing to FPGA acceleration to show up as more than very limited research prototypes only after people have first done a lot more research on JIT'ed auto-parallelisation to multiple CPU cores or GPUs.


The "execution model" is so vastly different that it's hard to know what "JIT-compile hot loops from machine instructions to hardware" even means. I wouldn't even call HDLs "execution" - they describe how to interconnect electronic circuits, and if they can be said to "execute" at all, it's that everything runs in parallel, processing signals across all circuits to the beat of a clock (usually, not always).
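A toy sketch of that difference in semantics (in Python, purely for illustration): on a clock edge, every register updates simultaneously based on the previous state, rather than statements executing one after another.

```python
# Toy model of synchronous HDL semantics: all right-hand sides read the
# *old* state, and all assignments take effect together on the clock
# edge. Contrast with software, where statements run sequentially.
def clock_tick(state):
    return {
        "a": state["b"],              # a and b swap in a single cycle,
        "b": state["a"],              # no temporary variable needed
        "count": state["count"] + 1,  # a counter ticking alongside
    }

state = {"a": 1, "b": 2, "count": 0}
state = clock_tick(state)
print(state)  # {'a': 2, 'b': 1, 'count': 1}
```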


You might be interested in this work which integrates a programmable fabric directly with a MIPS core in order to speed up inner loops: http://brass.cs.berkeley.edu/garp.html



