Did anyone else misread that headline as compostable computing systems and wonder if there's been a breakthrough in biodegradable systems to lower environmental impact?
Beer icon because I probably need one...
Liqid has added FPGAs to the list of compute resources its customers can use to compose workload-specific computing systems. An FPGA (field-programmable gate array) is a flexible compute chip composed of logic blocks, designed to be programmed or configured "in the field" using a hardware description language. They are more …
If you need a computer for gaming, you want it to have a GPU because it is much better at calculating graphics stuff than any general-purpose CPU. On the other hand, it's only so good at that sort of task - you may also use it to e.g. mine cryptocurrency, but move to a dissimilar yet equally specialized task like, say, implementing a neural network and you soon find you'd do much better using a different bit of specialized kit.
On the other hand, if your computer incorporates freely configurable silicon like an FPGA, you may not quite equal the performance of a dedicated piece of comparable hardware, but you gain the freedom to accelerate any kind of task a dedicated hardware accelerator could be built for - you just build your custom accelerator from the hardware resources offered by the FPGA, using nothing but software. You get a machine that will probably not quite equal a dedicated GPU or crypto-mining ASIC or neural-net accelerator, but can do any of those jobs much faster than just a CPU, choosing between them at will.
I'm just getting into FPGAs and they're a fun way of "coding" something very different. If you imagine writing C where every line of code runs at the same time then you're getting some of the way there (although this is of course a huge oversimplification).
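A rough way to see the "every line runs at the same time" idea without any FPGA tooling is to model it in plain Python. This is only a sketch of the concept, not HDL: the function names and the two-register example are made up for illustration. In software, the second assignment sees the result of the first; in clocked hardware, every register reads the old values and they all update together on the clock edge.

```python
def sequential_step(state):
    """Software semantics: each line sees the result of the line before it."""
    s = dict(state)
    s["a"] = s["b"]
    s["b"] = s["a"]  # reads the value that was just written to a
    return s


def clocked_step(state):
    """Hardware-like semantics: all assignments read the old state,
    then every register updates together on the clock edge."""
    old = state
    return {"a": old["b"], "b": old["a"]}


start = {"a": 1, "b": 2}
seq = sequential_step(start)  # both registers end up holding 2
clk = clocked_step(start)     # the two registers swap
print(seq, clk)
```

The same two "lines" give different answers depending on which model you are in, which is exactly the oversimplified-but-useful mental shift.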
Anything PCIe connected seems to be ridiculously priced. Actually, I suppose as it's aimed at enterprise customers it's not really. I've just got my head in the sub-£100 hobbyist end.
This is part of the problem when it comes to deploying FPGAs - the superficial similarity to software leads the naive to write code that is logically structured in a manner equivalent to conventional computer programs. The problem is that what is a natural fit for software - essentially a sequence of steps - doesn't fit very well when expressed as hardware, and you inevitably lose the parallelism advantages that make FPGAs attractive in the first place.
The best programmers making the switch tend to be those with good backgrounds in functional programming: they are much more used to the mindset of "This is a relationship that always holds" rather than "do this, then do this, now do that..." Unfortunately FP isn't particularly fashionable and the talent pool is limited.
"It is quite a leap, isn't it?"
Just remember you _are_ describing hardware: every flip-flop, lookup table etc. you describe is always there and clocked, every clock.
And the article description is slightly incorrect, FPGAs aren't 'flexible compute devices'. They can be, but they could also just act as wire connections from pin to pin.
I'm somewhat proficient in VHDL and I've done a bit of functional programming as well. The issue is that when something generally is a series of instructions, it's often uncomfortable and simply backwards to describe things in terms of state.
I've told people before that a great starting point for learning to do VHDL is to write a parser using a language grammar tool. It's one of the simplest forms of functional programming to learn.
Another thing to realize is that the backwards fashion in which most HDLs are written is extra difficult since something in terms of "Hello World" is a nightmare as there's a LOT of setup to do to produce even a basic synthesized entity. Hell, for that fact, simply the setup work for an entity itself is intimidating if you don't already understand implicitly what that means.
There's been a lot of work into things like System-C and System-Verilog to make this all a little easier, but it's still a HUGE leap.
Now, OpenCL has proven to be a great solution for a lot of people. While the code generated by OpenCL for the purpose is generally horrible at best, it does lower the entry level a great deal for programmers.
Consider a card like this one which Liqid is pushing.
You need to take a data set, load it into memory in a way that makes it available to the FPGA (whether internally or over the PCIe bus), then you need to make easily parallelizable code which can be provided as source to the engine which compiles it and uploads it to the card. Of course, the complexity of the compilation phase is substantially higher than uploading to a GPU, so the processing time can be very long. Then the code is loaded on the card and executed and the resulting data needs to be transferred back to main memory.
There are A LOT of programmers who wouldn't have the first idea where to start with this. There's always cut and paste, but it can be extremely difficult to learn to write OpenCL code where compiling (synthesizing), uploading and running it takes less time than simply running the job on the CPU.
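To put some shape on that trade-off, here is a back-of-the-envelope model in Python. It is a sketch under stated assumptions: the compile (synthesis) cost is paid once, the transfer and run costs are paid on every run, and all the timings below are invented for illustration rather than measured on any real card.

```python
import math


def offload_pays_off(cpu_s, compile_s, transfer_s, fpga_run_s, runs=1):
    """True if compile-once plus per-run transfer and execution
    beats simply running the job on the CPU `runs` times."""
    cpu_total = cpu_s * runs
    fpga_total = compile_s + runs * (transfer_s + fpga_run_s)
    return fpga_total < cpu_total


def break_even_runs(cpu_s, compile_s, transfer_s, fpga_run_s):
    """Smallest number of runs at which offloading wins,
    or None if each offloaded run is no faster than the CPU."""
    saving_per_run = cpu_s - (transfer_s + fpga_run_s)
    if saving_per_run <= 0:
        return None
    return math.floor(compile_s / saving_per_run) + 1


# Hypothetical numbers: 60 s on CPU, 1 hour synthesis, 2 s transfer, 8 s on card.
needed = break_even_runs(60, 3600, 2, 8)
print(needed, offload_pays_off(60, 3600, 2, 8, runs=needed))
```

With those made-up figures a one-off job is never worth offloading; the card only wins once the same compiled design is reused many times.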
Then there's things like memory alignment. Programmers who understand memory alignment on x86 CPUs (and there are far fewer of those than there should be) can find themselves lost when considering that RAM within an FPGA is addressed entirely differently. Heck, RAM within the FPGA might have 5 or more entirely different patterns of how it's accessed. Consider that most programmers (except for people like those on the x264 project) rarely consider how their code interacts with L1, L2, L3 and L4 cache. They simply spray and pray. Processor affinity is almost never a consideration. We probably wouldn't even need most supercomputers if scientific programmers understood how to distribute their data sets across memory properly.
I've increased calculation performance on data sets more than 10,000 fold within a few hours just by aligning memory and distributing the data set so that key coefficients would always reside within L1 or worst case, L2 cache.
I've sped up code even more simply by choosing the proper non-arbitrary scale matrix multiplication function for the job. It's fascinating how many developers simply multiply a matrix against another matrix with a complete disregard for how a matrix is calculated. I actually once saw a 50,000x performance improvement by refactoring the math of a relatively simple formula from a 3x4 to a 4x4 matrix and moving it from an arbitrary math library to a game programmers' library. The company I did it for was amazed because they had been renting GPU time to run Matlab in the cloud and by simply making code which could be optimized properly by the compiler... a total of Google->Copy & Paste->Compile->Link, the company saved tens of thousands of dollars.
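The actual formula isn't given, so the following is only a guess at the kind of refactoring meant: a 3x4 affine matrix [R | t] can be padded with a bottom row of 0, 0, 0, 1 and applied to homogeneous points, which is the fixed 4x4 shape that game-oriented math libraries are typically tuned for. The Python sketch below (all names and values are illustrative) just checks that both forms give the same answer; it makes no claim about speed.

```python
def apply_affine_3x4(m, p):
    """Apply a 3x4 affine matrix [R | t] to a 3D point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def pad_to_4x4(m):
    """Embed a 3x4 affine matrix in a 4x4 homogeneous matrix."""
    return [list(r) for r in m] + [[0.0, 0.0, 0.0, 1.0]]


def matvec4(m, v):
    """Fixed-size 4x4 matrix times 4-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))


affine = [
    [1.0, 0.0, 0.0, 10.0],  # scale x by 1, translate by 10
    [0.0, 2.0, 0.0, 20.0],  # scale y by 2, translate by 20
    [0.0, 0.0, 3.0, 30.0],  # scale z by 3, translate by 30
]
point = (1.0, 1.0, 1.0)

direct = apply_affine_3x4(affine, point)
homogeneous = matvec4(pad_to_4x4(affine), point + (1.0,))
print(direct, homogeneous)
```

The first three components of the homogeneous result match the direct result and the fourth stays at 1, so the refactoring changes the shape of the math without changing its meaning.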
When I see things like the latest two entries into the super-computer Top500, all I can think of is that the code running on there almost certainly could be optimized to distribute via Docker into a Kubernetes cluster, the data sets can be logically distributed for Map/Reduce and instead of buying a hundred million dollars of computer or renting time on it, the same simulations could be performed for a few hundred bucks in the cloud.
Hell, if the data set were properly optimized for map/reduce instead of using some insane massive shared memory monster, it probably would run on used servers in a rack. I bought a 128-Core Cisco UCS cluster with 1.5TB of RAM for under $15,000. It doesn't even have GPUs and for a rough comparison, when I tested using crypto-currency mining as a POC, it was out-performing $15,000 worth of NVidia graphics cards... of course, the power cost was MUCH higher, but it wasn't meant to test feasibility of crypto mining, it was just a means of testing highly optimized code on different platforms. And frankly, scrypt is a pretty good test.
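For anyone who hasn't met the map/reduce shape being described, here is a toy single-process sketch in Python. It assumes the computation can be split into independent chunks whose partial results combine associatively (a mean, in this made-up example); real simulations may not decompose this neatly, and nothing here is distributed.

```python
from functools import reduce


def partition(data, parts):
    """Split data into roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_partial(chunk):
    """Map step: each chunk is summarised independently."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) results."""
    return (a[0] + b[0], a[1] + b[1])


data = list(range(1, 101))
partials = [map_partial(c) for c in partition(data, 4)]
total, count = reduce(combine, partials, (0, 0))
mean = total / count
print(partials, mean)
```

Because each call to `map_partial` only touches its own chunk, those calls could in principle run on separate machines, with only the small partial results being shipped back for the reduce step.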
I'll tell you ... FP is lovely.. if you can bend to it. F# is very nice and Haskell is pretty nice as well. Some purists will swear by LISP or Scheme, and there's the crazies in the Ericsson camp.
The issue with FP isn't whether it's good or easy or not. It's the same problem you'll encounter with HDLs: the code is generally written by very mathematical minds that think in terms of state, and that makes it utterly unreadable.
I'd like to tinker with these but it looks like a few hundred before you get anything worthwhile, and that has to interface with a bunch of stuff to do anything useful or you have to spend an inordinate amount of time adjusting.
For me, there are a few things in that kind of range: FPGA, SDR and CUDA.
I'd be after a "microbit" of FPGA, if that kind of thing exists. A teeny, tiny version that runs off a USB stick and which you can stick GPIOs etc. on.
https://tinyfpga.com/ looks promising but it'll have to wait for my Christmas funds as likely it won't just be as simple as clicking Buy Now, getting it, plugging it in, and starting to code it up.
Likely by the time I tinker and get there, things like FPGAs will be an inherent part of every processor / motherboard in some fashion anyway. A few people already make RPi boards for them.
There are a few cheap options for FPGA. The Lattice iCEstick combined with the open source IceStorm tool-chain seems popular amongst hobbyists. Not tried it myself though.
Personally, I've gone with the MiniZed Zynq SoC board (ARM A9 + FPGA) alongside an Arty S7 that someone kindly gave me. I'm also planning to play at the very low end with Xilinx CPLDs for when simple and easily solderable (i.e. QFP not BGA) are more important than power.
If it helps, this is the advice I got when I asked how to get into programmable logic.
Biting the hand that feeds IT © 1998–2019