Hacker News

The only thing I am happy about in all this AI hype is that InfiniBand is getting some love again. A lot of people are using RoCE on ConnectX HCAs, but there are still plenty of folks doing native IB. If HPC becomes more commonplace, maybe we get better subnet managers, IB routing, i.e. all the stuff we were promised ~10+ years ago that never had a chance to materialise because HPC became so niche, and the machines had different availability (etc.) requirements than OLTP systems, which didn't demand that stuff get built out. Especially the subnet managers, as most HPC clusters just compute a static torus or Clos-tree topology.
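For illustration, here is roughly the kind of static route computation a subnet manager does for a torus fabric: dimension-order routing on a 2D torus, picking the shorter wraparound direction on each axis. This is a toy sketch of the idea, not any real subnet manager's API; all names are made up.

```python
def torus_route(src, dst, dims):
    """Dimension-order route on a torus: resolve axis 0 fully, then axis 1.

    src, dst: (x, y) node coordinates; dims: (X, Y) torus sizes.
    Returns the list of nodes visited, src first, dst last.
    """
    path = [src]
    cur = list(src)
    for axis, size in enumerate(dims):
        while cur[axis] != dst[axis]:
            fwd = (dst[axis] - cur[axis]) % size   # hops going "up" the ring
            step = 1 if fwd <= size - fwd else -1  # take the shorter direction
            cur[axis] = (cur[axis] + step) % size
            path.append(tuple(cur))
    return path
```

On a 4x4 torus, `torus_route((0, 0), (3, 1), (4, 4))` takes the wraparound link to reach x=3 in one hop instead of three, then one hop in y. Because the routes are a pure function of the (static) topology, they can be computed once at fabric bring-up, which is why a better-than-static subnet manager never became a priority.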

There was a time I was running QDR Infiniband (40G) at home while everyone else was still dreaming of 10G at home because the adapters and switches were so expensive.



QDR infiniband at home was such a great hack, and I was doing the same thing with my small network a little while ago. As long as you run connections point-to-point (with no switch) and keep the computers near each other (in twinax range), it's incredibly cheap and blazing fast.


Yeah I ran it for years with no issues. I modified a Mellanox switch to use quieter fans and eventually switched to fibre SFPs so I could make longer cable runs (also twinax is bulky/heavy). Was absolutely epic to use my ZFS NAS at essentially native speeds from my desktop and media PC. I was using the IB SRP target to export ZVOLs so with fast all-flash storage you really couldn't tell it wasn't running on a (giant) local SSD.


> The only thing I am happy about all this AI hype is Infiniband is getting some love again.

Any particular reason? I know latency is lower, which is good for MPI and such, as is RDMA.

But for general use, why should folks care about IB instead of Ethernet?


It's not for general use, but if you build large infrastructure, IB can be a very interesting transport choice. Naturally you will only do this if you want to either use RDMA directly (via IB Verbs) or use protocols like the IB SCSI RDMA Protocol (SRP), which do so under the hood.

Things like "Datacentre Ethernet" and RDMA over Converged Ethernet (RoCE) are basically InfiniBand anyway (you will find that such features are generally only available on Mellanox ConnectX adapters, which can typically run in either 100GbE or IB mode).

The main reason these things matter is that the fabric is lossless: credit-based link-level flow control means packets are never dropped under congestion, and it's generally combined with network topologies that ensure full bisection bandwidth (or something that is good enough for the communication pattern the machine is designed for). Once the fabric is properly verified as online and routes are set up, the network simply doesn't lose packets. For storage this is essentially unmatched: it's equivalent to Fibre Channel but way faster, and converged, so you can also run your workload over the same links rather than having separate storage and network links.
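To make "full bisection bandwidth" concrete, here's a tiny illustrative calculation for a two-level leaf-spine (folded Clos) fabric. The function name and parameters are my own, not from any vendor tool: a leaf is non-blocking when its uplink capacity matches the host-facing capacity.

```python
def oversubscription(hosts_per_leaf, host_gbps, uplinks_per_leaf, uplink_gbps):
    """Leaf oversubscription ratio for a leaf-spine fabric.

    1.0 (or less) means the leaf can be non-blocking, i.e. the fabric
    can offer full bisection bandwidth; above 1.0 it is oversubscribed.
    """
    downlink = hosts_per_leaf * host_gbps      # host-facing capacity
    uplink = uplinks_per_leaf * uplink_gbps    # spine-facing capacity
    return downlink / uplink
```

For example, 32 hosts at 100G behind 16x200G uplinks gives a ratio of 1.0 (non-blocking); halve the uplinks and you're 2:1 oversubscribed, and the "lossless" guarantee only holds as long as flow control can absorb the contention.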


IB is still so expensive. The new Spectrum Ethernet switches are a better choice IMO.


I haven't kept up with the pricing for the last decade; back when I was working with it, it was a fraction of the cost of comparable Ethernet. Maybe things have changed drastically though.


> Maybe things have changed drastically though.

On the used market I'm sure you can get some decent kit for a homelab. But if you want a production environment, with anything close to the latest speeds (≥100G), you can't get gear for love or money: all the Big Players are taking all the inventory, so lead times are slightly ridiculous (assuming anyone has stock).

We were looking at IB in Feb 2022, and the earliest availability for IB switches was Nov-Dec 2022. We ended up going a different route for other reasons, so I haven't kept up on lead times, but it would not be surprising if things were roughly the same.


> IB is still so expensive.

Also hard to get from vendors: wait lists for >100G switches are long. No one has them in stock, probably because LLMs are what all the cool kids are doing, so all inventory/production has been spoken for.



