The TRX40 CPU-to-chipset link offers roughly 4x the bandwidth of the link on previous Threadripper X399 platforms, and 4x that of Intel’s X299 platform, which speaks volumes about TRX40’s heavy IO capacity. That’s important, as even a single PCIe 3.0 x4 SSD can more or less saturate the Intel X299 DMI or AMD X399 4-lane chipset link on its own.
Also worth noting is that a Gen 4 x8 link has double the bandwidth of the X570 platform’s chipset link, which also uses PCIe Gen 4 but is only four lanes wide.
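For readers who want to check those multiples, the comparison can be sketched in a few lines of Python. This is a rough estimate rather than AMD's or Intel's own figures: it assumes the standard per-lane transfer rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) with 128b/130b encoding, treats Intel's DMI 3.0 as equivalent to a PCIe 3.0 x4 link, and ignores protocol overhead. The function name `link_bandwidth_gbs` is our own.

```python
# Per-lane transfer rate in GT/s for each PCIe generation (standard values).
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen, lanes):
    """Approximate one-direction link bandwidth in GB/s, ignoring protocol overhead."""
    per_lane = GT_PER_S[gen] * ENCODING_EFFICIENCY / 8  # bits -> bytes
    return per_lane * lanes


# Chipset uplinks discussed in the text (DMI 3.0 assumed equal to Gen 3 x4).
links = {
    "TRX40 (Gen 4 x8)": link_bandwidth_gbs(4, 8),
    "X570 (Gen 4 x4)": link_bandwidth_gbs(4, 4),
    "X399 / X299 (Gen 3 x4)": link_bandwidth_gbs(3, 4),
}

trx40 = links["TRX40 (Gen 4 x8)"]
for name, bandwidth in links.items():
    print(f"{name}: {bandwidth:.2f} GB/s (TRX40 is {trx40 / bandwidth:.0f}x)")
```

Under these assumptions the TRX40 uplink works out to just under 16GB/s in each direction, against roughly 4GB/s for the Gen 3 x4 links.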
The new connection is a long overdue move, especially for the HEDT platform. It now means that high bandwidth devices and heavy RAID arrays can reasonably be hung off the chipset without being subject to a severe bottleneck when communicating with the CPU. A really smart move, in our opinion. Kudos, AMD.
We already mentioned the high-bandwidth link between the CPU and chipset, but the chipset itself is also different compared to X399. Built on the Global Foundries 14nm process, what we see from TRX40 is a set of features that is very similar to X570.
You get 8 dedicated USB 3.2 Gen 2 10Gbps ports, 4 dedicated USB 2.0 ports for legacy purposes, and 4 dedicated SATA 6Gbps ports. In addition, you get access to 8 general-purpose PCIe Gen 4 lanes, and two sets of PCIe 4.0 4-lane links (and the associated bifurcations) or four SATA 6Gbps ports.
Add in the eight-lane PCIe Gen 4 uplink to the Ryzen Threadripper CPU, in addition to the quartet of on-chip USB 3.2 Gen 2 10Gbps ports, and it is clear to see why TRX40 is such a connectivity-heavy platform.
It is worth pointing out that the TRX40 chipset can be power hungry, just like X570. The chipset is built on Global Foundries 14nm technology and has a 15W peak TDP according to AMD, so we expect most motherboard vendors to use active fan cooling for it, as we see with X570.
This is perhaps more important for TRX40 than X570, given the prosumer/workstation platform’s tendency to run IO heavy, sustained workloads for extended periods of time.
Clearly, the Zen 2 architecture and 7nm TSMC technology for the cores are what have given AMD such strong market potential with Ryzen 3000 and EPYC Rome up to this point. We already know most of the details there, though, so I want to spend a little more time discussing the IO die for Threadripper and its significance.
This is likely to be the single component that holds the key to unlocking the performance of these high-core-count Threadripper chips, given that last year’s Threadripper 2000 WX processors seriously struggled in many use cases.
Ryzen Threadripper in previous generations was, more or less, a cut down EPYC server CPU with certain features disabled, such as additional memory controllers or PCIe lane links. This created NUMA and latency headaches for cores trying to access memory that was off die.
This time round, however, the design for Zen 2-based processors is different due to the use of a central IO die that is physically separate from the core chiplets and connects to them via Infinity Fabric. With the memory and PCIe controllers consolidated on that separate die, NUMA and PCIe access penalties should not enter the equation anywhere near as significantly as they did with previous generation Threadripper WX parts.
The IO die for Ryzen Threadripper’s 3960X and 3970X comes in at 416mm² with around 8.34 billion transistors built on Global Foundries’ 12nm process. That’s massive. Absolutely massive, at approximately 5.6 times the size of the 74mm² 7nm TSMC core chiplets. For perspective, a GTX 1080 is 314mm² built on TSMC’s 16nm process, while the newer RTX 2060 is 445mm² with TSMC 12nm lithography.
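To put the die sizes in proportion, here is a quick back-of-the-envelope calculation using the figures quoted above. It assumes the four-chiplet configuration of the 3960X and 3970X, and the variable names are purely illustrative. The total it produces is a simple sum of die areas, not an official package figure.

```python
# Die areas in mm^2, as quoted in the text.
IO_DIE_MM2 = 416
CCD_MM2 = 74

# Assumption: four core chiplets (CCDs), as on the 3960X and 3970X.
CCD_COUNT = 4

# How many times larger the IO die is than one core chiplet.
ratio = IO_DIE_MM2 / CCD_MM2

# Combined silicon area of the IO die plus all core chiplets.
package_silicon = IO_DIE_MM2 + CCD_MM2 * CCD_COUNT

# Fraction of that combined area taken up by the IO die alone.
io_share = IO_DIE_MM2 / package_silicon

print(f"IO die vs one CCD: {ratio:.1f}x")
print(f"Total die area: {package_silicon}mm^2")
print(f"IO die share of total: {io_share:.0%}")
```

On these numbers the IO die alone accounts for more than half of the silicon area under the heatspreader, which underlines how much of the design budget goes to connectivity.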
Although its physical dimensions look identical to those of EPYC’s IO die, Threadripper’s IO die does not deploy the same set of features. The key difference is the reduction in memory channels and PCIe lanes. Ryzen Threadripper’s IO die uses a pair of two-channel DDR4 memory blocks to provide quad-channel capability. This is a reduction versus EPYC Rome’s eight-channel DDR4 support.
Similarly, the 64-lane PCIe Gen 4 capacity is provided by a pair of 32-lane blocks, which is 64 lanes fewer than EPYC Rome gets in its primary blocks (excluding EPYC’s ‘bonus’ links). With Threadripper 3000 in its 4-CCD 3960X and 3970X form, four Infinity Fabric interconnect blocks each link one core chiplet to the central IO die.
The IO die for Ryzen Threadripper 3000 is clearly a large, complex, and not inexpensive piece of silicon. However, it may also prove to be a stroke of genius for these higher core count parts, enabling them to avoid the NUMA and access latency penalties that crippled performance, especially in certain Windows workloads, on previous Threadripper 2000 WX offerings.
We also get an interesting insight into the future flagship for the TRX40 platform – the Threadripper 3990X 64-core part that will require eight fully populated CCDs.