

In addition to microarchitectural improvements, Zen 3 (Vermeer) also features SoC design changes. In Zen 2, each CCD (compute die) is made up of two CCXs (core complexes), each with a 16 MB L3 cache. Zen 3 uses a unified complex, in which each CCD contains a single CCX with a unified 32 MB L3 cache. This unified design eliminates CCX-to-CCX communication within a CCD, greatly improving core-to-core latency. AMD otherwise reused the chiplet design, with one or two CCDs (fabricated on 7 nm) paired with a 12 nm IOD (I/O die). Reads from the CCD to the IOD are still twice as wide as writes, to conserve die area and transistor budget.

The SoC-level goals are to reduce dependency on main memory accesses, reduce core-to-core latency, and reduce core-to-cache latency. To that end, Zen 3 unifies all cores in a CCD into a single complex of 4, 6, or 8 contiguous cores, unifies all L3 cache in a CCD into a single contiguous element of up to 32 MB, and rearchitects core/cache communication into a ring system.

On the load/store side, Zen 3 offers more flexibility in load/store operations.
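The effect of the CCX change described above can be illustrated with a small topology model. This is only a sketch: the `CcdLayout` class and its method names are hypothetical, and the figure of 4 cores per Zen 2 CCX (for a fully enabled 8-core CCD) is an assumption not stated in the text. The model counts which core pairs share an L3 cache; it says nothing about actual latencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcdLayout:
    """Simplified, hypothetical view of one fully enabled CCD."""
    name: str
    ccx_per_ccd: int
    cores_per_ccx: int   # assumption: 4 for Zen 2, 8 for Zen 3
    l3_mb_per_ccx: int

    @property
    def cores(self) -> int:
        return self.ccx_per_ccd * self.cores_per_ccx

    def ccx_of(self, core: int) -> int:
        """Index of the CCX that owns a core (cores numbered 0..cores-1)."""
        if not 0 <= core < self.cores:
            raise ValueError(f"core {core} is outside this CCD")
        return core // self.cores_per_ccx

    def shares_l3(self, a: int, b: int) -> bool:
        """True if both cores sit in the same CCX and so share one L3."""
        return self.ccx_of(a) == self.ccx_of(b)

    def cross_ccx_pairs(self) -> int:
        """Number of core pairs that must communicate across CCXs."""
        return sum(
            1
            for a in range(self.cores)
            for b in range(a + 1, self.cores)
            if not self.shares_l3(a, b)
        )


ZEN2 = CcdLayout("Zen 2", ccx_per_ccd=2, cores_per_ccx=4, l3_mb_per_ccx=16)
ZEN3 = CcdLayout("Zen 3", ccx_per_ccd=1, cores_per_ccx=8, l3_mb_per_ccx=32)

if __name__ == "__main__":
    for layout in (ZEN2, ZEN3):
        print(
            f"{layout.name}: {layout.l3_mb_per_ccx} MB L3 visible per core, "
            f"{layout.cross_ccx_pairs()} cross-CCX core pairs"
        )
```

Under these assumptions, every core in the Zen 3 layout sees the full 32 MB L3 and no core pair inside a CCD crosses a CCX boundary, which is the property the text credits for the lower core-to-core latency.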


The load/store unit gets larger structures and better prefetching to support the enhanced execution engine bandwidth, along with overall higher bandwidth to feed the larger, faster execution resources.

In the execution engine, the goal is to reduce latency and enlarge structures to extract higher instruction-level parallelism (ILP). The integer side has new dedicated branch and st-data (store-data) pickers and now reaches 10 issues per cycle (+3 vs. ...). Latency is reduced for select floating-point and integer operations. Floating-point bandwidth increases by 2, for a total of 6-wide dispatch and issue, and the floating-point FMAC is now 1 cycle faster.
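The widths quoted above are given as a new total plus a delta, so the baseline each delta is measured against can be derived by subtraction. The sketch below does only that arithmetic; the dictionary keys and function names are illustrative, and the results are peak widths, not sustained throughput.

```python
# New total width and stated increase, exactly as quoted in the text.
STATED = {
    "integer issue": (10, 3),
    "floating-point dispatch/issue": (6, 2),
}


def previous_width(current: int, delta: int) -> int:
    """Width implied for the baseline the delta was quoted against."""
    return current - delta


def relative_gain(current: int, delta: int) -> float:
    """Increase expressed as a fraction of the implied baseline width."""
    return delta / previous_width(current, delta)


if __name__ == "__main__":
    for name, (current, delta) in STATED.items():
        prev = previous_width(current, delta)
        gain = relative_gain(current, delta)
        print(f"{name}: {prev} -> {current} per cycle (+{gain:.0%})")
```

A wider issue or dispatch stage only raises the ceiling; how much of it turns into IPC depends on the workload having enough independent instructions to fill it.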

In the front end, the goal is faster fetching, especially for branchy and large-footprint code. The L1 branch target buffer doubles in size to 1024 entries for better prediction latency, and “no bubble” prediction lets the core make back-to-back predictions more quickly and better handle branchy code.

The Loongson 3A5000 uses the Chinese in-house 64-bit GS464V microarchitecture, featuring support for dual-channel DDR4-3200 memory, a primary encryption module, two 256-bit vector units per core, and four arithmetic logic units. Loongson Technology's new processor also has four HyperTransport 3.0 SMP controllers that "allow multiple 3A5000s to operate in unison inside a single system." Loongson also plans to release a 32-core variant based on the same architecture, the 3D5000, which will feature two 3C5000 dies on the same package: essentially a multi-chiplet solution.

The 3A5000 is competitive in single-core workloads against a range of 7 nm ARM chips and even an Intel Core i7-10700. The performance comparison pits a 4-core 3A5000 at 2.5 GHz against the 8-core, 2.9 GHz Core i7-10700 "Comet Lake". Because the chips are made locally, their prices will be very economical for use in China's educational and tech centers.

Meanwhile, Loongson unveiled plans to launch its next-generation 6000 series chips, which will use a brand new microarchitecture and offer IPC on par with AMD's Zen 3 CPUs. Loongson also posted simulated performance figures for the 6000 series CPUs, which offer up to 30 percent higher fixed-point and 60 percent higher floating-point performance versus the existing 5000 series chips. A chip that matches Zen 3 in IPC would bring Loongson close to the performance level of modern chips, and could mean that Chinese chips target AMD Ryzen and EPYC CPUs based on the Zen 3 core architecture by next year.
