This sheds some light on the dispute behind AMD's settlement of the core-count lawsuit. I've always thought the single FPU shared by each two-core cluster is what decided the case, and observations like this one continue to suggest that the two integer cores didn't share enough for either to starve the other of resources:

> When running two threads in the module, Bulldozer can provide twice as much L2 bandwidth. That suggests the L2 has separate paths to each thread’s load/store unit, and lines up with AMD’s publications.

I never knew that the 32 nm process had so many issues. They certainly were never as bad as the 90 nm issues, but perhaps after all the noise about 90 nm, problems with 32 nm just weren't worth writing about in the news.

I wonder if my specific needs just matched the Bulldozer CPU particularly well, because I didn't see any significant advantage in Sandy Bridge over Bulldozer. Then again, I was running four separate -J 2 compiles concurrently, which may have been a particularly good use case for Bulldozer.

I think I'll try out some new benchmarks using a modern OS and toolchain on both Sandy Bridge and Bulldozer to see how things have changed since 2011 / 2012.
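For what it's worth, here is a rough sketch of the kind of harness I have in mind for rerunning the "several builds at once" workload. It is only an assumption about how to structure the test, not what I ran back then: the function names `timed_run` and `run_concurrent` are made up for illustration, and the workload is a stand-in Python one-liner that you would replace with the real build command for each source tree.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion; return (exit code, seconds elapsed)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode, time.perf_counter() - start


def run_concurrent(commands):
    """Launch all commands at once; return (total wall seconds, per-job results)."""
    start = time.perf_counter()
    # One worker thread per command, so every job starts immediately.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        results = list(pool.map(timed_run, commands))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Placeholder workload (hypothetical): swap each entry for the real
    # build command of one source tree.
    jobs = [[sys.executable, "-c", "sum(range(10**6))"] for _ in range(4)]
    wall, results = run_concurrent(jobs)
    for i, (code, secs) in enumerate(results):
        print(f"job {i}: exit={code} time={secs:.2f}s")
    print(f"wall clock: {wall:.2f}s")
```

Comparing the wall-clock total against the per-job times on each machine should show how well the chip copes with several independent jobs at once, which is the case where I'd expect Bulldozer's modules to look their best.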

̶g̶o̶o̶d̶ ̶l̶i̶n̶k̶ ̶a̶n̶d̶ ̶r̶e̶a̶d̶ ̶b̶u̶t̶ ̶d̶u̶p̶

My 2 cents,

I'd like to see Zen 2 merged with some CMT ingenuity for their low-power, high-efficiency branch of cores. Introduce some clustering on one of these Zen 2 CCXs, and arrange this cluster with SMT4 on the FPU side while keeping SMT2 on the integer side. This would be an analogue to the Piledriver family's SMT1/SMT2 int/FPU scheme.