Most of the overhead is found in dealing with variable length input, minus signs, and overflow checking.
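To make that concrete, here is a minimal scalar sketch. `parse_int` is a hypothetical helper, not code from the article, and it assumes the caller already knows the input length. The actual conversion is the single multiply-and-subtract line; everything else handles the sign, the end of input, and overflow.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the article): parse a decimal int
 * from s[0..len). Returns false on empty input, a non-digit byte,
 * or overflow. */
static bool parse_int(const char *s, size_t len, int *out)
{
    size_t i = 0;
    bool neg = false;

    /* Optional sign. */
    if (len > 0 && (s[0] == '-' || s[0] == '+')) {
        neg = (s[0] == '-');
        i = 1;
    }
    if (i == len)
        return false;                    /* no digits at all */

    /* Accumulate as a negative number so INT_MIN is representable. */
    int acc = 0;
    for (; i < len; i++) {
        unsigned d = (unsigned)((unsigned char)s[i] - '0');
        if (d > 9)
            return false;                /* non-digit byte */
        if (acc < (INT_MIN + (int)d) / 10)
            return false;                /* acc * 10 - d would overflow */
        acc = acc * 10 - (int)d;         /* the actual conversion step */
    }

    if (!neg) {
        if (acc == INT_MIN)
            return false;                /* +2^31 does not fit in int */
        acc = -acc;
    }
    *out = acc;
    return true;
}
```

With a 32-bit `int`, `parse_int("-2147483648", 11, &v)` succeeds while `parse_int("2147483648", 10, &v)` is rejected.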
We had a similar case with some AVX-optimized string operation in glibc.
I don't know if this is faster than simply doing the "multiply by ten and add" sequence. I'd have to understand modern ALU behaviour, parallelism, shortcuts, chip cache dynamics, you name it.
It would all have been so much simpler if we'd made decimal be hex instead. Octal doesn't work as well for me, no idea why. That said, it's much easier to ignore thumbs and count on 4 fingers of each hand than to grow another 6.
(I read the article. It descended into arcana of instruction sets and assumptions which postdate my DEC-10 ISA lessons, although I recall Digital had BCD instruction handling and a 6-bit byte model in a 36-bit word accordingly. DEC-10 instructions could take many, many clock cycles and have 5 components of from, to, via, because, maybe attached to them)
First, that's a strong assumption. But...
> We check whether some value exceeds 9, in which case we had a non-digit character.
Huh? You already know where the end of the digits is. At least, I thought you did? So why would this be necessary?
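For reference, here is a scalar sketch of what I take the quoted check to be. This is my reading, not the article's code, and `all_digits` is a made-up name. Knowing where the input ends only gives you a length; it doesn't tell you that every byte before that point is actually '0'..'9'. Subtracting '0' and comparing against 9 as an unsigned value catches bytes on both sides of the digit range with a single test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte in s[0..len) is an ASCII digit. */
static bool all_digits(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Bytes below '0' wrap around to large values after the cast,
         * so one unsigned comparison rejects both '/' and ':'. */
        unsigned char v = (unsigned char)(s[i] - '0');
        if (v > 9)
            return false;
    }
    return true;
}
```

A SIMD version would presumably do the same subtract-and-compare on all lanes at once, which would explain why the check is still there even when the length is known.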