GPT-3's failure at larger addition sizes is almost entirely due to BPE tokenization, which is incredibly pathological for numbers ("392" is a single token, a "digit", while "393" is not; GPT-3 is also never told anything about the BPE scheme it sees the world through). When commas are inserted into the numbers, GPT-3 does OK at larger sizes. Not perfect, but certainly better than one should expect of it, given how bad BPEs are.
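If you want to see the pathology directly, here's a minimal sketch using OpenAI's open-source `tiktoken` library with the "gpt2" encoding (the same BPE vocabulary family GPT-3 uses); which particular numbers happen to be single tokens is purely an accident of the learned merges, so run it rather than trusting any specific example:

```python
import tiktoken  # pip install tiktoken; OpenAI's open-source BPE tokenizer

# The "gpt2" encoding is the BPE vocabulary GPT-2/GPT-3 were trained with.
enc = tiktoken.get_encoding("gpt2")

# Whether a number is one token or several is an accident of the merges
# learned from the training corpus, not anything systematic about digits.
for s in ["392", "393", "1234567", "1,234,567"]:
    tokens = enc.encode(s)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{s!r} -> {len(tokens)} token(s): {pieces}")
```

The intuition behind the comma trick (checkable with the snippet above) is that commas are rarely merged across digits, so a comma-formatted number gets segmented into short, regularly-sized digit chunks instead of arbitrary multi-digit tokens.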
http://gptprompts.wikidot.com/logic:math