Plus, if a library isn't in shared memory, it's less likely to be in the L1/L2/L3 caches after a context switch. So in addition to wasting memory, it could also be slower even if none of the code is paged out. But I have no idea whether that is significant in real-world software. Probably not. (OTOH, Intel, AMD, and ARM have gone to considerable lengths to preserve CPU caches across context switches, and all three have recently added tagged TLBs, so virtual-address remapping is minimized.)
It’s very significant, at least on x86. Switching threads is very expensive, not because of the context switch per se (which is pretty fast) but because of the huge overhead of warming the various levels of cache: it takes a significant amount of time for them to reach a reasonable hit rate after a switch. And yes, TLB tagging and large pages can help, but at the end of the day the culprit is of course our old “friend”, software bloat. One reason Lua is so fast for a bytecode VM is that the entire interpreter can fit inside a modern L1/L2 cache, which means you are getting close to a microcoded machine, given how aggressively x86 processors cache and speculate these days.
—Tim
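To put a rough number on the “pretty fast” part, here is a minimal sketch (Linux-specific; an illustration, not anything from Tim's setup): two threads pinned to one CPU ping-pong a byte through pipes, so every hop forces a real context switch. It measures only the direct switch cost; the cache-warming overhead described above does not show up here, because the working set is tiny and stays hot.

/* Build: gcc -O2 -pthread switchbench.c
 * Two threads pinned to CPU 0 alternate via blocking pipe reads,
 * so each round trip is two real context switches. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000

static int ping[2], pong[2];

static void pin_to_cpu0(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *peer(void *arg) {
    (void)arg;
    char c;
    pin_to_cpu0();
    for (int i = 0; i < ITERS; i++) {
        read(ping[0], &c, 1);   /* block until main writes ...        */
        write(pong[1], &c, 1);  /* ... then wake main: 2 switches/loop */
    }
    return NULL;
}

int main(void) {
    pipe(ping);
    pipe(pong);
    pin_to_cpu0();

    pthread_t t;
    pthread_create(&t, NULL, peer, NULL);

    struct timespec t0, t1;
    char c = 'x';
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        write(ping[1], &c, 1);
        read(pong[0], &c, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.0f ns per round trip (two context switches)\n", ns / ITERS);
    return 0;
}

On typical hardware this lands in the low microseconds per round trip. The point is that real-world thread switches in cache-hungry programs cost far more than this floor, and the difference is exactly the cache and TLB warm-up being discussed.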