My results are pretty consistent. I run a loop from 1 to 10000000 so
the completion times are in seconds, which I think helps eliminate
variations due to other things that might be going on in the machine.
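
For anyone who wants to try the same kind of measurement on the C side, here is a rough sketch of the timing pattern, not the actual test I ran. The names `add_one` and `time_calls` are made up for illustration, and the call goes through a volatile function pointer only so that the compiler can't inline it or optimise the loop away.

```c
#include <stdio.h>
#include <time.h>

#define ITERATIONS 10000000L

/* Hypothetical stand-in for the function whose call cost is measured. */
static long add_one(long x) { return x + 1; }

/*
 * Call add_one `iterations` times and return the elapsed CPU time in
 * seconds. The final accumulator is stored in *result so the work is
 * observable and cannot be discarded by the optimiser.
 */
static double time_calls(long iterations, long *result)
{
    /* volatile pointer: forces a real indirect call on every iteration */
    long (*volatile fn)(long) = add_one;
    long acc = 0;

    clock_t start = clock();
    for (long i = 1; i <= iterations; i++) {
        acc = fn(acc);
    }
    clock_t end = clock();

    *result = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_calls(ITERATIONS, &result)` from a `main` and printing the returned value gives one timing. A bare call like this is far cheaper than a call that crosses the script boundary, so the loop count may need raising before the total gets long enough to swamp background noise.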
Somehow, this result doesn't surprise me. There doesn't seem to be much difference once the C function actually starts doing something, anyway. The Lua C API is only expensive when working with LuaJIT, and then only relatively; there the strategy is to use the compiled FFI interface instead.
Of course, it's a different situation when interacting with a managed system like the JVM or the CLR, where the script/native interface adds significant overhead, especially with reflection factored in.