Well, I looked into this deeper after everyone gave me the conventional wisdom to NOT depend on `lua_next`'s iteration order and pray for no breaking changes.
It turns out that the correct wisdom -- iterate using an index value -- also seems to be the most performant wisdom from the C or C++ side. [1] compares three approaches: calling `lua_next()` in a while loop and assuming it iterates in the right order; calling `lua_geti` on the table with each index (knowing the size and indices beforehand); and doing a `lua_pushinteger` + `lua_gettable` for each index. Measured on a computer with an Intel i7 chip, lots of RAM, Windows 10, and Lua 5.3.3 compiled as a DLL.
Interestingly, the `lua_pushinteger` + `lua_gettable` method is about equal to the `lua_next` implementation. So, for Lua 5.2 and lower, since there's no `lua_geti` built into the API, the correct way is equal in performance to the way that relies on unspecified iteration order.
`lua_geti` was introduced in Lua 5.3, I believe, and it gives a very hefty performance gain for the lookup (sorry LuaJIT and Lua 5.2 users, you're not gonna get the benefits here).