lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


The following code is hereby released into the public domain (although I'm never adverse to getting credit):

static inline lua_Number imod_aux(lua_Number n, lua_UNumber absm) {
  lua_UNumber mask;
  lua_Number res;
  mask = absm - 1;
  if ((absm & (absm ^ mask)) == absm) {
    res = n & mask;
    if (absm == 0) overflow_error();
  }
  else if (n < 0)
    res = mask - (lua_UNumber)(~n) % absm;
  else
    res = (lua_UNumber)n % absm;
  return res;
}

lua_Number imod(lua_Number n, lua_Number m) {
  if (m < 0) {
    lua_Number r = imod_aux(n, (lua_UNumber)(-m));
    return r ? m + r : r;
  }
  else
    return imod_aux(n, (lua_UNumber)m);
}

------------

This has the same behaviour with respect to signs as the work6 implementation of %; specifically, p % q is either in the range [q+1, 0] or [0, q-1] depending on whether q is negative or positive. I'm not standing up for this particular definition, by the way, just trying to be consistent.

The code has been carefully optimised for the case of p % 2^k, but a check against an implementation based on the Python implementation (which uses / and *) reveals that it is still faster on both x86 and ppc32 architectures, significantly faster in the former case.

Some explanation is possibly in order:

m xor m-1 effectively turns all trailing 0's in the binary representation of m into 1's, and turns all bits to the left of the rightmost 1 into 0's. Consequently, m bitand (m xor m-1) is exactly equal to m iff m is a power of 2. If m is a power of 2, then n mod m can be computed by bitand'ing n with m-1.

The ~n in line 10 is simply based on: ~n === -(n + 1). The version of gcc I used didn't find this optimisation, for some reason. The point of line 10 was to avoid an extra test for a zero result; I couldn't come up with a workable formulation of the same optimisation in line 18, and the result is evident from timing tests.

I'm not sure that the use of unsigned % is actually necessary; in fact, both arguments to the % operator in both lines 10 and 12 are guaranteed to be positive. However, I had at the back of my mind the thought that unsigned mod might produce better compiled output on some architectures. That might be completely wrong, though.