4

I want to perform memory and time efficient operations between numpy arrays containing only -1 and 1.

Example:

>>> m1
array([[-1,  1,  1, -1],
       [ 1, -1, -1, -1]], dtype=int8)
>>> m2
array([-1,  1, -1,  1], dtype=int8)
>>> np.dot(m1, m2)
array([ 0, -2], dtype=int8)

In this case, the binary variables are represented by np.int8 type variables, which is already a bit better than np.int32.

Is there a way to represents this values by one bit sized variables and perform fast operations? I need to do dot products and additions.

4
  • 2
    The elementwise multiplication of such vectors (as part of dot-product operation) can be represented as xor operation on bits or bytes. For summing the -1 and 1 I would recommend to build a lookup table for every possible byte value. Lookup values for each byte of the vector, then sum up the values. Only partly filled bytes need special care (or more lookup tables). Commented Nov 16, 2019 at 23:09
  • Related - softwareengineering.stackexchange.com/q/185104 Commented Nov 17, 2019 at 2:50
  • What exactly do you mean by additions? Just sum reductions like in a dot product? Commented Nov 17, 2019 at 4:49
  • Yes I mean sum. It is usually elemnt-wise sums between 2 arrays of the same size. Then I apply the sign function on the result. So, the sign of the sum is wanted, not the result of the sum itself. Commented Nov 18, 2019 at 5:18

1 Answer 1

1

There's the BitVector library that allows you to pack the bits denser with a nice API, although different than numpy. It may be easier to use it than implement the operations on top of numpy. Although it should be possible via the bitwise XOR.

If your -1 is represented as 0 in the bit vector and 1 as 1, then the dot product is the number of positions that have the same value in both vectors minus number of positions where the value differs.

You get the differing positions by XOR:

>>> m1_1 = BitVector(bitstring='0110')
>>> m2 = BitVector(bitstring='0101')
>>> xor = m1_1 ^ m2
>>> print(xor)
0011

You then sum this vector but you must take into account that zeros here mean 1 and ones mean -1. Therefore we'll subtract the number of bits set to one from the number of bits set to zero:

>>> bits_zero = xor.length() - xor.count_bits()
>>> bits_zero - xor.count_bits()
0
>>> # Or just
>>> xor.length() - 2 * xor.count_bits()
0

It should be straightforward to port this approach to Numpy if your vector sizes align with the integer sizes (i. e. a multiple of 32, 64). Otherwise, you'll have to treat the last int specially.

Edit: As @Michael Butscher writes in the comment, you'll miss the function count_bits in Numpy. Going byte by byte, the lookup table is indeed small and efficient.

Note: although this should be more memory efficient, it's not certain if it brings any speedup. You must do your benchmarks.

Edit: Benchmark results

I've just timed the pure numpy (storing bit in int) vs. the BitVector.

vector_len = 64*1024
matrix_rows = 1024
# I tested various data types
dtype = np.int8

m1 = 2 * np.random.randint(2, dtype=dtype, size=[matrix_rows, vector_len]) - 1
m2 = 2 * np.random.randint(2, dtype=dtype, size=vector_len) - 1

# this is being timed:
dot = m1.dot(m2)

m1_bv = [BitVector(bitlist = (row + 1) / 2) for row in m1]
m2_bv = BitVector(bitlist = (m2 + 1) / 2)

# this is being timed:
dot_bv = [vector_len - 2 * (m1_row ^ m2_bv).count_bits() for m1_row in m1_bv]

The results are (64-bit Intel laptop processor):

  • BitVector: horrific - 1min 7s ± 1.51 s
  • int8: 192 ms ± 3.39 ms
  • int16: 192 ms ± 1.6 ms
  • int32: 40.6 ms ± 228 µs
  • int64: 48.6 ms ± 307

I didn't get yet to implementing the bitwise dot product with Numpy but you can see that

  1. BitVector is not really viable here.
  2. Small integer types save memory but they make it harder for Numpy to optimize the computations.
Sign up to request clarification or add additional context in comments.

1 Comment

Given the 64 bit orientation of modern processors, you might not gain much speed by doing bit level processing.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.