I have a twofold problem concerning memory computation and sparse matrices for 3D arrays in Python:
I have various sparsely populated 3D numpy arrays (containing 1s and 0s). An array looks like this:
A =
[[[0. 1. 1.]
[1. 0. 1.]
[1. 0. 0.]
...
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]]
...
[[1. 0. 1.]
[0. 1. 0.]
[1. 0. 1.]
...
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]
and array.shape is equal to (x, y, 3).
I would like to find a way to (1) measure the array's memory, then (2) store it as a sparse matrix/array (using something similar to scipy's csr_matrix), and then (3) measure the memory of the sparse matrix/array to (hopefully) see an improvement in memory.
My first problem is that the Python memory-measurement solutions I have found so far give me trouble. For example, I expected to see a difference in memory size between an array of floats with many decimal places (e.g. B = [[[0.38431373 0.4745098 0.6784314 ] [0.41963135 0.49019608 0.69411767] [0.40392157 0.49019608 0.6862745 ] ...]]]) and an array of 1.s and 0.s of the same size (like array A), which should have shown a big improvement (I need to measure this difference as well). Yet Python reports the same memory size for arrays of the same shape. Here are the methods I used and their outputs:
print(sizeof(A)) #prints 3783008
asizeof.asizeof(A) #prints 3783024
print(actualsize(A)) #prints 3783008
print(A.nbytes) #prints 3782880
print(total_size(A)) #prints 3783008
getsize(A) #prints 3783008
print(len(pickle.dumps(A))) #prints 3783042
********************
print(asizeof.asizeof(B)) #prints 5044112
sys.getsizeof(B) #prints 128 !!!
print(sizeof(B)) #prints 128 !!!
print(actualsize(B)) #prints 128 !!!
print(total_size(B)) #prints 128 !!!
print(B.nbytes) #prints 3782880
getsize(B) #prints 128 !!!
print(len(pickle.dumps(B))) #prints 3783042
(methods collected from here, here, and here).
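For reference, the standard calls above correspond roughly to the following sketch (asizeof comes from pympler; the sizeof / actualsize / total_size / getsize helpers come from the linked posts and are not reproduced here; the shape is made up):

import sys
import pickle
import numpy as np
from pympler import asizeof

x, y = 450, 280                      # made-up shape for illustration
A = np.random.default_rng(0).random((x, y, 3))

print(A.nbytes)                      # elements * itemsize; ignores the object header
print(sys.getsizeof(A))              # object header + buffer, but only if A owns its data
print(asizeof.asizeof(A))            # recursive size, follows references
print(len(pickle.dumps(A)))          # serialized size, roughly nbytes + some overhead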
My second problem is that I cannot find an economical way to store a matrix (of a certain sparsity) as a sparse matrix for 3D arrays: scipy's csr_matrix and pandas' SparseArray work for 2D arrays only, and sparse.COO() is very costly for 3D - it only starts to help with memory at sparsities of ~80% and higher. For example, a 70% sparse array stored with sparse.COO() takes about 8 MB (e.g. measured with pickle), which is much bigger than the dense array itself. Or maybe the problem is still the way I compute memory (see the methods listed above).
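Roughly, the conversion and measurement look like this (a minimal sketch assuming the pydata sparse package; the shape and sparsity are made up for illustration):

import pickle
import numpy as np
import sparse                               # pydata/sparse package

rng = np.random.default_rng(0)
A = (rng.random((500, 300, 3)) < 0.3).astype(np.float64)   # ~70% zeros, 1.0 elsewhere

S = sparse.COO.from_numpy(A)                # step (2): sparse 3-d representation
print(len(pickle.dumps(A)))                 # dense: ~3.6 MB
print(len(pickle.dumps(S)))                 # COO: bigger here, because each nonzero
                                            # costs its value plus one coordinate per axis
print(S.nbytes)                             # step (3): if I read the sparse docs right,
                                            # COO also exposes nbytes directly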
Any ideas on what I should do? I am really sorry this post is so long! Thank you in advance!
Is B a view of another array? nbytes is normally enough, and is just the product of the dimensions and the dtype size.

Recursive tools like getsizeof help when dealing with objects that contain references to other objects, such as list and dict. For numpy arrays they aren't needed. The main memory use is the data buffer, which stores all elements as bytes (normally 8 bytes per element), so nbytes is just the total number of elements times 8. Obviously you need to be aware of whether the array is a view or not.

Similarly for sparse matrices, you need to understand how the data is stored: the nonzero values (with the array's dtype), plus coordinate values (usually stored as int64). scipy.sparse uses 3 arrays to store these values - data, rows, and columns. For a 3d version you won't get any space savings until nnz is less than 25% of the total number of elements.

(Also note that plt.imshow() only works with [0. 1.] or [0 255] values; an array of 1s and 0s would print a black image.)
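Here is a quick sketch of both points - the view behaviour of getsizeof and the rough arithmetic behind the 25% figure (the shape is made up, and exact getsizeof numbers vary across numpy/Python versions):

import sys
import numpy as np

A = np.random.default_rng(0).random((500, 300, 3))
B = A[:]                           # a basic slice -> a view sharing A's buffer

print(A.nbytes, B.nbytes)          # both 3600000 = 500*300*3 elements * 8 bytes
print(sys.getsizeof(A))            # header + buffer, since A owns its data
print(sys.getsizeof(B))            # only ~100-130 bytes: just the ndarray header,
                                   # because the buffer belongs to B.base (which is A)
print(B.base is A)                 # True

# Rough COO cost for an (x, y, 3) float64 array with nnz nonzero entries:
#   data:   nnz * 8 bytes
#   coords: 3 axes * nnz * 8 bytes (int64)
# i.e. ~32 * nnz bytes, versus 8 * (x * y * 3) bytes dense, so the sparse
# form only wins when nnz is below ~25% of the total number of elements.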