
I have a twofold problem concerning memory computation and sparse matrices for 3D arrays in Python:

I have various sparse representations (of 1s and 0s) of 3D numpy arrays. Meaning an array looks like this:

A =

[[[0. 1. 1.]
  [1. 0. 1.]
  [1. 0. 0.]
  ...
  [1. 0. 0.]
  [1. 0. 0.]
  [1. 0. 0.]]

 ...

 [[1. 0. 1.]
  [0. 1. 0.]
  [1. 0. 1.]
  ...
  [0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]

and array.shape is equal to (x, y, 3).

I would like to find a way to (1) measure the array's memory footprint, then (2) store it as a sparse matrix/array (using something similar to scipy's csr_matrix), then (3) measure the memory of the sparse matrix/array to (hopefully) see an improvement in memory.

My first problem is that I generally have trouble with the Python memory-measurement solutions I have found so far. For example, I expected to see a difference in memory size between an array of floats with many decimal places (e.g. B = [[[0.38431373 0.4745098 0.6784314 ] [0.41963135 0.49019608 0.69411767] [0.40392157 0.49019608 0.6862745 ] ...]]]) and an array of 1.s and 0.s of the same shape (like array A), which should have shown a big improvement (I need to measure this difference as well). Yet Python reports the same memory size for arrays of the same shape. Here are the methods I used and their outputs:

print(sizeof(A))   #prints 3783008

asizeof.asizeof(A)   #prints 3783024

print(actualsize(A))  #prints 3783008

print(A.nbytes)  #prints 3782880

print(total_size(A))  #prints 3783008

getsize(A)  #prints 3783008

print(len(pickle.dumps(A)))  #prints 3783042

********************

print(asizeof.asizeof(B)) #prints 5044112

sys.getsizeof(B) #prints 128 !!!

print(sizeof(B))  #prints 128 !!!

print(actualsize(B))  #prints 128 !!!

print(total_size(B))   #prints 128 !!!

print(B.nbytes)  #prints 3782880 

getsize(B)  #prints 128 !!!

print(len(pickle.dumps(B)))  #prints 3783042

(methods collected from here, here, and here).
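For context on the 128-byte readings above: sys.getsizeof (and the helpers built on it) only counts a NumPy array's data buffer when the array owns that buffer; for a view it reports just the small array header. And nbytes depends only on shape and dtype, never on the values stored, which is why A and B come out identical. A minimal sketch (the shape here is arbitrary, chosen only for illustration):

```python
import sys
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((400, 400, 3)) < 0.5).astype(np.float64)  # 1.s and 0.s
B = rng.random((400, 400, 3))                             # many-decimal floats

# nbytes = number of elements * dtype size, regardless of the values:
assert A.nbytes == B.nbytes == 400 * 400 * 3 * 8

# getsizeof counts the buffer only for an array that owns its data;
# a view reports just the ~100-byte ndarray header:
view = B[:]                  # a view of B, owns no data
print(sys.getsizeof(B))      # header + data buffer
print(sys.getsizeof(view))   # header only
print(view.base is B)        # True: the buffer belongs to B
```

So if one of your arrays was a slice or view of another array, that would explain the tiny getsizeof results; nbytes on the base array is the number to trust.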

My second problem is that I cannot find an economical way to store a 3D array of a given sparsity as a sparse structure: scipy's csr_matrix and pandas' SparseArray work for 2D arrays only, and sparse.COO() is very costly for 3D arrays: it only starts to save memory at sparsities of ~80% and higher. For example, a 70% sparse array stored with sparse.COO() is about 8 MB (e.g. measured via pickle), which is much bigger than the dense array itself. Or maybe the problem is still the way I compute memory (see the methods listed above).
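One possible workaround for the 2D-only limitation, sketched here as an assumption rather than a recommended fix: since the last axis has fixed length 3, you can fold it into the columns with a reshape, store the resulting 2D array as a csr_matrix, and reshape back after toarray(). CSR memory is the sum of its data, indices, and indptr buffers:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
A = (rng.random((200, 200, 3)) < 0.1).astype(np.float64)  # ~90% sparse

# Fold the length-3 last axis into the columns so a 2D format applies:
A2 = A.reshape(A.shape[0], -1)            # shape (200, 600)
S = sparse.csr_matrix(A2)

# CSR stores three buffers: data, column indices, and row pointers.
csr_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes
print(A.nbytes, csr_bytes)

# Round-trip back to the original 3D array:
back = S.toarray().reshape(A.shape)
assert np.array_equal(back, A)
```

CSR pays roughly one float plus one index per nonzero (instead of COO's one index per dimension), so the break-even sparsity is lower than with sparse.COO.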

Any ideas of what I should do? I am really sorry this post is too long! Thank you in advance!

  • Why would you expect an array of floats to be different sizes based on what the values of the floats are? That's not how floats work - if you really want, you can pack arrays of 1s and 0s to be single bit values though. Commented May 24, 2022 at 14:07
  • Is your B a view of another array? nbytes is normally enough, and is just the product of dimensions and dtype size. Commented May 24, 2022 at 14:36
  • For COO sparse formats, each nonzero element requires its data (determined by dtype) plus coordinate values (usually stored as int64). scipy.sparse uses 3 arrays to store these values: data, rows, and columns. For a 3D version you won't get any space savings until nnz is less than 25%. Commented May 24, 2022 at 15:30
  • Those variants on getsizeof help when dealing with objects that contain references to other objects, such as list and dict. For numpy arrays they aren't needed. The main memory use is the data buffer, which stores all elements as bytes (normally 8 bytes per element), so nbytes is just the total number of elements times 8. Obviously you need to be aware of whether the array is a view or not. Similarly for sparse matrices, you need to understand how the data is stored. Commented May 24, 2022 at 15:36
  • @CJR Thank you for your comment! I naively expected that a same-shaped array with floats of many decimal places would use more memory than an array of floats with no decimals at all, which is why I was looking for a way to capture this reduction in digits memory-wise. I also failed to mention that these arrays represent RGB images. I cannot convert 1.s and 0.s to 1s and 0s because plt.imshow() only works with [0., 1.] or [0, 255] values (an array of 1s and 0s would print a black image). Commented May 24, 2022 at 16:21
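Following up on the bit-packing idea from the comments: since the array holds only the values 0. and 1., np.packbits can store it at 1 bit per element, and np.unpackbits restores it, after which a cast back to float64 keeps it compatible with plt.imshow. A sketch (the shape is arbitrary, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(200, 200, 3)).astype(np.float64)  # 0./1. image mask

packed = np.packbits(A.astype(np.uint8))   # 1 bit per element
print(A.nbytes, packed.nbytes)             # 960000 vs 15000: a 64x reduction

# Restore the float 0./1. array, e.g. before calling plt.imshow:
restored = np.unpackbits(packed, count=A.size).reshape(A.shape).astype(np.float64)
assert np.array_equal(restored, A)
```

Unlike sparse formats, this gives a fixed 64x saving over float64 regardless of sparsity; the cost is the pack/unpack step around any operation that needs the dense floats.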
