How to sum two arrays in Python?

Question

I made a DataFrame like this:

import numpy as np
import pandas as pd

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])
disaster = {"occur": pd.Series(occurrence), "year": pd.Series(year)}
df = pd.DataFrame(disaster)

Now I want to write a function that takes two years and returns the sum of the occurrences for those years. For example, given 1851 and 1852, it should return 9.

I wrote a function like this, but it raises an error:

def dist(s1,s2):
    return (sum (year>=s1 and year< s2))

print dist(s1,s2)

Padraic Cunningham · Accepted Answer · 2015-02-03 18:18:14Z

4

print(df.loc[df['year'].isin((1851,1852))]["occur"].sum())

Or:

print(df.loc[df.year.isin((1851,1852))].occur.sum())

For a range of years, passing a range to isin seems more efficient than combining two comparisons with &:

df.loc[df.year.isin(range(s1, s2+1))].occur.sum()
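As a minimal sketch of the range approach wrapped in a function, the following rebuilds the question's DataFrame and sums occur over an inclusive span of years. The helper names sum_between and sum_between_alt are hypothetical and not from the answer. The second variant uses Series.between, which builds the same inclusive mask without creating a range. Neither variant has been timed here, so treat any efficiency difference as an open question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "occur": np.array([4, 5, 4, 0, 1, 4, 3]),
    "year": np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857]),
})

def sum_between(frame, s1, s2):
    """Sum 'occur' for all rows with s1 <= year <= s2 (hypothetical helper)."""
    # range() excludes its stop value, hence s2 + 1 for an inclusive upper bound
    return int(frame.loc[frame["year"].isin(range(s1, s2 + 1)), "occur"].sum())

def sum_between_alt(frame, s1, s2):
    """Same result via Series.between, which includes both endpoints by default."""
    return int(frame.loc[frame["year"].between(s1, s2), "occur"].sum())

print(sum_between(df, 1851, 1852))      # 9
print(sum_between_alt(df, 1851, 1853))  # 13
```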

edited Feb 3, 2015 at 18:18

answered Feb 3, 2015 at 17:49


Joe Kington · Accepted Answer · 2015-02-03 17:59:37Z

2

If you're specifically wanting only a numpy approach, you'd do something similar to this:

import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])

year1, year2 = 1851, 1852
# Boolean mask selecting exactly the two requested years
mask = (year == year1) | (year == year2)
print(occurrence[mask].sum())

Note that if you wanted the sum of all occurrences between those two years (inclusive), you'd do something more like:

mask = (year >= year1) & (year <= year2)

With pandas, the same approach still works, but as others have noted, there are more efficient ways of building the boolean mask with the isin method, if you're interested in just those two years (and not the interval between them).
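For reference, the interval mask can be wrapped in a small function. This is only a sketch: the name sum_between and its parameter names are hypothetical, and it assumes the two arrays have the same length and line up element by element.

```python
import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])

def sum_between(values, years, start, end):
    """Sum the values whose year lies in [start, end] (hypothetical helper)."""
    # Element-wise comparisons combined with &, giving one boolean per element
    mask = (years >= start) & (years <= end)
    return int(values[mask].sum())

print(sum_between(occurrence, year, 1851, 1852))  # 9
print(sum_between(occurrence, year, 1851, 1854))  # 13
```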

edited Feb 3, 2015 at 17:59

answered Feb 3, 2015 at 17:46


3 Comments

Benzamin Over a year ago

I need all the values in between any two years. Thanks.

Joe Kington Over a year ago

@AlfradNobel - Then use the mask shown in the second code snippet. It's the same for pandas as it is for numpy.

Benzamin Over a year ago

If I want to write it as a function, how can I do that? Can you please tell me?

Alex Riley · Accepted Answer · 2015-02-03 18:56:18Z

2

You need to use & instead of and. This means your function should be:

def dist(s1, s2):
    return df.occur[(df.year >= s1) & (df.year <= s2)].sum()

And then you have:

In [72]: dist(1851, 1852)
Out[72]: 9

Both 1851 <= df.year and df.year <= 1852 create a boolean Series. Python's and operator does not work with these objects as we want: it needs a single truth value, so it calls bool on a Series, and that raises a ValueError because the truth value of a multi-element Series is ambiguous. On the other hand, & performs an element-wise and, returning True wherever both Series are True.
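The difference can be reproduced on a small Series. This is an illustrative sketch; the variable name years and its three values are made up for the demonstration.

```python
import pandas as pd

years = pd.Series([1851, 1852, 1853])

try:
    # `and` needs one True/False for its left operand, so it calls bool() on the Series
    result = (years >= 1851) and (years <= 1852)
    raised = False
except ValueError as exc:
    raised = True
    print("and failed:", exc)

# `&` combines the two boolean Series element by element instead
mask = (years >= 1851) & (years <= 1852)
print(mask.tolist())  # [True, True, False]
```

The parentheses around each comparison matter, because & binds more tightly than >= and <=.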

You might also find isin() useful for summing the values for a given list of dates. For example:

>>> df.occur[df.year.isin([1851, 1852])].sum()
9

edited Feb 3, 2015 at 18:56

answered Feb 3, 2015 at 17:48


Vor · Accepted Answer · 2015-02-03 17:46:30Z

0

In [21]: import numpy as np

In [22]: import pandas as pd

In [23]: occurrence= np.array([4, 5, 4, 0, 1, 4, 3])

In [24]: year = np.array([1851,1852,1853,1854,1855,1856,1857])

In [25]: my_func = lambda *l: sum([x[0] for x in zip(occurrence, year) if x[1] in l])

In [26]: my_func(1851, 1852)
Out[26]: 9


answered Feb 3, 2015 at 17:46


3 Comments

Joe Kington Over a year ago

Note that this is rather inefficient for NumPy arrays. (Iterating over a NumPy array at the Python level is relatively slow compared to iterating over a list.) For a small array, it won't matter much, but it will be needlessly slow and memory-inefficient for larger arrays.

Benzamin Over a year ago

Vor, your code only gives the sum of the first and second year, not the cumulative sum. For example, when I ran my_func(1851, 1852) it gave 4, but it should be 13. Thanks a lot.

Benzamin Over a year ago

Hi all, thanks a lot. I gave two years only as an example, but I need the sum of the values between any two years. When I run this code, it gives me the sum for the two years, not the sum of all the values in between.
