
I made a DataFrame like this:

import numpy as np
import pandas as pd

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])
disaster = {"occur": pd.Series(occurrence), "year": pd.Series(year)}
df = pd.DataFrame(disaster)

Now I want to write a function that, given two years, returns the sum of the occurrences for those years. For example, for 1851 and 1852 it should return 9.

I wrote a function like this, but it raises an error:

def dist(s1, s2):
    return sum(year >= s1 and year < s2)

print(dist(1851, 1852))  # raises ValueError: The truth value of an array with more than one element is ambiguous

4 Answers

4
print(df.loc[df['year'].isin((1851,1852))]["occur"].sum())

Or:

print(df.loc[df.year.isin((1851,1852))].occur.sum())

For a range of years, you can pass range(s1, s2 + 1) to isin, which includes both endpoint years, instead of combining two comparisons with &:

df.loc[df.year.isin(range(s1, s2+1))].occur.sum()
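Wrapped as a function, the isin approach might look like the minimal sketch below. It assumes the df built in the question (columns occur and year, with integer years); the name sum_between_years is hypothetical.

```python
import numpy as np
import pandas as pd

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])
df = pd.DataFrame({"occur": occurrence, "year": year})


def sum_between_years(df, s1, s2):
    """Sum df.occur for every year from s1 to s2, inclusive (hypothetical helper)."""
    # range() excludes its stop value, so add 1 to include s2 itself
    years = range(s1, s2 + 1)
    return df.loc[df.year.isin(years)].occur.sum()


print(sum_between_years(df, 1851, 1852))  # 9
print(sum_between_years(df, 1851, 1853))  # 13
```

This only works because the years are consecutive integers; for non-integer keys a comparison-based mask is the safer choice.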


2

If you specifically want a NumPy-only approach, you could do something like this:

import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])

year1, year2 = 1851, 1852
# True only where the year equals one of the two requested years
mask = (year == year1) | (year == year2)
print(occurrence[mask].sum())

Note that if you wanted the sum of all occurrences between those two years (inclusive), you'd do something more like:

mask = (year >= year1) & (year <= year2)

With pandas, the same approach still works, but as others have noted, there are more efficient ways of building the boolean mask with the isin method, if you're interested in just those two years (and not the interval between them).
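Turning the interval mask into a reusable function could look like this minimal sketch. The function name occurrences_between and its argument order are hypothetical; since the same comparisons and & work on pandas Series, it should also accept df.year and df.occur.

```python
import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])


def occurrences_between(years, counts, year1, year2):
    """Total of counts for every year in [year1, year2] (hypothetical helper)."""
    # Each comparison is wrapped in parentheses before combining with &
    mask = (years >= year1) & (years <= year2)
    return int(counts[mask].sum())


print(occurrences_between(year, occurrence, 1851, 1852))  # 9
print(occurrences_between(year, occurrence, 1851, 1857))  # 21
```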

3 Comments

I need all the values between any two years. Thanks.
@AlfradNobel - Then use the mask shown in the second code snippet. It's the same for pandas as it is for numpy.
If I want to write it as a function, how can I do that? Can you please tell me?
2

You need to use & instead of and. This means your function should be:

def dist(s1, s2):
    return df.occur[(df.year >= s1) & (df.year <= s2)].sum()

And then you have:

In [72]: dist(1851, 1852)
Out[72]: 9

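Both 1851 <= df.year and df.year <= 1852 create a boolean Series. Python's and does not work with these objects the way we want: it calls bool on the first Series, and pandas raises a ValueError because the truth value of a whole Series is ambiguous. On the other hand, & performs an element-wise and, returning True where both Series are True. The parentheses around each comparison are also required, because & binds more tightly than >= and <=.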
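The sketch below reproduces both failure modes (using and, and leaving out the parentheses) next to the working form. It rebuilds the question's df with plain lists so it runs on its own; the exact wording of the error messages depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"occur": [4, 5, 4, 0, 1, 4, 3],
                   "year": [1851, 1852, 1853, 1854, 1855, 1856, 1857]})

# 1. `and` calls bool() on the first Series, which pandas refuses to do
try:
    df.occur[(df.year >= 1851) and (df.year <= 1852)]
except ValueError as exc:
    print("and:", exc)

# 2. Without parentheses, `1851 & df.year` is evaluated first, and the
#    resulting chained comparison again needs bool() on a Series
try:
    df.occur[df.year >= 1851 & df.year <= 1852]
except ValueError as exc:
    print("no parentheses:", exc)

# 3. Parenthesised comparisons combined with & work element-wise
mask = (df.year >= 1851) & (df.year <= 1852)
print(mask.tolist())         # [True, True, False, False, False, False, False]
print(df.occur[mask].sum())  # 9
```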

You might also find isin() useful for summing the values for a given list of dates. For example:

>>> df.occur[df.year.isin([1851, 1852])].sum()
9
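For the interval case, pandas also provides Series.between, which builds the same >= and <= mask in a single call and includes both endpoints by default. A minimal sketch, reusing the dist name from this answer and a df rebuilt from the question's data:

```python
import pandas as pd

df = pd.DataFrame({"occur": [4, 5, 4, 0, 1, 4, 3],
                   "year": [1851, 1852, 1853, 1854, 1855, 1856, 1857]})


def dist(s1, s2):
    # between() keeps rows where s1 <= year <= s2 (both ends included by default)
    return df.occur[df.year.between(s1, s2)].sum()


print(dist(1851, 1852))  # 9
print(dist(1851, 1853))  # 13
```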


0
In [21]: import numpy as np

In [22]: import pandas as pd

In [23]: occurrence= np.array([4, 5, 4, 0, 1, 4, 3])

In [24]: year = np.array([1851,1852,1853,1854,1855,1856,1857])

In [25]: my_func = lambda *l: sum([x[0] for x in zip(occurrence, year) if x[1] in l])

In [26]: my_func(1851, 1852)
Out[26]: 9
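A vectorised variant of my_func avoids the Python-level loop by letting np.isin build the mask. This is a sketch assuming the occurrence and year arrays from the session above; sum_for_years is a hypothetical name. Like my_func, it sums only the listed years, not the interval between them.

```python
import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])


def sum_for_years(*years):
    """Sum occurrences for exactly the listed years (vectorised my_func)."""
    # np.isin marks each element of `year` that appears in `years`
    return int(occurrence[np.isin(year, years)].sum())


print(sum_for_years(1851, 1852))  # 9
print(sum_for_years(1851, 1853))  # 8, because 1852 is not included
```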


3 Comments

Note that this is rather inefficient for NumPy arrays. (Iterating through a NumPy array at the Python level is relatively slow compared to iterating through a list.) For a small array it won't matter much, but for larger arrays it will be needlessly slow and memory-inefficient.
Vor, your code only gives the sum of the first and second years, not the cumulative sum. For example, when I ran my_func(1851, 1852) it gave 4, but it should be 13. Thanks a lot.
Hi all, thanks a lot, but I gave two years only as an example. I need the sum of the values between any two years. When I run this code, it gives me the sum of the two years, not the sum of all the values between them.
