My function is supposed to receive a large string, go through it, and find the maximum number of times the pattern "AGATC" repeats consecutively. Regardless of what I feed it, it always returns 1.

def agatc(s):
    maxrep = 0
    temp = 0
    for i in range(len(s) - 4):
        if s[i] == "A" and s[i + 1] == "G" and s[i + 2] == "A" and s[i + 3] == "T" and s[i + 4] == "C":
            temp += 1
            print(i)
            i += 3
        else:
            if temp > maxrep:
                maxrep = temp
            temp = 0
    return maxrep

I also tried writing the loop as range(0, len(s) - 4, 1) and got the same result.

I thought the problem might be in adding 3 to i (apparently it wasn't), so I added print(i) to see what was happening. I got the following:

45
1938
2049
2195
2952
2957
2962
2967
2972
2977
2982
2987
2992
2997
3002
3007
3012
3017
3022
3689
4754
  • Can you give a sample input for s? Commented Jun 14, 2020 at 12:54
  • Side note: you can use if s[i:i+5] == "AGATC". Commented Jun 14, 2020 at 12:57
  • Which implies that you probably wanna change range(len(s) - 4) to range(len(s) - 5). Commented Jun 14, 2020 at 12:58
  • i += 3 won't do anything, it's immediately replaced by the next value from the range by the loop itself. Commented Jun 14, 2020 at 12:58
  • It's because you don't let maxrep be any bigger than 1... Once you found a match, you do maxrep = temp but then initialize temp = 0. Now it will never hold that temp > maxrep so you will never change maxrep which is now 1... Why do you even need temp and the else clause? Why not just maxrep += 1 once the condition is true? Commented Jun 14, 2020 at 13:05
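The comments point at the main bug: i += 3 is overwritten by the for loop, so after a match the next position tested is i + 1, which can never start another AGATC, and temp is reset before a run can grow past 1. There is also a smaller issue: a run that reaches the end of the string is never compared with maxrep. A minimal sketch of the asker's counter with both fixed, switching to a while loop and jumping a full 5 characters after each match. This assumes the pattern is AGATC, which cannot overlap itself, so the jump never skips a valid start:

```python
def agatc(s):
    pattern = "AGATC"
    n = len(pattern)
    maxrep = 0
    temp = 0
    i = 0
    # a while loop re-reads i on every pass, so jumping ahead actually works
    while i <= len(s) - n:
        if s[i:i + n] == pattern:
            temp += 1
            i += n  # jump a full pattern length to test the next adjacent copy
        else:
            maxrep = max(maxrep, temp)  # a mismatch ends the current run
            temp = 0
            i += 1
    # a run touching the end of s is never followed by a mismatch, so check it here
    return max(maxrep, temp)
```

For example, agatc("AGATCAGATC") gives 2, where the original version returned 1.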

4 Answers

This counts the number of matches, including overlapping ones:

def agatc(s):
    temp = 0
    for i in range(len(s) - len("AGATC") + 1):
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
    return temp

If you want to count only non-overlapping matches:

def agatc(s):
    temp = 0
    i = 0
    while i < len(s) - len("AGATC") + 1:
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
            i += len("AGATC")
        else:
            i += 1
    return temp
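Because AGATC cannot overlap with itself, the two functions above return the same number for it; the difference only shows up with a self-overlapping pattern. The sketch below compares the two ideas on such a pattern. The helper names and the pattern parameter are illustrative, not part of the answer, and the built-in str.count (which counts non-overlapping occurrences) stands in for the while loop. Note that both count every occurrence in the string, not the longest consecutive run the question asks about:

```python
def count_overlapping(s, pattern):
    # try every start position; matches may share characters
    return sum(1 for i in range(len(s) - len(pattern) + 1)
               if s[i:i + len(pattern)] == pattern)


def count_non_overlapping(s, pattern):
    # str.count resumes after each match, so matches never share characters
    return s.count(pattern)


print(count_overlapping("AAAA", "AA"))      # 3
print(count_non_overlapping("AAAA", "AA"))  # 2
```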

Comments

Thank you! So I can't use for loops as I did in C, right? It's not possible to change the value of i in the middle of the iteration?
In C, the increment (for example, i++) runs at the end of each pass and works on whatever value i has at that point. In Python, the for loop pulls the next value from the range(n) iterator on every pass and assigns it to i, so changing i inside the body has no effect on the values that come next.
len(s) - len("AGATC") is wrong. Try s = "AGATCAGATCAAAAAGATCA"
The difference is 15. Therefore the last valid index is 14, because i is strictly less than 15
Exactly, so you skip the last possible starting position. You would expect the last index to be 15.
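A small demonstration of the point about for loops: reassigning the loop variable only rebinds a local name, while a while loop re-reads the variable on every pass. Both function names are hypothetical, chosen for illustration:

```python
def visited_indices(n):
    seen = []
    for i in range(n):
        seen.append(i)
        i += 3  # rebinds the name i only; the range iterator is untouched
    return seen


def visited_with_while(n):
    seen = []
    i = 0
    while i < n:
        seen.append(i)
        i += 3  # here the jump sticks, because the condition reads i again
    return seen


print(visited_indices(5))      # [0, 1, 2, 3, 4]
print(visited_with_while(10))  # [0, 3, 6, 9]
```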
A simple solution using the re module:

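import re

s = 'FGHAGATCATCFJSFAGATCAGATCFHGH'
# (?:AGATC)+ matches a whole block of back-to-back repeats as one match
matches = re.finditer('(?:AGATC)+', s)
max_len = 0
result = tuple()
for m in matches:
    length = m.end() - m.start()
    if length > max_len:
        max_len = length
        result = (m.start(), m.end())

print(result)                   # span of the longest run: (15, 25)
print(max_len // len('AGATC'))  # number of consecutive repeats in it: 2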
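The regex answer works with the span of the longest run. A minimal sketch that wraps the same idea into a reusable function returning the repeat count directly; the name longest_run and the pattern parameter are illustrative, not from the answer. Because the pattern has no capturing group, re.findall returns the full matched blocks, and re.escape keeps the sketch safe for patterns containing regex metacharacters:

```python
import re


def longest_run(s, pattern="AGATC"):
    # each match is one maximal block of back-to-back copies of pattern
    runs = re.findall("(?:" + re.escape(pattern) + ")+", s)
    # block length divided by pattern length is the repeat count; 0 if no match
    return max((len(run) // len(pattern) for run in runs), default=0)


print(longest_run('FGHAGATCATCFJSFAGATCAGATCFHGH'))  # 2
```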

Personally, I would use regular expressions, but if you don't want to, you can use the str.find() method. Here is my solution:

def agatc(s):
    cnt = 0
    findstr = 'AGATC'                         # pattern you are looking for
    start = 0
    while True:
        index = s.find(findstr, start)        # -1 once there are no more matches
        if index == -1:
            break
        cnt += 1
        start = index + 1                     # overlapping matches
        # start = index + len(findstr)        # non-overlapping matches only
    return cnt
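The find-based answer counts every occurrence rather than the longest consecutive run. Staying with str methods only (no regex), a minimal sketch for the run length pairs str.find with the position argument of str.startswith. The name longest_run_find is hypothetical, and resuming the search after each measured run assumes the pattern cannot overlap itself, as is the case for AGATC:

```python
def longest_run_find(s, pattern="AGATC"):
    n = len(pattern)
    best = 0
    start = s.find(pattern)
    while start != -1:
        # count how many copies follow back to back from this match
        count = 0
        pos = start
        while s.startswith(pattern, pos):
            count += 1
            pos += n
        best = max(best, count)
        # resume searching after the run just measured
        start = s.find(pattern, pos)
    return best


print(longest_run_find('FGHAGATCATCFJSFAGATCAGATCFHGH'))  # 2
```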

This function finds the longest run of consecutive 'AGATC's in a string and returns how many copies it contains:

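import re

def agatc(s):
    w = "AGATC"
    starts = [m.start() for m in re.finditer(w, s)]  # start index of each AGATC
    if not starts:
        return 0                                     # no occurrence at all
    c = ''
    for i, v in enumerate(starts[:-1]):
        # 'y' if the next AGATC begins exactly where this one ends, else 'n'
        c += 'y' if v + len(w) == starts[i + 1] else 'n'
    # the longest block of 'y's is the longest chain of adjacent matches
    return len(max(c.split('n'), key=len)) + 1

print(agatc("oooooooooAGATCooooAGATCAGATCAGATCAGATCooooooAGATCAGATC"))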

Output:

4
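The 'y'/'n' string works, but the same number can be found without tracking indices: a run of k consecutive copies exists exactly when pattern * k is a substring of s. A minimal sketch of that alternative; longest_run_concat is an illustrative name. Each check is a full substring search, so it trades some speed for brevity:

```python
def longest_run_concat(s, pattern="AGATC"):
    k = 0
    # if k + 1 copies in a row appear anywhere in s, the longest run is at least k + 1
    while pattern * (k + 1) in s:
        k += 1
    return k


print(longest_run_concat("oooooooooAGATCooooAGATCAGATCAGATCAGATCooooooAGATCAGATC"))  # 4
```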
