Python - combine regex patterns

Question

I have a large text and the aim is to select all 10-character strings for which the first character is a letter and the last character is a digit.

I am a Python rookie, and so far I have only managed to find all 10-character strings:

ten_char = re.findall(r"\D(\w{10})\D", pdfdoc)

The question is how to add my other conditions: apart from being a 10-character string, the first character must be a letter and the last character must be a digit.

Suggestions appreciated!

You can use [A-Za-z] and [0-9] to tell it the character at this position should be an alphabetical character or a digit.
– JClarke, Commented Sep 9, 2016 at 21:55
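
A minimal sketch of how that suggestion could be combined with the original `\w{10}` attempt. It assumes the eight middle characters should be word characters, which the question does not state. The sample text and the names `sample`, `pattern`, and `matches` are made up for illustration; the lookarounds are one possible way to keep a match from being part of a longer word.

```python
import re

# Hypothetical sample text; the question's real input is `pdfdoc`.
sample = "a123456789 B23456789x x1234567_9 1234567890 abcdefghij k1234567890"

# One letter, eight word characters, one digit: ten characters in total.
# (?<!\w) and (?!\w) reject candidates embedded in a longer word.
pattern = re.compile(r"(?<!\w)[A-Za-z]\w{8}[0-9](?!\w)")

matches = pattern.findall(sample)
print(matches)  # ['a123456789', 'x1234567_9']
```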

online Thomas · Accepted Answer · 2016-09-09 22:06:22Z

2

([a-z].{8}[0-9])

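This asks for 1 alphabetical character, 8 characters of any kind, and finally 1 digit. Note that [a-z] on its own only matches lowercase letters; the demo uses the i flag to make the match case-insensitive.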

JS Demo

var re = /([a-z].{8}[0-9])/gi;
var str = 'Aasdf23423423423423423b423423423423423';
var m;

while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // Print the full matched text.
    console.log(m[0]);
}

https://regex101.com/r/gI8jZ4/1
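
Since the question is about Python, here is a rough Python counterpart of the JS demo, offered as a sketch rather than as part of the original answer. It assumes `re.IGNORECASE` is an acceptable stand-in for the `i` flag and uses `finditer` in place of the `exec` loop; the variable names are illustrative.

```python
import re

# Same pattern as the JS demo; re.IGNORECASE plays the role of the /i flag.
pattern = re.compile(r"[a-z].{8}[0-9]", re.IGNORECASE)

text = "Aasdf23423423423423423b423423423423423"

# finditer yields successive non-overlapping matches, like exec with /g.
found = [m.group(0) for m in pattern.finditer(text)]
print(found)  # ['Aasdf23423', 'b423423423']
```

Because the pattern has no boundaries, it also picks 10-character slices out of longer runs, as the output shows.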

edited Sep 9, 2016 at 22:06

answered Sep 9, 2016 at 21:53


3 Comments

dbosky Over a year ago

Don't use . in the middle. It can match whitespace. Use \w.

Christian Ternus Over a year ago

You might want to use \w for word characters (\S covers all non-whitespace characters), or [a-zA-Z] to include uppercase letters -- and don't forget about non-ASCII.

online Thomas Over a year ago

@DawidGrabowski and Christian, I can edit it, but \w would only include letters, digits, or underscores, and I don't see that reflecting the question at this moment. Could you please elaborate?
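
To make the difference discussed in these comments concrete, here is a small sketch comparing `.`, `\w`, and `\S` as the middle character class. The sample text is hypothetical and chosen so that one candidate contains a space, one a hyphen, and one an underscore.

```python
import re

# Hypothetical sample: a space, a hyphen, and an underscore in the middle.
text = "ab cdefgh1 ab-cdefgh2 abcdefgh_3"

middles = {"dot": r".", "word": r"\w", "nonspace": r"\S"}
results = {
    name: re.findall(r"[A-Za-z]" + mid + r"{8}[0-9]", text)
    for name, mid in middles.items()
}
for name, hits in results.items():
    print(name, hits)
# dot ['ab cdefgh1', 'ab-cdefgh2', 'abcdefgh_3']
# word ['abcdefgh_3']
# nonspace ['ab-cdefgh2', 'abcdefgh_3']
```

Which class is right depends on what counts as part of a "string" in the source text, which the question leaves open.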

dawg · Accepted Answer · 2016-09-09 22:14:35Z

1

If I understand the question correctly, use:

r'\b([a-zA-Z]\S{8}\d)\b'

Python demo:

>>> import re
>>> txt="""\
... Should match:
... a123456789 aA34567s89 zzzzzzzer9
... 
... Not match:
... 1123456789 aA34567s8a zzzzzzer9 zzzxzzzze99"""
>>> re.findall(r'\b([a-zA-Z]\S{8}\d)\b', txt)
['a123456789', 'aA34567s89', 'zzzzzzzer9']

answered Sep 9, 2016 at 22:14


Ben · Accepted Answer · 2016-09-09 21:58:46Z

0

I wouldn't use regex for this. Regular string manipulation is clearer in my opinion (though I haven't tested the following code).

def get_useful_words(filename):
    with open(filename, 'r') as file:
        for line in file:
            for word in line.split():
                if len(word) == 10 and word[0].isalpha() and word[-1].isdigit():
                    yield word


for useful_word in get_useful_words('tmp.txt'):
    print(useful_word)

answered Sep 9, 2016 at 21:58


3 Comments

Ben Over a year ago

@DawidGrabowski Could you please explain the inefficiencies? I'm not caching a regular expression, but I'm also not reading the whole file into memory at one time. The question specified a large text file.

dbosky Over a year ago

Memory-wise I agree, but that's just going to be slower compared to regex.

Jared Goguen Over a year ago

Personally, I don't find this more clear. This seems like the perfect use for a regular expression.
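
The two positions in these comments are not mutually exclusive. As a sketch only, a compiled pattern can be applied line by line, which keeps memory use low while still using a regular expression. This assumes a candidate never spans a line break (true here, since `\S` does not match a newline); `PATTERN`, `iter_matches`, and `demo_lines` are hypothetical names, and the pattern is the one from dawg's answer without the capture group.

```python
import re

# Compiled once and reused for every line.
PATTERN = re.compile(r"\b[a-zA-Z]\S{8}\d\b")


def iter_matches(lines):
    """Yield every match from an iterable of lines, e.g. an open file."""
    for line in lines:
        yield from PATTERN.findall(line)


# In-memory stand-in for a file; with a real file, pass the object
# returned by open(...) instead of this list.
demo_lines = ["a123456789 nope\n", "1123456789 zzzzzzzer9\n"]
found = list(iter_matches(demo_lines))
print(found)  # ['a123456789', 'zzzzzzzer9']
```

No timing claim is made here; which approach is faster would need to be measured on the actual data.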

Palomar · Accepted Answer · 2016-09-09 22:22:32Z

0

Thank you very much for a great discussion and interesting suggestions. This is my very first post on Stack Overflow, but wow... what a community you are!

In fact, using:

r'\b([a-zA-Z]\S{8}\d)'

solved my problem very nicely. Really appreciated all your comments.

answered Sep 9, 2016 at 22:22


1 Comment

dawg Over a year ago

Be sure to use r'\b([a-zA-Z]\S{8}\d)\b' or you will also match words longer than 10 characters that have a matching prefix...
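
A short sketch of the difference this comment describes, using a hypothetical two-word sample in which the second word has 11 characters:

```python
import re

text = "a123456789 zzzxzzzze99"  # the second word has 11 characters

without_boundary = re.findall(r"\b([a-zA-Z]\S{8}\d)", text)
with_boundary = re.findall(r"\b([a-zA-Z]\S{8}\d)\b", text)

print(without_boundary)  # ['a123456789', 'zzzxzzzze9']
print(with_boundary)     # ['a123456789']
```

Without the trailing `\b`, the first ten characters of the longer word are reported as a match.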
