0

Let's say I have an array of strings. I want to find all the strings that contain the following substring: character, digit, digit, digit, character (the pattern is CDDDC). For instance, matching substrings look like this: H554L, K007K

Is there a fast way to match strings against such a pattern and find these occurrences?

3
  • 4
    Did you look into the re module (docs.python.org/3/library/re.html)? Commented Mar 3, 2020 at 13:30
  • Seems like this might help me, thanks! Commented Mar 3, 2020 at 13:32
  • 1
    Using the re module, you can write something like pattern = re.compile(r'[A-Z]\d{3}[A-Z]') and then test with something like if pattern.match(your_string):. Commented Mar 3, 2020 at 13:36

2 Answers

3

Tasks like this are what regular expressions ("regex") are for: regex is made for pattern matching. It is a broad topic in itself, too much to explain here (check regexbuddy or another site).

Python has a regex engine built in, in the re module (there is also a separate, third-party regex module). A simple solution would hence be:

import re
matches = [word for word in somelist if re.search(r"[a-zA-Z]\d{3}[a-zA-Z]", word)]

This iterates over somelist and selects every word that contains a letter from one of the two ranges, followed by three digits, followed by another letter from those ranges.

As noted in the comments: re.search matches any item in which some part matches the pattern, so it will match a123b as well as abc b123cd. If you want the full "word" in the array to match the pattern, use re.fullmatch instead.
re.fullmatch will match a123b, but not abc b123cd and not ab123cd.
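
The difference between the two calls can be checked with a short sketch. The sample list below is made up for illustration, using the strings mentioned above plus one non-matching entry; the pattern is the same as in the answer.

```python
import re

# Same pattern as in the answer: letter, three digits, letter.
PATTERN = re.compile(r"[a-zA-Z]\d{3}[a-zA-Z]")

# Hypothetical sample data for illustration.
words = ["a123b", "abc b123cd", "ab123cd", "12345"]

# search: the pattern may occur anywhere inside the string.
contains = [w for w in words if PATTERN.search(w)]

# fullmatch: the whole string must be exactly letter-digit-digit-digit-letter.
exact = [w for w in words if PATTERN.fullmatch(w)]

print(contains)  # ['a123b', 'abc b123cd', 'ab123cd']
print(exact)     # ['a123b']
```

Compiling the pattern once is optional here; it mainly avoids repeating the pattern string when the same regex is used for both checks.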


3 Comments

I'd rather suggest re.search instead of re.match: the OP mentioned "substring", which is not necessarily at the beginning of a string.
I was actually thinking of using fullmatch, as I interpreted "substring" as "one element of the array", but your idea might be better. (I accidentally chose match because in JavaScript match does the same as fullmatch in Python.)
You don't need that "is not None" at the end of the comprehension.
1

Try this example with this regex:

regex: (?i)[A-Z]\d\d\d[A-Z]

import re

xx = ['aeeea', '5eeae', 'H554L', 'juan', 'K007K']
for i in xx:
    # findall returns every CDDDC substring in i (an empty list if there is none)
    r1 = re.findall(r"(?i)[A-Z]\d\d\d[A-Z]", i)
    print(', '.join(r1))
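
A possible variant, sketched under the assumption that only the strings with at least one hit are of interest: it keeps the same (?i) pattern but collects the results in a dict instead of printing an empty line for every non-matching string. The last list entry is a made-up example with two hits.

```python
import re

# (?i) makes [A-Z] case-insensitive, as in the regex above.
PATTERN = re.compile(r"(?i)[A-Z]\d\d\d[A-Z]")

# The list from the example, plus one hypothetical string containing two hits.
strings = ['aeeea', '5eeae', 'H554L', 'juan', 'K007K', 'xx a111b y222z']

# Map each string that contains the pattern to the substrings found in it.
hits = {s: PATTERN.findall(s) for s in strings if PATTERN.search(s)}

for source, found in hits.items():
    print(source, '->', ', '.join(found))
```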
