Python RegExp retrieve value from matching string

Question

Dear all,

I've run into a problem parsing a log file that is not trivial for me.

I need to go through a file and check whether each line matches a pattern: if it does, I need to get the ClientID specified in that line.

The line looks like this:

17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'

So I need to get 99071901.

I tried to construct a regex search pattern, but it is not complete; I got stuck at 'TRACE':

regex = '(^[(\d\.)]+) ([(\d\:)]+) ([\bTRACE\b]+) ([(\d)]+) ([\bGDS\b:)]+) ([\ClientID\b])'

The script code is:

import re

log=open('t.log','r')
for i in log:
    key=re.search(regex,i)
    print(key.group()) # print the matching string
    for g in key:
        client_id=re.search(????,g) # find ClientID
log.close()

I'd appreciate it if you could give me a hint on how to solve this.

Thank you.

If you need those digits, I think you just need the r"'(\d+)'$" regex and can grab .group(1). Otherwise, to find other "submatches", "spell out" the pattern, like ^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$ and access the groups you need using the appropriate indices.
– Wiktor Stribiżew, Commented Feb 13, 2017 at 11:02
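A minimal sketch of the first suggestion in the comment above, assuming the ClientID is always the last quoted number on the line. Note that `$` in Python also matches just before a trailing newline, so this works on lines read from a file without stripping them first.

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

# Capture the digits inside the last pair of single quotes, anchored to the end of the line
m = re.search(r"'(\d+)'$", line)
if m:
    client_id = m.group(1)
    print(client_id)  # 99071901
```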
Hi @WiktorStribiżew, I applied your pattern in a loop: regex="^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"; for i in log: key=re.search(regex,i); if key is not None: ClientID=key.group(1); print(ClientID); else: print('not matching'). I got None, i.e. "not matching".
– 27P, Commented Feb 13, 2017 at 11:52
Why a loop? There is no need: just use m = re.search(pat, s), then if m: print(m.group(n)), where n is the group ID. See the Python demo.
– Wiktor Stribiżew, Commented Feb 13, 2017 at 11:53
Apologies, my bad! I missed some characters in my string. Now it works perfectly.
– 27P, Commented Feb 13, 2017 at 12:12
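The pattern pasted in the earlier comment lost the backslash before `[`, which is enough to make the whole search fail. A small sketch comparing the two versions, using raw strings (an assumption, since the comment used a plain string) so the backslashes reach the regex engine unchanged:

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

# Unescaped: [(\d+)] is a character class matching ONE char out of "(", a digit, "+", ")"
broken = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
# Escaped: \[ matches a literal "[", and (\d+) captures the thread number
fixed = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"

print(re.search(broken, line))  # None: "[" is not in the class [(\d+)]
m = re.search(fixed, line)
print(m.group(4), m.group(5))   # 1245 99071901
```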

Inbar Rose · Accepted Answer · 2017-02-13 13:57:08Z

You don't need to be too specific. You can just capture the sections and parse them individually.

Let's start with just your one line as an example:

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

Then let's add a first regex that captures all the sections:

import re
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
# now extract each section
date, time, level, thread, module, message = line_regex.match(line).groups()

Each section now holds part of the information we need, so we can make decisions based on it or parse it further. Next, let's get the client ID when the right kind of message shows up.
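To see what each section actually contains for the sample line, a short sketch that prints them. The `thread.strip("[]")` step is an addition for illustration, since `\S+` keeps the brackets around the thread number.

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')

date, time, level, thread, module, message = line_regex.match(line).groups()
print(date)     # 17.02.09
print(time)     # 10:42:31.242
print(level)    # TRACE
print(thread)   # [1245]  (brackets included, because \S+ matches them too)
print(module)   # GDS
print(message)  # someText(SomeText).ClientID: '' -> '99071901'

thread_id = thread.strip("[]")  # hypothetical extra step: drop the brackets
print(thread_id)  # 1245
```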

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

if 'ClientID' in message:
    match = client_id_regex.match(message)
    if match:  # the message may mention ClientID in a different form
        client_id = match.group(1)

And now we have the client_id.

Just work that logic into your loop and you are all set.

import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

with open('t.log', 'r') as f:  # the with statement closes the file automatically
    for line in f:  # iterate over the lines
        sections = line_regex.match(line)  # match object for the sections
        if not sections:
            continue  # the line has a different layout; you may want to handle this case
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:  # should we even look here for a client id?
            match = client_id_regex.match(message)
            if match:
                client_id = match.group(1)
                # now do what you wanted to do
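One way to try this logic without a real `t.log` is to wrap it in a generator and feed it an in-memory file via `io.StringIO`. This is a sketch: `extract_client_ids` and the second (DEBUG) sample line are hypothetical, made up to show a line that is skipped.

```python
import io
import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

def extract_client_ids(lines):
    """Yield every ClientID found in an iterable of log lines (hypothetical helper)."""
    for line in lines:
        sections = line_regex.match(line)
        if not sections:
            continue  # line does not have the expected layout
        message = sections.group(6)
        m = client_id_regex.match(message)
        if m:  # only messages of the form "...ClientID: '' -> '<digits>'"
            yield m.group(1)

# Hypothetical sample log; the second line is invented and contains no ClientID
sample = io.StringIO(
    "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'\n"
    "17.02.09 10:42:31.300 DEBUG [1245] GDS:     nothing interesting here\n"
)
print(list(extract_client_ids(sample)))  # ['99071901']
```

With a real file, the same generator works on the file object, e.g. `with open('t.log') as f: for cid in extract_client_ids(f): ...`.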

Hi Inbar Rose, many thanks for your solution; it works as well.

Wiktor Stribiżew · Accepted Answer · 2017-02-13 12:13:42Z

You may use capturing parentheses around the parts of the pattern you are interested in, and then access those parts using group(n), where n is the corresponding group ID:

import re
s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
regex = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
m = re.search(regex, s)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))
    print(m.group(5))
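If counting group numbers gets error-prone, Python's named groups `(?P<name>...)` are an alternative. This is a swapped-in variant of the same pattern, not part of the original answer; the group names are chosen for illustration.

```python
import re

s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
# Same pattern as the answer's, with each capturing group given a name
regex = (r"^(?P<date>[\d.]+)\s+(?P<time>[\d.:]+)\s+(?P<level>TRACE)\s+"
         r"\[(?P<thread>\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(?P<client_id>\d+)'$")
m = re.search(regex, s)
if m:
    print(m.group("client_id"))  # 99071901
    print(m.groupdict())         # all named groups as a dict
```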

See the Python online demo

The pattern is

^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$
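To make the pattern easier to read, it can be written with the `re.VERBOSE` flag, which lets each piece carry a comment. A sketch of the same regex in that form; in verbose mode unescaped whitespace is ignored, so the literal space before `GDS:` has to be written as `\ `.

```python
import re

pattern = re.compile(r"""
    ^([\d.]+)              # 1: date, e.g. 17.02.09
    \s+([\d.:]+)           # 2: time, e.g. 10:42:31.242
    \s+(TRACE)             # 3: log level
    \s+\[(\d+)]            # 4: thread number inside literal brackets
    \ GDS:                 # literal " GDS:" (the space is escaped)
    .*?ClientID:           # skip lazily up to "ClientID:"
    \s*''\s*->\s*          # the empty old value and the arrow
    '(\d+)'$               # 5: the new ClientID at the end of the line
""", re.VERBOSE)

s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
m = pattern.search(s)
print(m.group(5))  # 99071901
```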

See its online demo here.

Note that you have mixed up character classes and groups: (...) groups a subpattern and captures it, while [...] defines a character class that matches a single character.
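A quick sketch of that difference, using the question's `[\bTRACE\b]+` as the example. Inside a character class, `\b` means the backspace character, not a word boundary, so the class is just a set of single characters.

```python
import re

# [...] is a character class: each repetition matches ONE character from the set.
# [\bTRACE\b]+ means "one or more of: backspace, T, R, A, C, E", not the word TRACE.
print(re.fullmatch(r"[\bTRACE\b]+", "TRACE") is not None)   # True
print(re.fullmatch(r"[\bTRACE\b]+", "CART") is not None)    # True: any order of those letters
print(re.fullmatch(r"[\bTRACE\b]+", "TRACE!") is not None)  # False: "!" is not in the set

# (...) is a group: it matches the whole subpattern in sequence and captures it.
m = re.search(r"(TRACE) \[(\d+)]", "10:42:31.242 TRACE [1245] GDS:")
print(m.group(1), m.group(2))  # TRACE 1245
```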
