
Dear all,

I'm facing a log-parsing problem that is not trivial for me.

I need to go through a file and check whether each line matches a pattern; if it does, I want to get the ClientID specified in that line.

The line looks like this:

17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'

So I need to get 99071901.

I tried to construct a regexp search pattern, but it is incomplete; I'm stuck at 'TRACE':

regex = '(^[(\d\.)]+) ([(\d\:)]+) ([\bTRACE\b]+) ([(\d)]+) ([\bGDS\b:)]+) ([\ClientID\b])'

The script code is:

import re

log=open('t.log','r')
for i in log:
    key=re.search(regex,i)
    print(key.group()) #print string matching
    for g in key:
        client_id=re.search(????,g) # find ClientID
log.close()

I'd appreciate a hint on how to solve this.

Thank you.

  • If you need those digits, I think you just need the r"'(\d+)'$" regex and to grab .group(1). Otherwise, to find other "submatches", "spell out" the pattern, like ^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$ and access the groups you need using the appropriate indices. Commented Feb 13, 2017 at 11:02
  • Hi @WiktorStribiżew, when I apply your pattern in a loop: regex="^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"; for i in log: key=re.search(regex,i) ; if key is not None: ClientID=key.group(1); print(ClientID); else: print('not matching'). I get None, not matching. Commented Feb 13, 2017 at 11:52
  • Why loop? No, just use m = re.search(pat, s), if m: print(m.group(n)) where n is the group ID. See the Python demo. Commented Feb 13, 2017 at 11:53
  • See ideone.com/B74GUk Commented Feb 13, 2017 at 11:55
  • Apologies, my bad! I missed some characters in my string. Now it works perfectly. Commented Feb 13, 2017 at 12:12
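The comment thread comes down to two points: if only the digits are needed, the short pattern `'(\d+)'$` is enough, and in the longer pattern the `[` before the thread number must be escaped, because an unescaped `[(\d+)]` is a character class that matches a single character. A minimal sketch, assuming the log line format shown in the question; the variable names `escaped` and `unescaped` are illustrative:

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

# Short pattern from the first comment: the quoted digits at the end of the line.
m = re.search(r"'(\d+)'$", line)
print(m.group(1) if m else None)  # 99071901

# Full pattern with the opening bracket escaped as \[
escaped = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
# Same pattern without the backslash: [(\d+)] is now a one-character class
# ("(", a digit, "+" or ")"), which cannot match the "[" of "[1245]".
unescaped = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"

print(re.search(escaped, line).group(5))  # 99071901
print(re.search(unescaped, line))         # None
```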

2 Answers


You don't need to be too specific. You can just capture the sections and parse them individually.

Let's start with just your one line as an example:

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

Then let's add a first regex that captures all the sections:

import re
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
# now extract each section
date, time, level, thread, module, message = line_regex.match(line).groups()
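To see what each section holds, the following sketch runs the same `line_regex` on the sample line. Note that `thread` keeps its square brackets; the `strip('[]')` call at the end is an illustrative extra step, not part of the answer:

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')

date, time, level, thread, module, message = line_regex.match(line).groups()
print(date)     # 17.02.09
print(time)     # 10:42:31.242
print(level)    # TRACE
print(thread)   # [1245]  (\S+ also grabs the brackets)
print(module)   # GDS     (the colon is consumed by the literal ':' in the pattern)
print(message)  # someText(SomeText).ClientID: '' -> '99071901'

thread_id = thread.strip('[]')  # hypothetical extra step: drop the brackets
print(thread_id)  # 1245
```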

Each section now holds the information we need to make further decisions or to parse it further. Next, let's get the client ID when the right kind of message shows up.

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

if 'ClientID' in message:
    client_id = client_id_regex.match(message).group(1)

And now we have the client_id.
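One caveat: `client_id_regex.match(message)` returns `None` when a message mentions `ClientID` in a different form (for example a change between two non-empty IDs), and calling `.group(1)` on `None` raises `AttributeError`. A minimal sketch that checks the match object first; `get_client_id` is a hypothetical helper name:

```python
import re

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

def get_client_id(message):
    """Return the new client ID from a message, or None if the format differs."""
    if 'ClientID' not in message:
        return None
    m = client_id_regex.match(message)
    return m.group(1) if m else None

print(get_client_id("someText(SomeText).ClientID: '' -> '99071901'"))  # 99071901
print(get_client_id("someText(SomeText).ClientID: '123' -> '456'"))    # None
print(get_client_id("no id here"))                                     # None
```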


Just work that logic into your loop and you are all set.

import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

with open('t.log','r') as f:  # use the with context manager to close the file automatically
    for line in f:  # iterate over the lines
        sections = line_regex.match(line)  # make a match object for the sections
        if not sections:
            continue  # probably you want to handle this case
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:  # should we even look here for a client id?
            client_id_match = client_id_regex.match(message)
            if client_id_match:  # the message may mention ClientID in another format
                client_id = client_id_match.group(1)
                # now do what you wanted to do
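If the same logic is needed in more than one place, it can be wrapped in a generator that accepts any iterable of lines, such as an open file. A sketch under that assumption; `iter_client_ids` is a hypothetical name, and `io.StringIO` stands in for `t.log` so the example runs on its own:

```python
import io
import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

def iter_client_ids(lines):
    """Yield (date, time, client_id) for every line that sets a ClientID."""
    for line in lines:
        sections = line_regex.match(line)
        if not sections:
            continue  # line does not look like a log record
        date, time, level, thread, module, message = sections.groups()
        m = client_id_regex.match(message)  # None for unrelated messages
        if m:
            yield date, time, m.group(1)

sample = io.StringIO(
    "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'\n"
    "17.02.09 10:42:31.300 DEBUG [1245] GDS:     unrelated message\n"
)
result = list(iter_client_ids(sample))
print(result)  # [('17.02.09', '10:42:31.242', '99071901')]
```

With a real file the call becomes `with open('t.log') as f:` followed by `for date, time, client_id in iter_client_ids(f): ...`.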

1 Comment

Hi Inbar Rose, many thanks for your solution; it works as well.

You may use capturing parentheses around those parts in the pattern that you are interested in, and then access those parts using group(n) where n is the corresponding group ID:

import re
s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
regex = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
m = re.search(regex, s)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))
    print(m.group(5))

See the Python online demo

The pattern is

^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$

See its online demo here.
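Numeric group indices are easy to mix up once a pattern has five groups. As an alternative not used in the answer above, Python's `re` module also supports named groups written as `(?P<name>...)`, which can be read back by name or all at once with `groupdict()`. A sketch with the same pattern; the group names are illustrative:

```python
import re

s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
regex = (r"^(?P<date>[\d.]+)\s+(?P<time>[\d.:]+)\s+(?P<level>TRACE)\s+"
         r"\[(?P<thread>\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(?P<client_id>\d+)'$")

m = re.search(regex, s)
if m:
    print(m.group('client_id'))  # 99071901
    print(m.groupdict())         # all named groups as a dict
```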

Note that you have mixed up character classes and groups: (...) groups a subpattern and captures it, while [...] defines a character class that matches a single character.
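The difference is easy to see on small inputs. In the question's pattern, `[\bTRACE\b]+` is a character class, so it matches any run of the letters T, R, A, C, E in any order (inside a class, `\b` means the backspace character, not a word boundary), while `(TRACE)` matches only the literal word and captures it:

```python
import re

# Character class: any run of characters from the set, in any order.
print(bool(re.fullmatch(r'[\bTRACE\b]+', 'CRATE')))  # True
print(bool(re.fullmatch(r'[\bTRACE\b]+', 'TRACE')))  # True

# Group: exactly the sequence T-R-A-C-E, captured as group 1.
print(bool(re.fullmatch(r'(TRACE)', 'CRATE')))        # False
print(re.fullmatch(r'(TRACE)', 'TRACE').group(1))     # TRACE
```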
