
I have some parser output and I want to search for a substring in it. The output:

(ROOT  
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) ('' ') (JJ joint) (NN work)))))))
(. .)))

I want to match this part:

(PRN (, ,)
        (S
          (NP (NNP Gilbert)
            (CC and)
            (NNP Sullivan))
          (VP (VBZ 's)
            (ADJP (JJ first))))
        (, ,))

In this part, PRN is static. How would I write a regex for that?

  • PRN is static text? It will not be changed? Commented Jun 14, 2014 at 10:33
  • @nevermind, yes, it is static. Commented Jun 14, 2014 at 10:34
  • What syntax is that from? Looks exotic. Commented Jun 14, 2014 at 14:48

2 Answers


Assuming you want to match a parenthesized group that begins with PRN and may contain nested parenthesized groups (and that the whole chunk may itself be nested within enclosing parenthesized groups), the following tested recursive PCRE regex will do the trick:

<?php // test.php 20140614_0800
// The regex:
$re = '/
    # Match nested parenthesized group beginning with PRN.
    \(PRN          # Literal opening sequence.
    (              # $1: Recursive subroutine!
      (?:          # Zero or more contents alternatives.
        [^()]++    # Either one or more non-parentheses,
      | \((?1)\)   # Or a nested parenthesized group.
      )*           # End zero or more contents alternatives.
    )              # End $1: Recursive subroutine!
    \)             # Literal closing sequence.
    /x';

// The string:
$s = '(ROOT  
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ \'s)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) (\'\' \') (JJ joint) (NN work)))))))
(. .)))';

// The code:
if (preg_match($re, $s, $matches)) {
    printf("Match found:\n%s", $matches[0]);
} else {
    echo('No match');
}
?>

Here is the output:

Match found:
(PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))

Note that this solution requires that all the groups have properly balanced and matching open and close parentheses.
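If the parser output can contain more than one PRN group, the same recursive pattern can be passed to preg_match_all to collect all of them. The following is only a sketch: the $sample string is a shortened, hypothetical tree made up for illustration, not the original parser output, and the pattern is the one above written on a single line without the x flag.

```php
<?php
// Hypothetical sample containing two PRN groups (not the original output).
$sample = '(S (PRN (, ,) (NP (NNP Gilbert)) (, ,)) (VP (VB catch) (PRN (NN aside))))';

// Same recursive pattern as above, compacted onto one line.
$re = '/\(PRN((?:[^()]++|\((?1)\))*)\)/';

// $matches[0] holds every full "(PRN ...)" group that was found.
preg_match_all($re, $sample, $matches);

foreach ($matches[0] as $group) {
    echo $group, "\n";
}
```

As with the single-match version, this relies on the parentheses being balanced; a PRN group with a missing closing parenthesis is simply not matched.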


1 Comment

Recursion needs more love, +1 :)

Are you just asking for an expression that will match the above output? If so, this works:

$output = "(ROOT
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) ('' ') (JJ joint) (NN work)))))))
(. .)))";

preg_match('/\(PRN.*,\)\)/s', $output, $match);
print_r($match[0]);

Output:

php reg.php
(PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
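One caveat worth keeping in mind: this pattern does not count parentheses. It matches from the first "(PRN" to the last ",))" in the input, so it only works when the PRN group ends with a comma token and no later group does. The sketch below illustrates this with a hypothetical $sample string (made up for this example, not real parser output) that contains two PRN groups; making the quantifier lazy with .*? stops at the first ",))" instead, but still depends on that ending.

```php
<?php
// Hypothetical sample with two PRN groups, both ending in "(, ,))".
$sample = '(S (PRN (, ,) (NN a) (, ,)) (NP (NN b)) (PRN (, ,) (NN c) (, ,)))';

// Greedy: runs from the first "(PRN" to the LAST ",))" in the string.
preg_match('/\(PRN.*,\)\)/s', $sample, $greedy);

// Lazy: stops at the FIRST ",))" after "(PRN".
preg_match('/\(PRN.*?,\)\)/s', $sample, $lazy);

echo $greedy[0], "\n"; // spans both PRN groups and the NP between them
echo $lazy[0], "\n";   // only the first PRN group
```

For input where the closing tokens vary, a pattern that tracks nesting is likely the safer choice.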
