3

I want to extract the values of command tags (GET, FROM, IN, etc.). My command is:

// My command
$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';

// My parser
if(preg_match_all('/(GET|FROM|IN)\s+([^\s]+)/si',$_cmd, $m))
    $cmd = array_combine($m[1], $m[2]);

Output:

Array
(
  [GET] => a,
  [FROM] => p
  [IN] => a
  [from] => Sarajevo"
)

I am looking for this output:

Array
(
  [GET] => a, b
  [FROM] => p
  [IN] => a and c="I am from Sarajevo" or d>1
)

As you can see, the problem is with whitespace and with command tags repeated inside strings (like from). So how can I parse this command?

7
  • 2
    What if you skip the /i-modifier - wouldn't you be able to make sure that you're only counting FROM and not from? Would there be a scenario where the IN would involve a string with a uppercased from? Commented Aug 12, 2011 at 13:26
  • I guess that relates to SQL... simply use your SQL, don't try to develop an interface for that. Commented Aug 12, 2011 at 13:29
  • @Dor - Please don't advise an SQL parser. I need to parse this for my ORM project. Commented Aug 12, 2011 at 13:30
  • 1
    @dino beytar: "Object-relational mapping (ORM) systems (and the “frameworks” that use them) are another frequent performance nightmare.", From the book High Performance MySQL, Second Edition Commented Aug 12, 2011 at 13:59
  • @Dor - Yes, you're right. But I think this command syntax will help me standardise my project, so I expect less trouble with upgrades. I just want to try it this way. Maybe I'm completely on the wrong track. Commented Aug 12, 2011 at 14:07

5 Answers

8

You cannot easily parse that with a single regex. (It's doable, but not simple.)

You should use a simple tokenizer, where a regex again becomes a useful tool:

  preg_match_all('/\w+|".*?"|\W/', $_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ', $list);

This gives you a simple list, where you just have to find the clauses that you are interested in, then remerge the subsequent tokens (though I'm confused about your use case). The whitespace tokens that \W also matches are left out of the listing:

[0] => Array
    (
        [0] => GET
        [1] => a
        [2] => ,
        [3] => b
        [4] => FROM
        [5] => p
        [6] => IN
        [7] => a
        [8] => and
        [9] => c
        [10] => =
        [11] => "I am from Sarajevo"
        [12] => or
        [13] => d
        [14] => >
        [15] => 1
    )
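The "find the clauses, then remerge the subsequent tokens" step could be sketched roughly as below. This is only an illustration under a few assumptions: parseCommand() and its $keywords parameter are hypothetical names, keywords are matched in exact upper case, and an unquoted token that happens to equal a keyword would start a new clause.

```php
<?php
// Sketch only: parseCommand() and $keywords are hypothetical names,
// not part of the original answer.
function parseCommand(string $cmd, array $keywords = ['GET', 'FROM', 'IN']): array
{
    // Same tokenizer as above. A quoted string comes out as ONE token,
    // so a keyword inside quotes never appears as a token of its own.
    preg_match_all('/\w+|".*?"|\W/', $cmd, $m);

    $clauses = [];
    $current = null;
    foreach ($m[0] as $token) {
        if (in_array($token, $keywords, true)) {
            // A keyword token opens a new clause.
            $current = $token;
            $clauses[$current] = '';
        } elseif ($current !== null) {
            // Everything else (including whitespace) is appended verbatim.
            $clauses[$current] .= $token;
        }
    }
    // Strip the whitespace that surrounded each keyword.
    return array_map('trim', $clauses);
}

$result = parseCommand('GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ');
print_r($result);
```

Whitespace tokens are kept on purpose here, so that joining the tokens back together reproduces the original clause text.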

4 Comments

I get whitespace tokens after each item. Can you check your code once more, please? Or am I wrong?
You can use (?!\s)\W in place of \W to remove spaces. If you want to merge the resulting parts you should keep the spaces however. (That's why I disabled that option again.) -- Really depends on if you need it as tokenizer, or just to break up the string parts.
Could I do this without using regex?
Of course; but that's just more work. (String functions and lots of PHP code are typically slower than preg_match.)
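To make the trade-off from the comments concrete, here is a small comparison of the two tokenizer patterns. The variable names $withSpaces and $withoutSpaces are made up for this sketch.

```php
<?php
$cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';

// \W also matches whitespace, so every space becomes its own token.
preg_match_all('/\w+|".*?"|\W/', $cmd, $withSpaces);

// (?!\s)\W refuses to match whitespace, so spaces are skipped entirely.
preg_match_all('/\w+|".*?"|(?!\s)\W/', $cmd, $withoutSpaces);

// With spaces kept, the tokens join back into the original string.
var_dump(implode('', $withSpaces[0]) === $cmd);

// Without spaces, only the visible tokens remain.
print_r($withoutSpaces[0]);
```

The first form is the one to use if the clauses are going to be reassembled; the second is handier if only the individual tokens matter.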
3
if( preg_match_all('/(GET|FROM|IN)(.(?!(GET|FROM|IN)))+\s*/si',$_cmd, $m))

This means: after a keyword, match any character that is not immediately followed by GET, FROM or IN, then any trailing whitespace.

2 Comments

But GET is followed by FROM?! I see the idea, but does it work?
GET ... FROM - there is intersection
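For comparison, one way the lookahead idea might be adapted so that quoted strings are skipped as a unit. The pattern below is a hypothetical variant written for this thread's example, not the answerer's code; it assumes upper-case keywords, double-quoted strings, and no escaped quotes inside them.

```php
<?php
// Hypothetical variant of the lookahead approach; assumptions as stated above.
$cmd = 'GET a, b FROM p IN a and c="I am FROM Sarajevo" or d>1 ';

// Group 1: the keyword. Group 2: a run of either
//   - a whole double-quoted string, or
//   - a single character that does not start a standalone keyword.
$pattern = '/\b(GET|FROM|IN)\b\s+((?:"[^"]*"|(?!\b(?:GET|FROM|IN)\b).)*)/s';

$parsed = [];
if (preg_match_all($pattern, $cmd, $m, PREG_SET_ORDER)) {
    foreach ($m as $match) {
        $parsed[$match[1]] = trim($match[2]);
    }
}
print_r($parsed);
```

Because the quoted alternative is tried first, even an upper-case FROM inside the string stays part of the IN clause.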
1

You need to develop a scripting language for this. Regexps aren't suitable for these purposes.

4 Comments

So what's your solution without using Regex?
@dino beytar: You could be using Regex but that's absolutely not the main part of the solution. You'll need to develop (or use an available FOSS) an interpreter: en.wikipedia.org/wiki/Interpreter_%28computing%29
Do you mean that I should develop or use an interpreter outside of PHP, or that I have already been trying to develop one?
@dino beytar: I think that you already tried to develop one using Regex and I told you that this is not the way to develop an interpreter. But I also think that you should avoid trying to develop or use an interpreter because that will be a performance nightmare.
1

You could remove the case insensitive i after the delimiter /. And also make sure there is at least one whitespace after the keywords.

1 Comment

What if 'GET a, b FROM p IN a and c="I am FROM Sarajevo" or d>1 ' ?
1
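$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';
// No /i modifier: with it, the "from" inside the quoted string is also treated as a delimiter.
// PREG_SPLIT_NO_EMPTY drops the empty piece before the leading GET.
$tpar = preg_split('/\s+(GET|FROM|IN)\s+/', ' '.$_cmd.' ', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// array_walk() with 'trim' discards the trimmed values; array_map() returns them.
$tpar = array_map('trim', $tpar);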

print_r($tpar);

// gives:
array(
  [0] => GET
  [1] => a, b
  [2] => FROM
  [3] => p
  [4] => IN
  [5] => a and c="I am from Sarajevo" or d>1
)
// the rest is straight forward

2 Comments

The output is not the same as what you show above. I think something is wrong with your code. Or am I wrong?
Yes, I missed the "from" inside the quotes; I am answering without a PHP box at hand.
