Below is the data I'm trying to parse:

50‐59 1High300.00 Avg300.00
90‐99 11High222.00 Avg188.73
120‐1293High204.00 Avg169.33

The first section is a weight range, next is a count, followed by Highprice, ending with Avgprice.

As an example, I need to parse the data above into an array which would look like

[0]50-59
[1]1
[2]High300.00
[3]Avg300.00

[0]90-99
[1]11
[2]High222.00
[3]Avg188.73

[0]120‐129
[1]3
[2]High204.00
[3]Avg169.33

I thought about creating an array of what the possible weight ranges can be but I can't figure out how to use the values of the array to split the string.

$arr = array("10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109","110-119","120-129","130-139","140-149","150-159","160-169","170-179","180-189","190-199","200-209","210-219","220-229","230-239","240-249","250-259","260-269","270-279","280-289","290-299","300-309");

Any ideas would be greatly appreciated.

  • Is that format consistent, or can count exceed 10? It would be preferable to have a delimiter between each value. How is this string being generated? Commented Aug 24, 2016 at 4:30
  • @chris85 I agree that having a delimiter between the values is preferable but I don't have control over the data I am being given :( To answer your question, yes, the count can be over 10. The only thing I can safely assume is that count will be less than 999. Commented Aug 24, 2016 at 14:02
  • Where/how is the string being generated? Commented Aug 24, 2016 at 14:27
  • @chris85 The data is being sent from an external system that I can't control. Commented Aug 24, 2016 at 18:19

4 Answers

Hope this will work:

    $string='50-59 1High300.00 Avg300.00
    90-99 11High222.00 Avg188.73
    120-129 3High204.00 Avg169.33';

    $requiredData=array();
    $dataArray=explode("\n",$string);
    $counter=0;
    foreach($dataArray as $data)
    {
        if(preg_match('#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#', $data,$matches))    
        {
            $requiredData[$counter][]=$matches[1];
            $requiredData[$counter][]=$matches[2];
            $requiredData[$counter][]=$matches[3];
            $requiredData[$counter][]=$matches[4];
            $counter++;
        }
    }
    print_r($requiredData);

1 Comment

Thank you very much for the suggestion. I will admit that my regex knowledge isn't great, but I don't think that will work because of the space you have in the regex between the weight and count. The thing I'm struggling with is a row like this, where there is no space: 120‐1293High204.00 Avg169.33. It needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33

'#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#'

I don't think that will work because of the space you have in the regex between the weight and count. The thing I'm struggling with is a row like this where there is no space. 120‐1293High204.00 Avg169.33 that needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33

You are right. That can be remedied by limiting the number of weight digits to three and making the space optional.

'#^(\d+-\d{1,3}) *…

$arr = array('50-59 1High300.00 Avg300.00', 
             '90-99 11High222.00 Avg188.73', 
             '120-129 3High204.00 Avg169.33');

$result = array(); // initialise so print_r() never receives an undefined variable
foreach($arr as $str) {
    if (preg_match('/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i', $str, $m)) {
        array_shift($m); // remove group 0 (i.e. the whole match)
        $result[] = $m;
    }
}
print_r($result);

Output:

Array
(
    [0] => Array
        (
            [0] => 50-59
            [1] => 1
            [2] => High300.00
            [3] => Avg300.00
        )

    [1] => Array
        (
            [0] => 90-99
            [1] => 11
            [2] => High222.00
            [3] => Avg188.73
        )

    [2] => Array
        (
            [0] => 120-129
            [1] => 3
            [2] => High204.00
            [3] => Avg169.33
        )

)

Explanation:

/                   : regex delimiter
    ^               : beginning of string
    (               : start group 1
      \d+-\d{1,3}   : 1 or more digits, a dash, and 1 up to 3 digits, i.e. the weight range
    )               : end group 1
    \s*             : 0 or more whitespace characters
    (\d+)           : group 2, i.e. the count
    (High\d+\.\d\d) : group 3, literal High followed by a price
    <space>         : one literal space
    (Avg\d+\.\d\d)  : group 4, literal Avg followed by a price
/i                  : regex delimiter and case-insensitive modifier

To be more generic, you could replace High and Avg by [a-z]+
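As a rough illustration of that more generic variant, the sketch below swaps the literal labels for [a-z]+. The function name parseGenericRow and the Low/Med labels in the second call are made up for the example; they are not part of the asker's data.

```php
<?php
// Hypothetical helper (not from the answer above): parses one row with generic labels.
function parseGenericRow(string $row): ?array
{
    // [a-z]+ stands in for the literal labels High and Avg;
    // the i modifier lets it match upper-case letters as well.
    $pattern = '/^(\d+-\d{1,3})\s*(\d+)([a-z]+\d+\.\d\d) ([a-z]+\d+\.\d\d)/i';
    if (!preg_match($pattern, $row, $m)) {
        return null; // the row does not fit the expected shape
    }
    array_shift($m); // drop group 0 (the whole match)
    return $m;
}

// Row from the question with no space between weight range and count.
print_r(parseGenericRow('120-1293High204.00 Avg169.33'));
// Invented labels, only to show that the pattern no longer depends on High/Avg.
print_r(parseGenericRow('90-99 11Low150.00 Med170.25'));
```

The trade-off is that any run of letters is now accepted as a label, so a malformed row is less likely to be rejected.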

This is a pattern you can trust (Pattern Demo):

/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m

The other answers overlooked the digit pattern in the weight range substring. The range start integer always ends in 0, and the range end integer always ends in 9; the range always spans ten integers.

My pattern will capture the digits that precede the 0 in the starting integer and reference them immediately after the dash, then require that captured number to be followed by a 9.
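To see why the backreference matters, here is a minimal sketch that applies the same idea to a single row. The helper name parseRowWithBackreference, the use of \p{Pd} to accept any dash character, and the two test rows (a two-digit count with no space, and a mismatched range) are assumptions added for illustration; they do not come from the sample data.

```php
<?php
// Hypothetical helper: applies the backreference idea to one row.
function parseRowWithBackreference(string $row): ?array
{
    // (\d{0,2}) captures the digits before the 0 of the range start;
    // (?:\2)9 requires those same digits again, followed by a 9.
    // \p{Pd} (any Unicode dash) with the u modifier is an assumption
    // added here so that both - and ‐ are accepted.
    $pattern = '/^((\d{0,2})0\p{Pd}(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/u';
    if (!preg_match($pattern, $row, $m)) {
        return null;
    }
    return ['range' => $m[1], 'count' => $m[3], 'high' => $m[4], 'avg' => $m[5]];
}

// No space and a two-digit count: the range must still end at the 9 that closes the decade.
var_export(parseRowWithBackreference('90-9911High222.00 Avg188.73'));
echo PHP_EOL;
// 50 and 69 are not in the same decade, so the row is rejected.
var_export(parseRowWithBackreference('50-69 1High300.00 Avg300.00'));
```

Because the end of the range is pinned by the captured digits, the count can be any length from one to three digits without the pattern having to guess where the range stops.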

I want to point out that your sample input was a little bit tricky because your ‐ is not the standard - that sits between the 0 and = on my keyboard. This was a sneaky little gotcha for me to solve.
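One way to sidestep that gotcha, sketched here as an assumption rather than as part of the method below, is to normalize every dash-like character to the ASCII hyphen before matching. The helper name normalizeDashes is hypothetical, and the code assumes the input is UTF-8 and that the odd character is a Unicode dash such as U+2010.

```php
<?php
// Hypothetical helper: replace every Unicode dash character with an ASCII hyphen-minus.
function normalizeDashes(string $text): string
{
    // \p{Pd} is the Unicode "dash punctuation" class; the u modifier treats input as UTF-8.
    $clean = preg_replace('/\p{Pd}/u', '-', $text);
    return $clean ?? $text; // preg_replace() returns null on error, e.g. invalid UTF-8
}

$raw = "120\u{2010}1293High204.00 Avg169.33"; // \u{2010} is the HYPHEN code point, assumed here
echo bin2hex("\u{2010}"), PHP_EOL;            // dumps the UTF-8 bytes of that hyphen
echo normalizeDashes($raw), PHP_EOL;          // same row, now with an ASCII hyphen
```

After normalization, a pattern written with the plain keyboard hyphen can be used on the data, and dumping a suspicious substring with bin2hex() is a quick way to confirm which character the external system actually sends.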

Method (Demo):

$text = '50‐59 1High300.00 Avg300.00
90‐99 11High222.00Avg188.73
120‐1293High204.00 Avg169.33';

preg_match_all(
    '/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m',
    $text,
    $matches,
    PREG_SET_ORDER
);

var_export(
    array_map(
        fn($captured) => [
            'weight range' => $captured[1],
            'count' => $captured[3],
            'Highprice' => $captured[4],
            'Avgprice' => $captured[5]
        ],
        $matches
    )
);

Output:

array (
  0 => 
  array (
    'weight range' => '50‐59',
    'count' => '1',
    'Highprice' => '300.00',
    'Avgprice' => '300.00',
  ),
  1 => 
  array (
    'weight range' => '90‐99',
    'count' => '11',
    'Highprice' => '222.00',
    'Avgprice' => '188.73',
  ),
  2 => 
  array (
    'weight range' => '120‐129',
    'count' => '3',
    'Highprice' => '204.00',
    'Avgprice' => '169.33',
  ),
)
