
regexp for parsing xml to array

Tags:

regex

php

I have a problem with the regular expression in the following PHP function:

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>1</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>1</P055>
        <P096>1</P096>
    </arg1>";

function xml2array($xml) {
     $xmlArray = array();
     $regexp = "/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";
     preg_match_all($regexp, $xml, $elements);

     foreach ($elements[1] as $ie => $element) {
         if (preg_match($regexp, $elements[3][$ie]))
             $xmlArray[$element] = xml2array($elements[3][$ie]);
         else {
             $xmlArray[$element] = trim($elements[3][$ie]);
         }
     }
     return $xmlArray;
}

$array = xml2array($xml1);
echo print_r($array, true);

$xml2 gives me the expected result:

Array
(
    [arg1] => Array
        (
            [P055] => 1
            [P096] => 1
        )

)

while $xml1 gives me the wrong result:

Array
(
    [arg1] => <S113-03>1</S113-03>
            <S184-06>1</S184-06>
)

I believe the problem is in the regexp, but I cannot make sense of its content.

sorrex asked Apr 22 '15 10:04


3 Answers

It would be easier and quicker (and lighter on memory) to use PHP's SimpleXML extension.

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>2</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>3</P055>
        <P096>4</P096>
    </arg1>";

var_dump(new \SimpleXMLElement($xml1));
var_dump(new \SimpleXMLElement($xml2));

dumps:

php test.php
class SimpleXMLElement#1 (2) {
  public $S113-03 =>
  string(1) "1"
  public $S184-06 =>
  string(1) "2"
}
class SimpleXMLElement#1 (2) {
  public $P055 =>
  string(1) "3"
  public $P096 =>
  string(1) "4"
}
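One thing the dump above does not show is how to read children whose names contain a hyphen. The following is a minimal sketch, assuming the same `$xml1` input; the variable names `$root`, `$first` and `$values` are illustrative only. A plain `$root->S113-03` would be parsed as a subtraction, so the name goes in braces, and iterating with `children()` avoids hard-coding names at all.

```php
<?php
// Sketch only: reading SimpleXML children whose tag names contain "-".
$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>2</S184-06>
    </arg1>";

$root = new SimpleXMLElement($xml1);

// Brace syntax is needed because "S113-03" is not a valid PHP identifier.
$first = (string) $root->{'S113-03'};

// Collect every direct child as name => text, whatever the names are.
$values = [];
foreach ($root->children() as $child) {
    $values[$child->getName()] = (string) $child;
}

print_r($values);
```

The explicit `(string)` casts matter: without them the array would hold SimpleXMLElement objects rather than plain strings.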
yergo answered Nov 09 '22 23:11


You know Chuck Norris?

Chuck Norris can parse HTML with RegExp.

Anyway, here we go without RegExp:

Demo

<?php

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>1</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>1</P055>
        <P096>1</P096>
    </arg1>";

function xml2array($xmlString)
{
    $xml   = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
    return json_decode(json_encode((array)$xml), TRUE);
}

var_dump(xml2array($xml1));
var_dump(xml2array($xml2));

Output:

array(2) {
  ["S113-03"]=>
  string(1) "1"
  ["S184-06"]=>
  string(1) "1"
}
array(2) {
  ["P055"]=>
  string(1) "1"
  ["P096"]=>
  string(1) "1"
}
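Note that, unlike the function in the question, this output does not include the root element name (`arg1`). Also, `simplexml_load_string` returns `false` on malformed input, which the function above would silently turn into an array. A hedged variant with error handling might look like the sketch below; the name `xml2arrayChecked` and the choice of `InvalidArgumentException` are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: same conversion as above, but fails loudly on bad XML.
function xml2arrayChecked(string $xmlString): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    if ($xml === false) {
        $message = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new InvalidArgumentException("Invalid XML: $message");
    }

    // Same JSON round trip as the original answer.
    return json_decode(json_encode((array) $xml), true);
}

$result = xml2arrayChecked("<arg1><S113-03>1</S113-03><S184-06>1</S184-06></arg1>");
var_dump($result);
```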
Jens A. Koch answered Nov 09 '22 22:11


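Use this fix; note that (\w+) is now ([\w-]+). \w does not match a hyphen, so with the original pattern the name group captures only S113 from <S113-03>, and the backreference \1 then looks for a closing </S113> tag that does not exist: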

$regexp = "/<([\w-]+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";

The result is

Array
(
    [arg1] => Array
        (
            [S113-03] => 1
            [S184-06] => 1
        )

)

Here is the sample code.
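To see the difference in isolation, here is a small sketch that runs both patterns against one inner element from `$xml1`. The variable names (`$old`, `$new`, `$inner`, `$oldResult`, `$newResult`) are illustrative only.

```php
<?php
// Original pattern: the tag name may contain only word characters.
$old = "/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";
// Fixed pattern: hyphens are allowed in the tag name as well.
$new = "/<([\w-]+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";

$inner = "<S113-03>1</S113-03>";

// With $old, group 1 can be at most "S113", so the closing tag never matches.
$oldResult = preg_match($old, $inner);

// With $new, group 1 is the full name and group 3 is the element content.
$newResult = preg_match($new, $inner, $m);

var_dump($oldResult, $newResult, $m[1], $m[3]);
```

This also explains the output in the question: the recursive check on the content of `arg1` found no match, so the raw inner markup was stored as a string. Keep in mind that the pattern remains a rough approximation: XML names may also contain `.` and `:`, and the greedy `(.*)` will merge sibling elements that share the same name.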

Wiktor Stribiżew answered Nov 09 '22 23:11