RegEx Convert XML to JSON

This is a continuation of a previous question: I need to convert XML to JSON in JavaScript on Parse.com's Cloud Code.

Please don't downvote this because you don't believe RegEx is the right choice for this. It's what I have to work with. If you know another way to do this, please let me know, but it must run on Parse.com's Cloud Code.

Original XML:

<?xml version="1.0" encoding="UTF-8" ?><api><products total-matched="1618" records-returned="1" page-number="1"><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain 
Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product></products></api>

RegEx Code:

var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>'); // Moves each attribute into its own element, emitted just before the owning tag

xml = xml.replace(/\s/g, ' ').  // Converts every whitespace character to a plain space
    replace(/< *\?[^>]*?\? *>/g, ''). // Finds the XML header and removes it
    replace(/< *!--[^>]*?-- *>/g, ''). // Finds and removes all comments
    replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
    replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
    replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
    replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
    //replace(/ *(\w[\w-]+\b) *= *"([^>]*?)" */g, '< $1>$2').
    replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
    replace(/> *>/g, '>').
    //replace(/< *\/ *(\w[\w-]+\b) *> *< *\1 *>/g, '').  // breaks the output?
    replace(/"/g, '\\"').
    replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
    replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":{$2},').
    replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
    split(/\},\{/).
    reverse().
    join('},{').
    replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
    split(/\},\{/).
    reverse().
    join('},{').
    replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
    replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
    replace(/< *\/ *\w[\w-]+ *>/g,'},').
    replace(/\} *,(?= *(\}|\]))/g, '}').
    replace(/] *,(?= *(\}|\]))/g, ']').
    replace(/" *,(?= *(\}|\]))/g, '"').
    replace(/ *, *$/g, '');

Output:

"api": {
    "page-number": "1",
    "records-returned": "1",
    "total-matched": "1618",
    "products": {
        "product": {
            "ad-id": "1234",
            "supplier-name": "Window World",
            "supplier-category": "3703703",
            "buy-url": "http://website.com",
            "currency": "USD",
            "description": "Window",
            "image-url": "http://website.com/windowa/80x80.jpg",
            "in-stock": "yes",
            "manufacturer-name": "Window World",
            "name": "Half Pain Glass",
            "price": "31.95",
            "retail-price": "87.60",
            "sale-price": "29.95",
            "sku": "5938",
            "upc": ""
        },
        "product": {
            "ad-id": "1234",
            "supplier-name": "Window World",
            "supplier-category": "3703703",
            "buy-url": "http://website.com",
            "currency": "USD",
            "description": "Window",
            "image-url": "http://website.com/windowa/80x80.jpg",
            "in-stock": "yes",
            "manufacturer-name": "Window World",
            "name": "Half Pain Glass",
            "price": "31.95",
            "retail-price": "87.60",
            "sale-price": "29.95",
            "sku": "5938",
            "upc": ""
        },
        "product": {
            "ad-id": "1234",
            "supplier-name": "Window World",
            "supplier-category": "3703703",
            "buy-url": "http://website.com",
            "currency": "USD",
            "description": "Window",
            "image-url": "http://website.com/windowa/80x80.jpg",
            "in-stock": "yes",
            "manufacturer-name": "Window World",
            "name": "Half Pain Glass",
            "price": "31.95",
            "retail-price": "87.60",
            "sale-price": "29.95",
            "sku": "5938",
            "upc": ""
        },
        "product": {
            "ad-id": "1234",
            "supplier-name": "Window World",
            "supplier-category": "3703703",
            "buy-url": "http://website.com",
            "currency": "USD",
            "description": "Window",
            "image-url": "http://website.com/windowa/80x80.jpg",
            "in-stock": "yes",
            "manufacturer-name": "Window World",
            "name": "Half Pain Glass",
            "price": "31.95",
            "retail-price": "87.60",
            "sale-price": "29.95",
            "sku": "5938",
            "upc": ""
        }
    }
}

The last issue I'm having with this (that I know of) is that repeating items, such as the four "product" elements above, are emitted as repeated keys instead of a JSON array. Any ideas on how to solve this?
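
To illustrate why the repeated keys matter, here is a minimal sketch. It assumes the output is wrapped in outer braces and passed to JSON.parse; the shortened sample string is made up for illustration and is not the real feed.

```javascript
// Shortened stand-in for the converter output, wrapped in braces so it parses.
var text = '{"products":{"product":{"sku":"1"},"product":{"sku":"2"}}}';

// JSON.parse accepts duplicate keys, but a later value overwrites an earlier one.
var parsed = JSON.parse(text);

console.log(JSON.stringify(parsed)); // {"products":{"product":{"sku":"2"}}}
```

Only the last "product" survives, so every earlier record would be lost without an array.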

asked Mar 24 '26 23:03 by Brad


1 Answer

OK, note that this is a quick fix, but it seems to work. It only ADDS an array structure around the repeated items, so the same key no longer appears several times directly inside one object (the key itself is kept, not removed).
Replace this part:

replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
split(/\},\{/).
reverse().
join('},{').
replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
split(/\},\{/).
reverse().
join('},{').

which is an attempt to implement arrays,
with:

replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":[$3],').

Note that I kept pretty much the same way of matching things as the existing code. It seemed to work for your example at least.
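
As an alternative to patching the chain, here is a rough sketch of a different technique: a small recursive parser that still relies only on regular expressions and plain JavaScript, and groups repeated sibling tags into arrays. The function names (xmlToObject, parseChildren, parseAttributes, addValue) and the sample string are hypothetical, not from the original code. It assumes simple XML: no CDATA, no comments, no namespaces, no mixed content, and no element nested inside another element of the same name. I have not tried it on Parse.com's Cloud Code.

```javascript
// Reads name="value" pairs from the attribute part of an opening tag.
function parseAttributes(attrText) {
  var attrs = {};
  var attrRe = /([\w-]+)\s*=\s*"([^"]*)"/g;
  var m;
  while ((m = attrRe.exec(attrText)) !== null) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

// Stores a value under key; a repeated key is turned into an array.
function addValue(obj, key, value) {
  if (!Object.prototype.hasOwnProperty.call(obj, key)) {
    obj[key] = value;
  } else if (Array.isArray(obj[key])) {
    obj[key].push(value);
  } else {
    obj[key] = [obj[key], value];
  }
}

// Returns an object for element content, or the trimmed text if no tags are found.
function parseChildren(text) {
  // Matches <tag attrs>inner</tag> (up to the first matching close tag) or <tag attrs/>.
  var elementRe = /<([\w-]+)((?:\s+[\w-]+\s*=\s*"[^"]*")*)\s*(?:\/>|>([\s\S]*?)<\/\1\s*>)/g;
  var result = {};
  var found = false;
  var m;
  while ((m = elementRe.exec(text)) !== null) {
    found = true;
    var attrs = parseAttributes(m[2]);
    var child = parseChildren(m[3] === undefined ? '' : m[3]);
    var value;
    if (typeof child === 'string' && Object.keys(attrs).length === 0) {
      value = child; // plain text element
    } else {
      value = attrs; // attributes become ordinary keys
      if (typeof child === 'string') {
        if (child !== '') value.value = child; // text next to attributes goes under "value"
      } else {
        for (var k in child) {
          if (Object.prototype.hasOwnProperty.call(child, k)) value[k] = child[k]; // child wins over a same-named attribute
        }
      }
    }
    addValue(result, m[1], value);
  }
  return found ? result : text.trim();
}

function xmlToObject(xml) {
  // Drop the XML declaration before parsing.
  return parseChildren(xml.replace(/<\?[^>]*\?>/g, '').trim());
}

// Made-up, shortened sample in the shape of the feed above.
var sample = '<?xml version="1.0" encoding="UTF-8" ?><api><products total-matched="2">' +
  '<product><sku>1</sku><upc></upc></product>' +
  '<product><sku>2</sku><upc></upc></product></products></api>';
console.log(JSON.stringify(xmlToObject(sample)));
```

With this sketch, "product" comes out as an array because the tag repeats. One caveat of the design: a feed that returns a single product yields a plain object instead of a one-element array, so the calling code would need to handle both shapes.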

answered Mar 26 '26 12:03 by Loamhoof


