This is a continuation of a previous question: I need to convert XML to JSON in JavaScript on Parse.com's Cloud Code.
Please don't downvote this because you don't believe RegEx is the right choice for this. It's what I have to work with. If you have another idea for how to do this, please let me know, but it must run on Parse.com's Cloud Code.
Original XML:
<?xml version="1.0" encoding="UTF-8" ?><api><products total-matched="1618" records-returned="1" page-number="1"><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain 
Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product></products></api>
RegEx Code:
var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
while(xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>'); // For attributes
xml = xml.replace(/\s/g, ' '). // Replaces each whitespace character with a plain space
replace(/< *\?[^>]*?\? *>/g, ''). //Finds the XML header and removes it
replace(/< *!--[^>]*?-- *>/g, ''). //Finds and removes all comments
replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
//replace(/ *(\w[\w-]+\b) *= *"([^>]*?)" */g, '< $1>$2').
replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
replace(/> *>/g, '>').
//replace(/< *\/ *(\w[\w-]+\b) *> *< *\1 *>/g, ''). // breaks the output?
replace(/"/g, '\\"').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":{$2},').
replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
split(/\},\{/).
reverse().
join('},{').
replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
split(/\},\{/).
reverse().
join('},{').
replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
replace(/< *\/ *\w[\w-]+ *>/g,'},').
replace(/\} *,(?= *(\}|\]))/g, '}').
replace(/] *,(?= *(\}|\]))/g, ']').
replace(/" *,(?= *(\}|\]))/g, '"').
replace(/ *, *$/g, '');
Output:
"api": {
"page-number": "1",
"records-returned": "1",
"total-matched": "1618",
"products": {
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
}
}
}
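For context, the attribute values end up next to "products" rather than inside it because of the while loop at the top of the RegEx code. A minimal sketch on a shortened, made-up input (the sample string and the variable names are assumptions, not the real feed) shows the loop peeling attributes off from right to left and emitting each one as an element in front of the tag:

```javascript
// Same attribute regex and replacement as in the RegEx code above.
var attrRegex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;

// Shortened, made-up sample (assumption): two attributes on one element.
var sample = '<products total-matched="1618" page-number="1"><product/></products>';

// Each pass removes the LAST attribute of the first matching tag and
// writes it as <name>value</name> immediately before that tag.
while (sample.match(attrRegex)) {
  sample = sample.replace(attrRegex, '<$2>$3</$2>$1>');
}

console.log(sample);
// <page-number>1</page-number><total-matched>1618</total-matched><products><product/></products>
```

That is why "page-number" comes first in the output and why the attributes are siblings of "products" instead of its children.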
The last issue I'm having with this (that I know of) is that it doesn't turn repeated elements, such as the four "product" entries above, into a JSON array. Any ideas on how to solve this?
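To show why the repeated keys matter once the text is wrapped in braces and parsed: JSON.parse accepts duplicate keys but keeps only the last value. A minimal sketch with a shortened, made-up object (the "sku" values are assumptions for illustration):

```javascript
// Two entries share the key "product", as in the output above.
var text = '{"product": {"sku": "1"}, "product": {"sku": "2"}}';

var parsed = JSON.parse(text);

// Only the last "product" survives; the earlier one is silently overwritten.
console.log(parsed.product.sku);         // prints 2
console.log(Object.keys(parsed).length); // prints 1
```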
OK, note that this is a quick fix, but it seems to work. It only ADDS an array structure around the repeated items, so you won't have the same key several times in one object (but it won't remove that key).
Change:
replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
split(/\},\{/).
reverse().
join('},{').
replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
split(/\},\{/).
reverse().
join('},{').
which is an attempt to implement arrays, and put this in its place:
replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":[$3],').
Note that I used pretty much the same way of matching things as the original code. It seemed to work for your example at least.
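As a rough illustration of what that replacement does, here is a sketch run on a simplified stand-in for the string at that point in the chain (the stand-in string, the variable names, and the cleanup step are assumptions, not output from the real pipeline). The first part applies the regex from this answer unchanged; the second part is a hypothetical follow-up that strips the key left inside the brackets so the result parses:

```javascript
// The replacement regex from this answer, copied as-is
// (no "g" flag, so only the first match is rewritten).
var arrayRegex = /< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/;

// Assumed, simplified stand-in for the intermediate string.
var intermediate = '<products>"product":{"sku":"1"},"product":{"sku":"2"},</products>';

var wrapped = intermediate.replace(arrayRegex, '"$1":[$3],');
console.log(wrapped);
// "products":["product":{"sku":"1"},"product":{"sku":"2"},],

// Hypothetical cleanup (not part of the answer): drop the repeated
// "product" key inside the brackets, remove the trailing commas,
// then wrap in braces so JSON.parse accepts it.
var cleaned = wrapped
  .replace(/\[(.*)\]/, function (match, inner) {
    return '[' + inner.replace(/"product":(?=\{)/g, '') + ']';
  })
  .replace(/\} *,(?= *\])/g, '}')
  .replace(/ *, *$/, '');

var result = JSON.parse('{' + cleaned + '}');
console.log(result.products.length); // prints 2
```

The key name "product" is hard-coded in the cleanup only to keep the sketch short; a general version would have to reuse the key captured by the lookahead.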