
Replace HTML entities (e.g. &#8217;) with character equivalents when parsing an XML feed

When parsing an XML feed, I am getting text from the content tag, like this:

The Government has awarded funding for a major refurbishment project to go ahead at St Eunan&#8217;s College. This is in addition to last month&#8217;s announcement that grant for its prefabs to be replaced with permanent accomodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes &#8211; the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school&#8217;s board of management

Is there any way to easily replace these HTML entities (for apostrophes, dashes, and so on) with their character equivalents?

EDIT:

Ti.API.info("is this real------------"+win.dataToPass)


returns: (line breaks added for clarity)

[INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
warning home owners and car owners in the town to be vigilant following a recent
spate of break-ins. There has been a number of thefts from gardens and vehicles
in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
said that residents have reported seeing a dark haired male in and around the
area in the early hours of the morning. Local Cllr Karina Carlin has been
monitoring the situation &#8211; she says the problem seems to be getting
worse&#8230;&#8230;.


My external.js file, which merely displays the text above, is below:

var win= Titanium.UI.currentWindow;

Ti.API.info("Is this real------------------"+ win.dataToPass);

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

var newText= unescapeHTML(win.datatoPass);


var label= Titanium.UI.createLabel({
    color: "black",
    //text: win.dataToPass,//this works!
    text:newText,//this is causing an error
    font: "Helvetica",
    fontSize: 50,
    width: "auto",
    height: "auto",
    textAlign: "center"
})

win.add(label);

user2363025 asked Mar 23 '23 03:03


2 Answers

There are many libraries you can include in Titanium (Underscore.string, string.js) that will do this, but if you only want the HTML-unescaping function, try this code, adapted from those libraries:

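var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) { // modified from underscore.string and string.js
    // Match every "&...;" sequence and decode it if it is recognised.
    return str.replace(/&([^;]+);/g, function(entity, entityCode) {
        var match;

        if (entityCode in escapeChars) {
            // Named entity from the table above, e.g. &amp;
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            // Hexadecimal numeric entity, e.g. &#x2019;
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            // Decimal numeric entity, e.g. &#8217; (~~ converts the digits to an integer)
            return String.fromCharCode(~~match[1]);
        } else {
            // Unknown entity: leave it untouched.
            return entity;
        }
    });
}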

This replaces the entities with their human-readable characters and returns the modified string. Just put it somewhere in your code and you're good to go. I have used this myself in Titanium and it's quite handy.
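
As a quick check of what the function does and does not decode, the sketch below repeats it in self-contained form and runs it on a made-up string. The sample text and the variable names `sample` and `decoded` are assumptions for illustration and are not taken from the real feed.

```javascript
// Same lookup table as in the answer.
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in escapeChars) {
            return escapeChars[entityCode];               // named entity
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16)); // hex entity
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(~~match[1]);       // decimal entity
        }
        return entity;                                    // unknown: keep as-is
    });
}

// Hypothetical sample input (an assumption, not real feed data).
var sample = 'St Eunan&#8217;s College &#x2013; Tom &amp; Jerry &rsquo;';
var decoded = unescapeHTML(sample);

// The decimal, hex, and &amp; entities are decoded; &rsquo; is not in
// escapeChars, so it is returned unchanged.
console.log(decoded);
```

Named entities outside the five-entry table, such as `&rsquo;`, pass through untouched, so a feed that uses them would need extra entries in `escapeChars`. Also note that `String.fromCharCode` works on 16-bit code units, so numeric entities above U+FFFF would not decode correctly with this approach.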

Josiah Hester answered Apr 05 '23 20:04


I encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are processed; anything else is returned unchanged.

this.unescapeHTML = function(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    if (typeof str !== 'string') {
        return str;
    } else {
        return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
            var match;
            if (entityCode in escapeChars) {
                return escapeChars[entityCode];
            } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
                return String.fromCharCode(parseInt(match[1], 16));
            } else if ((match = entityCode.match(/^#(\d+)$/))) {
                return String.fromCharCode(~~match[1]);
            } else {
                return entity;
            }
        });
    }
};
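
The `typeof` guard matters because calling `.replace` on a non-string throws. A minimal sketch of that failure mode follows, assuming a plain object in place of `Titanium.UI.currentWindow` and two hypothetical helpers, `unguarded` and `guarded`, that only decode `&amp;`. JavaScript property names are case-sensitive, so a lookup such as `win.datatoPass` (lower-case `t`, as in the question's call to `unescapeHTML`) yields `undefined` when the property was set as `dataToPass`. Whether that is the actual cause of the asker's error is not confirmed in the thread.

```javascript
// Plain object standing in for Titanium.UI.currentWindow (an assumption,
// so the snippet can run outside Titanium).
var win = { dataToPass: 'Tom &amp; Jerry' };

// Hypothetical helper without a type check.
function unguarded(str) {
    return str.replace(/&amp;/g, '&');
}

// Hypothetical helper mirroring the typeof check in the answer.
function guarded(str) {
    if (typeof str !== 'string') {
        return str; // non-strings are handed back unchanged
    }
    return str.replace(/&amp;/g, '&');
}

// Lower-case "t": this property was never set, so the value is undefined.
var missing = win.datatoPass;

var threw = false;
try {
    unguarded(missing); // undefined has no .replace method
} catch (e) {
    threw = e instanceof TypeError;
}

var passedThrough = guarded(missing);    // undefined, no exception
var decoded = guarded(win.dataToPass);   // correctly spelled property

console.log(threw, passedThrough, decoded);
```

With the guard in place the script no longer aborts, but the label would still receive `undefined`, so the property name has to be spelled correctly for the text to appear.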

Dino Liu answered Apr 05 '23 20:04