I am trying to convert Word text pasted by users that contain MS Word ellipsis and long dash before processing it further.
I found an old proposed solution here to the problem http://www.codingforums.com/archive/index.php/t-47163.html , but it does not work for me. After replacing the ellipsis for example , the variable comes back as empty. Never seen anything like this before:
$src = "Long word dash – and weird Word ellipsis…";
$src = str_replace("‘", "'", $src);
$src = str_replace("’", "'", $src);
$src = str_replace("”", '"', $src);
$src = str_replace("“", '"', $src);
$src = str_replace("–", "-", $src);
$src = str_replace("…", "...", $src);
print $src;
Any ideas?
Special tools and editors are used to writing PHP code and MS Word is not among them. Hence, we conclude that the PHP Coding feature is not available in MS Word.
From the command line, run php --run 'highlight_file("/path/to/myfile. php");' > /tmp/code. html , and then open that file with your favourite Web browser. You can then cut and paste that colourized code into your document.
Alternatively, you can press Ctrl+H. Click in the “Find What” box and then delete any existing text or characters. Click the “More>>” button to open up the additional options, click the “Special” button, and then click the “Paragraph Mark” option from the dropdown list.
For anyone getting the diamond question mark in PHP, this method of replacing UTF-8 characters worked better than using the chr function.
$search = [ // www.fileformat.info/info/unicode/<NUM>/ <NUM> = 2018
"\xC2\xAB", // « (U+00AB) in UTF-8
"\xC2\xBB", // » (U+00BB) in UTF-8
"\xE2\x80\x98", // ‘ (U+2018) in UTF-8
"\xE2\x80\x99", // ’ (U+2019) in UTF-8
"\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
"\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
"\xE2\x80\x9C", // “ (U+201C) in UTF-8
"\xE2\x80\x9D", // ” (U+201D) in UTF-8
"\xE2\x80\x9E", // „ (U+201E) in UTF-8
"\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
"\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
"\xE2\x80\xBA", // › (U+203A) in UTF-8
"\xE2\x80\x93", // – (U+2013) in UTF-8
"\xE2\x80\x94", // — (U+2014) in UTF-8
"\xE2\x80\xA6" // … (U+2026) in UTF-8
];
$replacements = [
"<<",
">>",
"'",
"'",
"'",
"'",
'"',
'"',
'"',
'"',
"<",
">",
"-",
"-",
"..."
];
str_replace($search, $replacements, $string);
Hmm. I use this function for sanitizing text copied into an RTE. It may or may not work in this case. It converts to HTML entities, but you could tweak it to just convert to regular characters:
function convertFromCP1252($string)
{
$search = array('&',
'<',
'>',
'"',
chr(212),
chr(213),
chr(210),
chr(211),
chr(209),
chr(208),
chr(201),
chr(145),
chr(146),
chr(147),
chr(148),
chr(151),
chr(150),
chr(133),
chr(194)
);
$replace = array( '&',
'<',
'>',
'"',
'‘',
'’',
'“',
'”',
'–',
'—',
'…',
'‘',
'’',
'“',
'”',
'–',
'—',
'…',
''
);
return str_replace($search, $replace, $string);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With