
Conversion of MathematicalPI symbol names to Unicode

Tags: pdf, unicode

I am processing PDF files and wish to convert characters to Unicode as far as possible. The MathematicalPI family of character sets appears to use its own symbol names (e.g. "H11001"). By exploration I have constructed a table (for MathematicalPI-One) like:

    <chars>
        <char charname="H11001" codepoint16="0x2B" codepoint="43" unicodeName="PLUS SIGN"/>
        <char charname="H11002" codepoint16="0x2D" codepoint="45" unicodeName="HYPHEN-MINUS"/>
        <char charname="H11003" codepoint16="0xD7" codepoint="215" unicodeName="MULTIPLICATION SIGN"/>
        <char charname="H11005" codepoint16="0x3D" codepoint="61" unicodeName="EQUALS SIGN"/>
    </chars>
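
Once a table in this shape exists, it can be applied mechanically. The following is a minimal sketch, assuming the table is well-formed XML with a `<chars>` root and that the decimal `codepoint` attribute is the one to trust. The names `load_table` and `translate` are illustrative and not part of any library.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the table above; only the attributes the sketch needs.
TABLE_XML = """
<chars>
    <char charname="H11001" codepoint="43"/>
    <char charname="H11002" codepoint="45"/>
    <char charname="H11003" codepoint="215"/>
    <char charname="H11005" codepoint="61"/>
</chars>
"""


def load_table(xml_text):
    """Return a dict mapping MathematicalPI glyph names to Unicode characters."""
    root = ET.fromstring(xml_text.strip())
    return {
        char.get("charname"): chr(int(char.get("codepoint")))
        for char in root.iter("char")
    }


def translate(glyph_names, table, unknown="\ufffd"):
    """Translate glyph names; unmapped names become U+FFFD so gaps stay visible."""
    return "".join(table.get(name, unknown) for name in glyph_names)


table = load_table(TABLE_XML)
# H11004 is not in the table, so it comes out as the replacement character.
print(translate(["H11001", "H11005", "H11004"], table))
```

Emitting U+FFFD for unknown names, rather than dropping them, makes it easy to see which glyphs still need an entry.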

Can anyone point me to an existing translation table like this (ideally for all MathematicalPI sets)? [I don't want a graphical display of glyphs, as that means each one has to be looked up to find its Unicode equivalent.]

Also there seems to be a similar symbol resource where the charnames are of the form C223 (for copyright). Any information on this will be appreciated.

UPDATE: I need something well beyond @user1808924's answer - I have already compiled my own (partial) translation table, so it's certainly possible to construct one. It is possible to download and display a list of glyphs in MathematicalPI (many hundreds) and to go through the Unicode spec making equivalences (and for the majority I think there are clear equivalences). A satisfactory answer would either include a table with hundreds of equivalences or a definitive statement that this would violate the copyright of the font creator.

UPDATE: Between @minopret and @Miguel it is certainly possible to construct a mapping. The MathPi sets are well defined - a few hundred glyphs - and shapecatcher makes it easy to find the best glyphs pictorially. The mapping won't be definitive (i.e. with Adobe's stamp) but it will be worthwhile. And I suspect there will be cases where two different Unicode characters look essentially identical, so a visual mapping won't work - e.g. is an equilateral triangle INCREMENT or GREEK CAPITAL LETTER DELTA?
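
To make the triangle ambiguity concrete, here is a small sketch using Python's standard `unicodedata` module. It lists three visually similar candidates with their official names and general categories. The choice of candidates is my own illustration, not something taken from the MathPi fonts.

```python
import unicodedata

# Three code points that can all render as an upright triangle.
candidates = ["\u2206", "\u0394", "\u25b3"]

names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in candidates}
categories = {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in candidates}

for code, name in names.items():
    # The category separates a math symbol (Sm) from a letter (Lu)
    # or a generic symbol (So), which the glyph shape alone cannot do.
    print(code, name, categories[code])
```

The shape alone does not decide between these; the surrounding context (an operator, a Greek variable, a geometric marker) has to.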

I doubt that I personally will complete a full table - I don't know what some of the symbols mean. But I hope to produce a subset covering the symbols used in scientific, technical, and medical (STM) publishing.

@user1808924 I notice you answered this on your first day on SO. Bounty questions are normally offered (as in this case) for difficult questions where there is a definitive answer but it is difficult to find. It's not normally useful to offer opinions or guesses unless you have expert knowledge of the area.

peter.murray.rust asked Nov 02 '12 02:11


4 Answers

I do not think that any such translation table is available at all.

It looks to me as though the MathematicalPI font family is a synthetic one, created ad hoc by selecting a subset of elements from some larger unknown set. The raison d'être of the MathematicalPI font family seems to be the representation of simple algebraic operators (plus, minus, multiplication, division) and the equals sign. The charnames (i.e. H1100X) appear to be artifacts, because they are not ordered by code point value (e.g. the equals sign comes last even though its code point is not the highest).

Judging by the available data, I would guess that the missing H11004 charname corresponds to the division operator. However, it is impossible to predict whether it should be represented by the Unicode "solidus" character (i.e. U+002F), the "division sign" character (i.e. U+00F7), or something else.

user1808924 answered Oct 05 '22 12:10


Here's what I published on the Adobe Forums site:

I could be wrong, but I don't think there's an official correspondence table.

Using the six Type 1 fonts and the OpenType font that was made out of them, I've assembled two PDFs which show all the glyphs. Next to them are the glyph names (for the Type 1 fonts) and the Unicode value(s) (for the OpenType font). If you cross reference these two PDFs, you should be able to assemble the correlation list you're looking for.

Mathematical Pi

Hope this helps.

Miguel

Miguel Sousa answered Oct 05 '22 11:10


Here is the best information, as provided by Miguel Sousa of Adobe in his Typography forum message:

  • Mathematical Pi 1-6 PDF / Mathematical Pi 1-6 InDesign IDML
  • Mathematical Pi Std PDF / Mathematical Pi Std IDML

For what it's worth and to summarize information that I had added in comments on this answer, here is what I was able to find before and apart from that.

Michael Sharpe, creator of the package "mathalfa" at CTAN and a member of UCSD mathematics, has TeX definitions for Mathematical Pi in this archive file. I successfully guessed that the obsolete documented location at me.com has moved to his university site. The ".vf" files map the characters of Mathematical Pi to TeX math codepoints. They are binary, but the tool "vftovp", which is part of TeX distributions, dumps them to readable text that includes the mapping data. After performing that dump, we find that the mapped characters are:

mathpibb: 'hyphen-minus' 0-9 A-Z a-z
mathpical: percent 'hyphen-minus' A-Z
mathpifrak: 'hyphen-minus' 0-9 A-Z a-z
mh2s: A-Z

That explains the package name "mathalfa": he took on only the alphabetic characters and digits, and hardly anything more. For mappings of the symbols we must look at the files listed above.
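
For anyone repeating the dump, the summary can be produced from the text that vftovp writes rather than by eye. This is a minimal sketch, assuming the usual VPL property-list layout in which each glyph starts with `(CHARACTER <kind> <value>`, where the kind is C (literal character), O (octal), D (decimal) or H (hex). The sample text is hypothetical and heavily abbreviated, not copied from the actual files, and `defined_codes` is an illustrative name.

```python
import re

# Hypothetical, abbreviated VPL text in the style of a vftovp dump.
SAMPLE_VPL = """
(CHARACTER C A
   (CHARWD R 0.722)
   (MAP
      (SETCHAR C A)
      )
   )
(CHARACTER O 55
   (CHARWD R 0.5)
   (MAP
      (SETCHAR O 55)
      )
   )
"""

CHARACTER_RE = re.compile(r"\(CHARACTER\s+([CODH])\s+(\S+)")


def decode(kind, value):
    """Convert a VPL number (kind plus value) to an integer character code."""
    if kind == "C":
        return ord(value)
    return int(value, {"O": 8, "D": 10, "H": 16}[kind])


def defined_codes(vpl_text):
    """Return the sorted character codes that have a CHARACTER entry."""
    return sorted(decode(kind, value) for kind, value in CHARACTER_RE.findall(vpl_text))


codes = defined_codes(SAMPLE_VPL)
print([chr(code) for code in codes])
```

Collapsing the resulting codes into ranges gives a summary like the list above.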

I think that parts of MathPi, such as the Greek letters of MathPi 1, use the same encoding as Adobe Symbol, which is documented here: http://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt
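
If that guess about the shared encoding holds, the Symbol mapping file can be read directly. This is a minimal sketch, assuming the file's tab-delimited layout of Unicode value (hex), Symbol code point (hex), then commented Unicode and PostScript names. The sample lines are written from memory in that layout and should be checked against the real file. `parse_symbol_mapping` is an illustrative name.

```python
# Sample lines in the assumed layout of symbol.txt (fields separated by tabs).
SAMPLE = (
    "# comment lines start with '#'\n"
    "0020\t20\t# SPACE\t# space\n"
    "03B1\t61\t# GREEK SMALL LETTER ALPHA\t# alpha\n"
    "03B2\t62\t# GREEK SMALL LETTER BETA\t# beta\n"
)


def parse_symbol_mapping(text):
    """Return a dict from Symbol-encoding byte value to a Unicode character."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        unicode_hex, code_hex = fields[0], fields[1]
        # A code may be listed more than once; keep the first Unicode value.
        mapping.setdefault(int(code_hex, 16), chr(int(unicode_hex, 16)))
    return mapping


symbol_map = parse_symbol_mapping(SAMPLE)
print(symbol_map[0x61])
```

Whether a given MathPi font really follows this encoding still has to be verified glyph by glyph.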

When attempting to map symbols to Unicode oneself, a good way to find the Unicode point is by drawing the glyph on the screen here: http://shapecatcher.com

minopret answered Oct 05 '22 11:10


FWIW, my current mapping table (from reading documents created using MathPI) is:

<codePoint name="H9251" unicode="U+03B1" unicodeName="GREEK SMALL LETTER ALPHA"/>
<codePoint name="H9252" unicode="U+03B2" unicodeName="GREEK SMALL LETTER BETA"/>
<codePoint name="H9253" unicode="U+03B3" unicodeName="GREEK SMALL LETTER GAMMA"/>
<codePoint name="H9254" unicode="U+03B4" unicodeName="GREEK SMALL LETTER DELTA"/>
<codePoint name="H9255" unicode="U+03B5" unicodeName="GREEK SMALL LETTER EPSILON"/>
<codePoint name="H9256" unicode="U+03B6" unicodeName="GREEK SMALL LETTER ZETA"/>
<codePoint name="H9257" unicode="U+03B7" unicodeName="GREEK SMALL LETTER ETA"/>
<codePoint name="H9258" unicode="U+03B8" unicodeName="GREEK SMALL LETTER THETA"/>
<codePoint name="H9259" unicode="U+03B9" unicodeName="GREEK SMALL LETTER IOTA"/>
<codePoint name="H9260" unicode="U+03BA" unicodeName="GREEK SMALL LETTER KAPPA"/>
<codePoint name="H9261" unicode="U+03BB" unicodeName="GREEK SMALL LETTER LAMBDA"/>
<codePoint name="H9262" unicode="U+03BC" unicodeName="GREEK SMALL LETTER MU"/>

<codePoint name="H11001" unicode="U+002B" decimal="43" unicodeName="PLUS SIGN"/>
<codePoint name="H11002" unicode="U+002D" decimal="45" unicodeName="HYPHEN-MINUS"/>
<codePoint name="H11003" unicode="U+00D7" decimal="215" unicodeName="MULTIPLICATION SIGN"/>
<codePoint name="H11005" unicode="U+003D" decimal="61" unicodeName="EQUALS SIGN"/>
<codePoint name="H11011" unicode="U+007E" decimal="126" unicodeName="TILDE"/>
<codePoint name="H11021" unicode="U+003C" decimal="60" unicodeName="LESS-THAN SIGN" htmlName="lt"/>
<codePoint name="H11022" unicode="U+003E" decimal="62" unicodeName="GREATER-THAN SIGN" htmlName="gt"/>
<codePoint name="H11032" unicode="U+0027" decimal="39" unicodeName="APOSTROPHE" htmlName="apos"/>
<codePoint name="H11034" unicode="U+00B0" decimal="176" unicodeName="DEGREE SIGN" htmlName="deg"/>

<codePoint name="H11554" unicode="U+00B7" decimal="183" unicodeName="MIDDLE DOT"/> 
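
A hand-built table like this is easy to cross-check against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module, assuming the entries are wrapped in a root element before parsing. The sample entries are illustrative, and two of them deliberately carry informal names to show what gets flagged. `check_names` is an illustrative helper.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Illustrative entries; the first and last use informal names on purpose.
SAMPLE_ENTRIES = """
<codePoint name="H9251" unicode="U+03B1" unicodeName="GREEK LOWERCASE LETTER ALPHA"/>
<codePoint name="H9253" unicode="U+03B3" unicodeName="GREEK SMALL LETTER GAMMA"/>
<codePoint name="H11001" unicode="U+002B" decimal="43" unicodeName="PLUS"/>
"""


def check_names(fragment):
    """Return (glyph name, recorded name, official name) for each mismatch."""
    # The entries have no common root, so wrap them to get well-formed XML.
    root = ET.fromstring(f"<table>{fragment}</table>")
    problems = []
    for entry in root.iter("codePoint"):
        char = chr(int(entry.get("unicode")[2:], 16))  # strip the "U+" prefix
        official = unicodedata.name(char)
        if entry.get("unicodeName") != official:
            problems.append((entry.get("name"), entry.get("unicodeName"), official))
    return problems


problems = check_names(SAMPLE_ENTRIES)
for glyph, recorded, official in problems:
    print(f"{glyph}: recorded {recorded!r}, official name is {official!r}")
```

The same loop could also compare the `decimal` attribute with the code point, which catches transcription slips in larger tables.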

peter.murray.rust answered Oct 05 '22 11:10