
Why can the letter 'f' often not be copied from text in PDF files?

Tags:

text

copy

pdf

I am not sure whether this question belongs here, but it seems odd to me that the letter 'f' often gets messed up when copied from PDF text.

I do research as a student, and I read a lot of papers. This happens a lot when I want to copy the name of a paper to rename the pdf file.

For example, I opened a link to a paper in Chrome's built-in PDF viewer on a MacBook Pro running OS X 10.9. Try copying the title of the paper and pasting it: the 'f' in 'fluids' will be missing.

warriormole asked Dec 12 '13 05:12



1 Answer

Not only will the "f" be missing; the whole "fl" will.

The reason is so-called "ligatures". To look nicer, some combinations of letters, most notably fi (and, in your case, fl), are combined into a single character. That special character is rarely handled correctly when copying and pasting. You can see this below: if you try to select the ligature, you will notice it is only one "letter". Note that your computer may also render the two separate letters using the ligature glyph.

The following is a "fi" ligature: ﬁ
The following is two letters: f‌i

Especially visible in a fixed-width font:

The following is a "fi" ligature: ﬁ
The following is two letters:     f‌i
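
If the pasted text still contains the ligature as a single character (rather than dropping it entirely), it can usually be expanded back into plain letters. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the function name `expand_ligatures` and the sample string are made up for illustration. It relies on Unicode compatibility normalization (NFKC), which maps U+FB01 ("fi" ligature) and U+FB02 ("fl" ligature) to their two-letter equivalents.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand compatibility characters (including the fi/fl ligatures) to plain letters.

    Note: NFKC also rewrites other compatibility characters (for example
    full-width letters), so this is broader than a ligature-only fix.
    """
    return unicodedata.normalize("NFKC", text)


# Hypothetical pasted fragment: "\ufb02" is the single-character "fl" ligature,
# "\ufb01" is the single-character "fi" ligature.
pasted = "\ufb02uids and \ufb01elds"

print(len("\ufb01"))             # the ligature counts as one character
print(expand_ligatures(pasted))  # plain ASCII letters after normalization
```

This only helps when the ligature character survives the copy. If the viewer drops it altogether, as in the question, the missing letters cannot be recovered from the pasted text and have to be retyped.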
Jan Schejbal answered Oct 28 '22 23:10
