From quite some time ago, but still very relevant as encoding issues keep occurring:
A while ago, I saw the text “v3/43/4r” in a document.I know it comes from “vóór” (the acute accent emphasises in Dutch), and wonder which encoding failure was applied to get this wrong.
Source: [WayBack] Which encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow
From the [WayBack] answer by rodrigo:
- ó: is U+00F3, and occupies the same codepoint (0xF3) in a lot of different encodings (most ISO-8859-* and most western Windows-*).
- In CP850 the codepint 0xF3 is ¾ (U+00BE), that is the three-quarters character. It is the same in other, less used, codepages (CP775, CP856, CP857, CP858).
- The ¾ is sometimes transliterated to 3/4 when the character is not directly available.
And there you are! “vóór” -> “v¾¾r” -> “v3/43/4r”.
The first part (ó -> ¾) is the usual corruption of ANSI vs. OEM codepages in the Western Windows versions (in my country ANSI=Windows-1252, OEM=CP850). You can see it easily creating a file with NOTEPAD, writing
vóór
and dumping it in a command prompt withtype
.
–jeroen