Converting BibleWorks Hebrew

We recently announced that we’ll support importing BibleWorks notes into Logos 7. This is mostly a matter of converting RTF into our internal format, a fairly well-understood process.

The one wrinkle is supporting BibleWorks Greek and Hebrew fonts. BibleWorks didn’t support Unicode for many years; instead it used the Bwhebl and Bwgrkn 8-bit fonts to simulate Greek and Hebrew characters.

In Unicode, b, β, and ב are all separate characters (which in some cases are all supported by a single font). With 8-bit fonts, one uses Latin characters (a, b, c) but changes the font so that “b” looks like β or ב. Behind the scenes, κύριος is stored as ku,rioj and אֲדֹנָ֣י as yn'ådoa}. This makes text processing more difficult, because you can no longer search for Greek or Hebrew using the Unicode versions of those characters. It also means the user must have these specific fonts installed and can’t change their preferred Greek or Hebrew font. For a good customer experience, we needed to convert the characters for users who had BibleWorks notes predating Unicode support.

The Greek was relatively straightforward, but Hebrew presented a bigger challenge. Not only is BibleWorks Hebrew stored using Latin characters, it’s also stored in display (i.e., left-to-right) order. In Unicode, characters are stored in logical order (which is right-to-left for fragments of Hebrew text); the display system will lay them out correctly. The string needs to be reversed, but with a catch: in both BibleWorks Hebrew and in Unicode, Hebrew vowels and accents are entered after the base character they attach to. If we naively reversed the entire string, each vowel or accent would end up before its letter and attach to the wrong one; we have to reverse it one grapheme cluster at a time.
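To make the cluster-by-cluster reversal concrete, here is a minimal Python sketch. It is not the code Logos uses; it assumes that a "grapheme cluster" can be approximated as one base character plus any nonspacing marks (general category Mn) that follow it, which is a simplification of full Unicode text segmentation but covers Hebrew points and accents. The names split_clusters and reverse_clusters are made up for this illustration.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Group each base character with the nonspacing marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        # Hebrew vowel points and accents have general category "Mn".
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_clusters(text: str) -> str:
    """Reverse the order of clusters, keeping each mark after its base letter."""
    return "".join(reversed(split_clusters(text)))


# Display (left-to-right) order:
# yod, nun + qamats + munah, dalet + holam, alef + hataf patah
display_order = "\u05D9\u05E0\u05B8\u05A3\u05D3\u05B9\u05D0\u05B2"
logical_order = reverse_clusters(display_order)

# A plain code-point reversal would put every mark before its base letter.
naive = display_order[::-1]
```

Here logical_order starts with alef followed by its hataf patah, while naive starts with the hataf patah on its own, which a renderer would attach to whatever precedes it.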

Moreover, Unicode has a concept of bidirectional mirroring in which “neutral” characters are replaced by their mirrored versions in an RTL run; for example, ( will be displayed as ) in right-to-left text. When reversing the string, these characters need to be replaced by their mirrored versions.
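A small Python sketch of the flipping step follows. Python's standard library can report whether a character is mirrored (unicodedata.mirrored) but does not expose the partner character, so the MIRROR_PAIRS table below is a hand-picked, hypothetical subset; a real converter would presumably cover every mirrored character that can occur in the source text.

```python
import unicodedata

# Hypothetical, hand-picked subset of mirrored pairs; the Unicode Character
# Database file BidiMirroring.txt lists the full set.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}
MIRROR_TABLE = str.maketrans(MIRROR_PAIRS)


def flip_mirrored(text: str) -> str:
    """Swap each mirrored character for its partner; leave the rest alone."""
    return text.translate(MIRROR_TABLE)


# Sanity check: every key carries the Bidi_Mirrored property.
assert all(unicodedata.mirrored(ch) for ch in MIRROR_PAIRS)

# After reversal the parentheses sit on the wrong sides; flipping restores them.
reversed_text = ")\u05D0\u05B2\u05D3\u05B9\u05E0\u05B8\u05A3\u05D9("
fixed = flip_mirrored(reversed_text)
```

In logical order, fixed begins with ( and ends with ); a bidi-aware renderer then draws the mirrored glyphs so the parentheses enclose the Hebrew word correctly on screen.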

Finally, the documentation we found gave BibleWorks Hebrew characters as decimal numbers representing entries in an 8-bit font. Because of the way we were reading the RTF source of BibleWorks notes, these bytes had already gone through a Windows-1252 to Unicode conversion, so our character map had to be based on the Unicode characters that correspond to those Windows-1252 bytes.
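The decoding and lookup steps might be sketched in Python as below. The byte values come from the worked example in this post; SAMPLE_MAP is a hypothetical ten-entry subset inferred from that example by matching positions, not the complete mapping table, and the variable names are invented for illustration.

```python
# Byte values from the worked example, as documented for the 8-bit font.
raw = bytes([191, 121, 110, 39, 229, 100, 111, 97, 125, 192])
decoded = raw.decode("cp1252")

# Bytes 0x80-0x9F show why the byte value alone is not a usable key:
# 0x92 decodes to U+2019, not to U+0092.
assert bytes([0x92]).decode("cp1252") == "\u2019"

# Hypothetical subset of the mapping, keyed by the decoded Unicode character.
SAMPLE_MAP = str.maketrans({
    "\u00BF": "(",        # byte 191
    "y": "\u05D9",        # yod
    "n": "\u05E0",        # nun
    "'": "\u05B8",        # qamats (assumed pairing)
    "\u00E5": "\u05A3",   # byte 229 -> munah (assumed pairing)
    "d": "\u05D3",        # dalet
    "o": "\u05B9",        # holam
    "a": "\u05D0",        # alef
    "}": "\u05B2",        # hataf patah
    "\u00C0": ")",        # byte 192
})

# Still in display (left-to-right) order at this point.
untransliterated = decoded.translate(SAMPLE_MAP)
```

For bytes 0xA0 and above, Windows-1252 agrees with Latin-1, so 191 happens to decode to U+00BF. Keying the table on the decoded character rather than the raw number keeps the lookup correct for the 0x80-0x9F range as well.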

Initial input: 191 121 110 39 229 100 111 97 125 192
Decode Windows-1252 to Unicode: ¿ y n ' å d o a } À
Untransliterate: ( י נ ָ ֣ ד ֹ א ֲ )
Reverse grapheme clusters: ) א ֲ ד ֹ נ ָ ֣ י (
Flip punctuation: ( א ֲ ד ֹ נ ָ ֣ י )

The final result: (אֲדֹנָ֣י)

Our complete BibleWorks Hebrew mapping table is available here.

Posted by Bradley Grainger on August 15, 2018