OpenType feature access to unusual characters

Nick Shinn's picture

Here are a few ideas:

Superior:
® becomes "registered.alt" -- a small, superior version of the character
† becomes "dagger.alt" -- a small, superior version of the character
‡ becomes "daggerdbl.alt" -- a small, superior version of the character

Stylistic alternates:
' becomes "second"
" becomes "minute"
x becomes "multiply"
- becomes "minus"
l becomes "liter" (U+2113)
> becomes "fist" (U+261E)

These seem reasonably intuitive.
Is there any reason not to implement them?
Anything else that could be added?
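In AFDKO feature syntax, the superior substitutions might be sketched something like this (the .alt glyph names follow my list above; the 'sups' feature tag is the obvious home for them, but the names are just a guess at this point):

```fea
feature sups {
    # small, superior forms of the reference marks
    sub registered by registered.alt;
    sub dagger by dagger.alt;
    sub daggerdbl by daggerdbl.alt;
} sups;
```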

John Hudson's picture

I think the superior ideas are fine. There are quite a lot of characters that may take superior forms in some kinds of publications, and most do not have dedicated superior encodings in Unicode. So using the 'sups' layout feature is the obvious and legitimate way to access them.

But I disagree with your proposed stylistic alternates, because all the target glyphs do have their own Unicode encodings, and if someone wants those specific forms then they should use the appropriate characters, not disguise other characters as them. The fact that lots of people type x instead of × doesn't mean that the latter is a stylistic variant of the former; it just means that those people are using the wrong character. And who says that the fist is a stylistic variant of >? Why not a stylistic variant of •? Thankfully, we don't need to play such guessing games, because we have ☞.

Nick Shinn's picture

I agree with your criticism in principle, but in practice it's very hard to find unusual characters in the glyph palette, especially in OT fonts with 1000s of characters. It always used to bug me looking for the "second" and "minute" in the Symbol font, with only 100 or so.

Possible negative consequence of this stylistic alternates proposal: a document with a "liter" symbol is set in a font using the "el-to-liter" feature, but then reset in a font that just sets the plain "l"; or the fist comes out as >; or multiply reverts to x, etc. Is that so terrible? Surely the meaning is intact, although some typographic subtlety is lost.

Miguel Sousa's picture

I agree with John.

Many of Adobe's early Pro fonts have similar kinds of substitutions (e.g. in Warnock Pro the ornaments are alternates of lowercase Latin letters), but this is not really a good practice*. These and other substitutions (like sub [a A] by ordfeminine;) were implemented in the font to make its usage convenient from the user's POV, but they're not good because they replace character(s) by other character(s). On the other hand, a substitution like sub a by a.superior; is perfectly legitimate, as long as 'a.superior' is left unencoded. (FWIW, a superior variant of the lowercase 'a' is not present in Unicode; although 'a.superior' and 'ordfeminine' U+00AA might look alike and even share the same glyph form, they are semantically different.)
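To make the distinction concrete, here is a rough sketch in feature syntax (the feature tags are only illustrative):

```fea
# Replacing one encoded character with another -- not good practice:
feature salt {
    sub [a A] by ordfeminine;
} salt;

# Substituting an unencoded variant glyph -- perfectly legitimate:
feature sups {
    sub a by a.superior;
} sups;
```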

The bottom line is, OpenType layout (OTL) should not be involved in character transformations. The user is the one responsible for encoding his/her message appropriately, and this means using the correct codepoints, even if these are not readily accessible from the keyboard. The function of OTL is to style the message/content, not to alter it.

* Unlike the 'fist', which is encoded in Unicode, ornaments are an interesting case because they don't have their own codepoints, so how should we provide access to them? The current practice we're using is to treat them as alternates of the 'bullet'. This treatment does not apply to dingbats and other symbols such as arrows, since these generally have their own Unicode codepoints.
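A rough sketch of that bullet-alternates approach (the ornament glyph names here are made up):

```fea
feature ornm {
    # ornaments offered as alternates of the bullet
    sub bullet from [ornament.1 ornament.2 ornament.3];
} ornm;
```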

charles ellertson's picture

Nick,

For a comp anyway, the best way to enter such characters in InDesign is to enter the Unicode number, not by using the Glyph Palette. Using the Glyph Palette puts all sorts of extra code in the file, which may or may not block other actions, like ligaturing.

More important is that you kern your minutes/seconds characters (aka prime/doubleprime) properly with all the numbers, and the period (check an ISBN format).

I suppose less and less composition is being done by compositors & more & more by graphic designers. You know your market. But to take this another step back & let composition fall to the font designer seems fraught with peril.

John Hudson's picture

Nick, remember the old adage that typography should serve the text? Let me propose a new one: digital typography should serve digital text. One of the implications of this is that the default glyph for a given character should not pretend to be a different character. Digital typography and text processing present us with a complex set of relationships between glyphs and characters; we shouldn't make those relationships any more complicated than they need to be, and we should avoid situations in which we basically use glyphs to lie about the identity of the underlying character.

We've fairly recently come out of a period in which designers became used to using encoding hacks to get at the glyph shapes they wanted (e.g. 'expert sets' for ligatures). These hacks were necessary because of the lack of a suitable mechanism for cleanly weaving display and text encoding. Now, with Unicode and OpenType, we have such a mechanism, but we also have a lot of possibilities for using glyph hacks to get at the glyph shapes of characters that might be less convenient to enter in text (despite the profusion of input methods, character and glyph palettes, custom keyboard drivers, etc.). I understand the motivation, but I think it obscures exactly what we have gained in the Unicode/OpenType model -- a way to separate the sophisticated display of text from the encoding of text -- and that gain is best appreciated, and not needlessly muddied, if one observes some basic principles about character identity.

k.l.'s picture

Hello Charles, in ISBN I find numerals with hyphens mostly; do you mean kerning the period with minutes/seconds, or the period with numerals? Good tip about entering the Unicode number in InDesign; I was not aware that this is possible.

Hello Nick, to second what Miguel says: in very early versions of my fonts, 'ornm' substituted letters by arrows. Soon I understood why Adobe discontinued this practice, and removed this kind of substitution. The funny effect was that when I applied the new font versions to text originally set with the previous ones, the arrows turned into As & Bs again, and I didn't even notice at first: letters in a stream of letters ... The same would happen if someone changes from fonts that do have such features to fonts that don't. Such sudden changes may go unnoticed in our fast world.

I agree with your criticism in principle, but in practice it’s very hard to find unusual characters in the glyph palette, especially in OT fonts with 1000s of characters.

Finding a particular glyph in a large font will always be a 'challenge', but sorting glyphs nicely may help a little: like, a group for uppercase, smallcaps, lowercase, punctuation, numerals, numeral-related punctuation, etc.

Nick Shinn's picture

sorting glyphs nicely

How do you do that?
Any good models to follow?

paul d hunt's picture

Any good models to follow?

set your font to unicode mode and then go to glyphs > sort glyphs > by encoding. then switch to index mode and you can drag things around to where you'd like them to be. once you have everything set, you can save a custom encoding. next time you want to sort glyphs, set the font to names mode, select your custom encoding and then go to glyphs > sort glyphs > by encoding. if there are any characters that are outside this encoding, you can again switch to index mode and move glyphs around as you like.

k.l.'s picture

And don't forget to switch the OpenType export option "automatically reorder (or was it sort?) glyphs" off.

Good model -- hard to say. I think the order in the CT fonts is quite reasonable. Mine is a bit different. I am not aware of other examples and would be interested to see some.
By the way, 'nice' sorting is even possible when using the AFDKO. Just sort the content of the GOADB file as you like.

Miguel Sousa's picture

Regarding glyph-sorting on the Glyph palette, I'll take this opportunity to point out that InDesign CS3 allows the glyphs to be sorted by Unicode*, in addition to the usual GID order.

The images below display the same font sorted by GID (top) and by Unicode (bottom). The new sorting method might be more intuitive for some people, as it puts together all the things that are somehow related to each other.

* Unencoded glyphs, such as alternates and ligatures, are sorted as well.


kentlew's picture

KL: Hello Charles, in ISBN I find numerals with hyphen mostly, do you mean kerning period with minutes/seconds, or period with numberals?

Karsten, I think Charles meant to reference the LOC CIP listing, not ISBN. In the CIP data (formally: Library of Congress Cataloging-in-Publication Data), there is a catalog string that frequently uses a prime (very often mistakenly converted to an apostrophe by inattentive comps) and which may occur in conjunction with a period (although, off-hand, I can't think of an instance where I've encountered this).

Charles, did I get that right?

-- K.

Nick Shinn's picture

Miguel, if I want to have a category appear in the InD glyph palette "show" pull-down menu, named "Mathematical symbols", and another named "Miscellaneous symbols", how do I create that in the font?

Miguel Sousa's picture

> if I want to have a category appear in the InD glyph palette “show” pull-down menu, [...] how do I create that in the font?

You can't. InDesign builds these categories, based on the font's character set and OpenType features included in it (top pic). However, in CS3 the user can make his/her own set(s) (bottom pic), which will appear on that drop-down menu.


k.l.'s picture

KL -- I think Charles meant to reference the LOC CIP listing, not ISBN.

I see. I had to search through a couple of books to finally find this in some IUP titles, indeed one of them showing an apostrophe, and in another the minute sign is followed by a period. Many thanks!

John Hudson's picture

Nick, regarding organising glyph sets in intuitive ways to assist users of glyph palette entry:

Pretty much every project I do begins with the definition of a glyph set, usually in an Excel spreadsheet. This will contain at minimum a list of my development (human friendly) glyph names and corresponding Adobe Glyph List and uniXXXX form final names. I often include other information: if for instance I am working on a script for which I do not already have a FontLab .nam names-to-Unicode mapping, I will include encoding information in the spreadsheet. Obviously, after one has done a number of projects one has a collection of spreadsheets which can easily be manipulated to build new glyph sets. The listing of glyphs names can be copied into FontLab .enc files to create custom 'encodings', i.e. glyph sets in FontLab.

Using this approach, I group like-with-like glyphs in ways that, I hope, will be intuitive for locating and selecting from glyph palettes.

charles ellertson's picture

Kent, Karsten, you're right -- my mistake. For books published in the States, perhaps as a remnant of times when library sales were important to university presses, there was a block of copy furnished by the Library of Congress which was usually printed on the copyright page. If it was in the book when printed, and in the same form as provided by LOC, the libraries could, in theory, get the books on the shelves faster.

The content of the copy was inviolate. The form was supposed to be followed as well, but Richard Eckersley (designer of The Telephone Book) started a bit of a revolution over that one, so now the form is often more in keeping with the rest of the interior design.

Anyway, the number I was talking about, my wife informs me, is (she thinks) part of the Dewey Decimal system, and she further informs me that the prime seems to be absent from the newest material.

Edit:

For an example: Retreat from Gettysburg by Kent Masterson Brown, University of North Carolina Press, ISBN 0-8078-2921-8 & "funny little number" 973.7'349--dc22 (& BTW, my Janson). I have seen the period "interact" with the prime, but don't have one to hand.
. . .

You can enter a Unicode character in ID2 running under Windows XP. As I sit at home typing on a Mac with an old keyboard, I'll probably make a mistake here too, but I believe if you hold down the shift+alt keys and use the numeric keypad for any numbers, you'll get the Unicode character. There is a way using MS Word, too (again, Windows).

paul d hunt's picture

i'd like to hear Adam Twardoch's take on this issue. Adam, are you out there?

twardoch's picture

> ’ becomes “second”
> ” becomes “minute”

You mean: ’ becomes ″ (which means "second") and ” becomes ′ (which means "minute")???

To me, it makes no sense whatsoever. Why would one stroke be replaced by two strokes, and two strokes be replaced by one stroke?

Well, I know, you made a typo and meant this to be the other way around, but it actually shows very well that your concept is flawed. It defeats WYSIWYG, it defeats Unicode.

I definitely recommend AGAINST "codepoint hacking", i.e. assigning characters that have their own Unicode codepoints as glyph alternates of characters that have other Unicode codepoints, and are only mildly related by appearance.

The problem is that you invalidate the principle of "text accessibility". If you lie about the semantic level of text, i.e. you use a hack encoding to represent the litre symbol using the letter "l", then texts encoded electronically using Unicode lose their primary advantage, namely that the encoding of the text is *predictable*. This means that one can cut-and-paste the text into different software and get an approximation that will be as good as possible, even set in a different font. If the font is not available, the replacement will at least not be awkward.

What if someone (a disabled person) uses a screen reading software? This software does not "see" the glyphs, it can only read the character encoding out loud. The screen reader software would read "two ex three" instead of "two times three", and "two hyphen five" instead of "two minus five".

What if somebody is using search-and-replace in the "naive" hope that what he sees as a minute or second symbol actually is encoded as such? What if in half of the book typeset using your font the symbol has been typed in using the proper Unicode, and in the other half it was typed in using the hack Unicode with the feature applied? What if, then, someone needs to do some text transformation, like a global search-and-replace?

I mean, theoretically, you could put code in your font such as

feature calt {
sub quotedbl' [A-Z a-z] by quotedblleft;
sub [A-Z a-z] quotedbl' by quotedblright;
} calt;

but this is just a nasty hack. The encoding starts to be font-dependent again, and we are back to the times where you had to type in "M" to get a "fi" ligature.

Remember old typewriters on which there was no key for digit "1" because the user was supposed to type in lowercase "l" instead? My mom had one like that. Well, why, it worked fine back then. These two glyphs look as much alike as ″ and ”.

Adam

Nick Shinn's picture

So I take it you're no fan of Smarty Pants?

Or similar "curly-quote" hacks in Quark and InDesign?

If those guys can get away with that kind of crap, why shouldn't I? :-)

paul d hunt's picture

i understand that it's all about respecting the character input string, so we should just design fonts so as not to allow for lazy input? is a string of 1/8 really preferable to ⅛ when the latter is what is actually intended? I guess that's the thing though, is that we can't really guess what the user is intending, eh? we'd have a pretty good idea, however, if the user types 1/8, highlights the text, and then switches on the 'frac' feature. hmmm, but that seems just as much work as just selecting the appropriate glyph from the palette in the first place. i guess i just needed to post so as to talk myself into doing things the "correct" way. thanks for all your input everyone, a great discussion.

John Hudson's picture

Nick, when curly quotes are automatically substituted for ' and " keystrokes in InDesign etc., this is a character conversion. When you press the quote key, what ends up in the text string are the curly quote characters U+201C and U+201D, not the keyboard-entered straight quote U+0022. Essentially, the application is providing users with an input method editor for typographic quote mark characters. This is a very different scenario from what you are proposing, which is a glyph conversion.

John Hudson's picture

Paul, fractions are an interesting case. A fairly small subset of common fractions happens to be encoded in Unicode as precomposed characters, e.g. ⅛; so far as I can gather, most of these encodings are for backwards compatibility. But fractions are by their nature generative and one very quickly exceeds that set of common fractions if one starts doing things with them: a fraction is, after all, a mathematical expression, a way of writing a division operation. So the logical way to encode fractions is as numerals separated by an operator: 1/8.
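A common way to sketch such a generative 'frac' feature follows (the class and glyph names are assumptions, and this only handles simple cases -- it is not a complete implementation):

```fea
@figures      = [zero one two three four five six seven eight nine];
@numerators   = [zero.numr one.numr two.numr three.numr four.numr
                 five.numr six.numr seven.numr eight.numr nine.numr];
@denominators = [zero.dnom one.dnom two.dnom three.dnom four.dnom
                 five.dnom six.dnom seven.dnom eight.dnom nine.dnom];

feature frac {
    # 1/8 -> one.numr fraction eight.dnom; multi-digit numerators
    # would need further chaining rules
    sub @figures' slash by @numerators;
    sub @numerators slash' by fraction;
    sub fraction @figures' by @denominators;
    sub @denominators @figures' by @denominators;
} frac;
```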

k.l.'s picture

N.S. -- So I take it you’re no fan of Smarty Pants?
Or similar “curly-quote” hacks in Quark and inDesign?

On application level, yes, this is typing assistance.
On font level, no. After all, fonts are databases providing applications with glyph outlines, and today they may also include rules for altering those outlines (substitution, repositioning).

J.H. -- But fractions are by their nature generative and one very quickly exceeds that set of common fractions if one starts doing things with them

I like this description.
A similar case is fi/fl. If you design a sanserif typeface which does not need ligatures, you still add fi/fl for backward compatibility (MacOS Roman) but do not cover them in 'liga' -- and hope nobody selects them from the glyph palette.

(It would be nice if a flag allowed type designers to hide certain glyphs from the palette.)

twardoch's picture

"Smarty Pants" are an extension of the keyboard driver -- assisting the user in typing in the correct Unicode. That's perfectly fine, because what ends up in the electronic text is Unicode-compatible, i.e. is what everybody expects.

The advantage of a standard is that it is a standard. Unicode has great chances to become THE standard to encode human writing (except perhaps for CJK languages that have some alternatives).

The WYSIWYG principle (What You See Is What You Get) these days is not only limited to "what you see on screen is what you get on the printer". It also means "what you see on screen is what you get when you copy-paste or search". This is why Adobe-compliant glyph names are important, and this is why adherence to the Unicode standard is important.

Adam

Nick Shinn's picture

OK, more devil's advocacy:

Standards aren't the be-all and end-all. They're just a means to an end, and sometimes the means can screw up the end.

I appreciate the breadth of John and Adam's concern for the integrity of the text across different media, but this may sacrifice initial typographic quality, and that seems absurd in, for instance, a print piece where the text has little likelihood of being digitally searched or reset.

the old adage that typography should serve the text?

I'm not sure how old that one is, although it does kick off Bringhurst's Style Guide ("honour" the text, actually). The older adage is serving the reader.

As Charles mentioned, knowing one's market (or audience) is important, and I wonder how many typographers who might be inclined to use the special characters I mentioned would be comfortable entering Unicode numbers? Not many, I would say -- they're going to use the glyph palette, now cluttered up with the wonders of massive Unicoded OT fonts. Sure, the filing systems described by Miguel will help, but in practice it's next to impossible to distinguish an apostrophe from a minute from an acute from a tonos.

I am always amazed to come across typographic boo-boos in communications from blue-chip companies. Never mind that, in its printed missives to the design industry, Veer routinely abbreviates numbers with left-quotes, following the InDesign/SmartyPants practice. Good WYSIWYG, but poor typography.

Surely the best way to encourage typographers to produce more sophisticated typography -- to serve the reading public, not the great god WYSIWYG -- is to make it really easy for them to, for instance, set the multiply sign, by simply applying the "Stylistic Alternate" feature to the character that's already in the text? (It would certainly have been easier for me to do something like that earlier in this thread, where I got minute and second muddled.)

And despite Unicode's assertion that x and multiply are different characters, visually they really are stylistic alternates of the same character shape, and, in the context of any text where a faux Stylistic Alternate substitution would be made, there could never be any problem with meaning. Even in proper maths, an italic font would be used for "x", precisely because of the interchangeability of "x" and "multiply".

Consider this hypothetical scenario: an InDesigner is setting the captions for an art catalogue, and, for the artwork dimensions, wants to insert the correct multiply symbol in place of the manuscript's "x", throughout the text. Global substitution won't work, as that would change all "x"s, so "find/change" might be used. Except that the glyph palette doesn't connect to the "change to" field, and "multiply" isn't one of the characters available in the list of suitable substitutions (nice touch Adobe, BTW). So the InDesigner must do the job manually. I would be inclined to make the change once, then copy and paste. But how to make the first change from x to multiply? The glyph palette, of course, but if the font has the "salt hack" I'm proposing, that would be easier. And it would also work to convert the hash marks that measure the painting sizes in inches, into the proper inch mark (second).

In the most recent art catalogue I have (Contact photo festival), the designer has used all caps of a sans face (X really does look like multiply), and spelled out inches -- I would guess the difficulty of getting the proper characters/symbols has something to do with that decision.

John Hudson's picture

Nick: Consider this hypothetical scenario...

Not so hypothetical: this is the sort of thing I do on a regular basis. It's a pity that the glyph palette doesn't accept focus to the find/replace dialogue, but this is easy enough to work around. Make one manual substitution and then copy/paste the multiply symbol into the replace field of the dialogue and away you go.

What would make it easier? If the find/replace dialogue had context fields! Then you could say, e.g. replace x with × when preceded and followed by any numeral. I think context fields should be standard in find/replace functions, and am amazed that no one seems to have thought of this.

But the basic point, Nick, is this: what you are complaining about is application functionality. And I don't think it makes any sense to try to solve application limitations in fonts. It is something that is very tempting to do sometimes, especially if it seems like there is an easy way to do it, but having spent the past decade working closely with application developers I've come to the conclusion that it is a really bad idea to second-guess them. Chances are that the way the application works will change -- hopefully improve -- and you are left with a quirky font that does odd things and a pile of poorly encoded text.

Nick Shinn's picture

replace x with × when preceded and followed by any numeral

I had thought of doing that as Contextual Alternate, but it could screw up product code numbers. But yes, context fields in Find/Change, good idea!

what you are complaining about is application functionality.

It would be good if multiply and minus were on the keyboard too, like plus and equals.
And curly quotes.

I don’t think it makes any sense to try to solve application limitations in fonts.

It can be useful. The alternate glyphs in the case feature, for instance, solve an application problem (manual application of baseline shift) by automatically repositioning punctuation such as parentheses.
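For instance, a minimal sketch of such a 'case' feature (the .case glyph names follow common convention, but are assumptions here):

```fea
feature case {
    # re-centre punctuation on the cap height, instead of asking the
    # user to baseline-shift it manually
    sub parenleft by parenleft.case;
    sub parenright by parenright.case;
    sub bracketleft by bracketleft.case;
    sub bracketright by bracketright.case;
    sub hyphen by hyphen.case;
} case;
```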

**

How about if instead of substituting "Multiply", applying Stylistic Alternate to "x" substitutes "x.alt" -- which has the same glyph as Multiply?

That way it doesn't mess with the Unicode.
And whereas the original manuscript was a "lie", in that it tried to pass off "x" for Multiply, the substitution of the correct glyph is true to the text's meaning.
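As a sketch, the idea would be something like this ("x.alt" is a hypothetical glyph carrying the multiply shape -- and note this is the very approach being argued against in this thread):

```fea
feature salt {
    # x.alt has the multiply shape, but the character stays U+0078 'x'
    sub x by x.alt;
} salt;
```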

charles ellertson's picture

As Charles mentioned . . .

Just to keep things clear, I think this a bad idea. You've had lots of good ones, but I don't feel this is one of them. If someone doesn't know how to set the needed character(s), they aren't a typesetter, let alone a compositor. No effort by a typefounder can change that, and will often get in the way of those who do know what they are doing.

FWIW

Miguel Sousa's picture

Nick: but in practice it’s next to impossible to distinguish an apostrophe from a minute from an acute from a tonos.

That's what the Glyph palette's tooltip is for.

John: What would make it easier? If the find/replace dialogue had context fields! Then you could say, e.g. replace x with × when preceded and followed by any numeral.

You can do that with GREP, in InDesign CS3. The Find/Change expressions would be:
Find what: (\d)x(\d)
Change to: $1×$2

Nick: How about if instead of substituting “Multiply”, applying Stylistic Alternate to “x” substitutes “x.alt” — which has the same glyph as Multiply?

That is a "better" approach, but it does not solve anything. The underlying character will still be 'x', therefore any Search/Replace or screen reader software will always be fooled by that. They "see" characters, not glyphs.

Nick Shinn's picture

Miguel: I don't get the Name line -- but then, I'm using CS (not CS2 or CS3) -- is that a later addition?

That is a “better” approach, but it does not solve anything.

I think it does for print, although the screen reader will always be problematic.
BTW how does the screen reader handle "second"? -- it's going to get it wrong if it's supposed to be inches or degree of arc! Those should have separate Unicode points...

Adam: What if in half of the book typeset using your font the symbol has been typed in using the proper Unicode, and in the other half it was typed in using the hack Unicode with the feature applied?

I see what you mean. Or if a typographer has already been through the text and changed all the "x"s to faux "multiply", using my suggested Stylistic Alternate feature, and then the font is changed to one without such a feature, it would be a pain to have to do it all over again with the correct Unicode procedure.

So I'm convinced it's a *bad idea* and won't do it in any fonts.

Charles: If someone doesn’t know how to set the needed character(s), they aren’t a typesetter, let alone a compositor.

I'm floating the idea of making things more convenient for people who do know, and more accessible for those on the cusp. And those still using CS who aren't getting the tooltip names.

You know, I posted this same thread in the general section and got one response (plus Adam). I just get the feeling that we're so into all these OpenType complexities, ramping up the characters and features in fonts, while the OT menu remains buried and unheralded in CS and Quark (not to mention Microsoft) -- and meanwhile the vast majority of users think OpenType is just for fancy script ligatures.

We need more Ilenes spreading the news!
http://www.fonts.com/AboutFonts/Articles/fyti/06-01-05.htm

Miguel Sousa's picture

> I don’t get the Name line — but then, I’m using CS (not CS2 or CS3) — is that a later addition?

Not sure when it got added, but it's definitely in CS2. It got improved in CS3, where the "Name:" entry was added. The tooltip is also displayed for alternate glyphs and ligatures. In these cases, the Unicode displayed is the one of the base character(s).

> BTW how does the screen reader handle “second”? — it’s going to get it wrong if it’s supposed to be inches or degree of arc! Those should have separate Unicode points…

I'm not sure how the reader will make the distinction between minutes and feet, or between seconds and inches, given that there's only one codepoint for each. Unicode calls them 'prime' and 'double prime'. http://www.unicode.org/charts/PDF/U2000.pdf

twardoch's picture

> X really does look like multiply

In some fonts X really looks like multiply, and in some fonts l really looks like 1. In some fonts, rn looks like m. So what? Would you put this code into your font:

feature liga {
sub r n by m;
} liga;

? Makes no sense whatsoever.

Character encoding must bear necessary semantic differentiation, because a lot of software depends on it. For example, in Unicode, some characters have the property of being "letters" while some don’t, some have the property of having uppercase or lowercase variants, they belong to different writing systems etc.

All this information gets screwed up if you start making glyph alternates the way you suggest. This interferes with various mechanisms such as spellchecking, hyphenation, and whether OpenType features get applied to a string at all.

The solutions you propose are OK for fonts that you produce on a completely custom basis, for a very controlled, closed environment, where you maintain contact with the client in the future. But never for retail fonts, because despite what you may write in your user's manual for the font, people will use the font their way, may use it on systems and applications that you never heard of, and, above all, may use them *according* to the standards that you choose to ignore.

This reminds me of the situation when people started producing Cyrillic or CE fonts in Type 1 format in the early 1990s. They simply placed the Cyrillic glyphs, or the accented Polish or Czech glyphs, in the slots of Western glyphs (using the fact that one code position in the Western codepage stood for some particular character in the Cyrillic or CE codepage). They didn’t care to set up appropriate glyph names and encoding flags. Those fonts worked fine in QuarkXPress 3 and 4, and in PageMaker, and in Illustrator 6, and in Photoshop 4.

But these fonts are STILL in circulation. 15 years later, people moved to a Unicode-based system. When they open old Quark documents in Quark 7, the automatic conversion from codepage-based encodings to Unicode fails because Quark "thinks" the document is set in French and not in Russian, so the user ends up with character spaghetti. Also, these fonts are virtually unusable in Quark 7 or InDesign. And the users curse the font vendor.

On the other hand, there are Type 1 fonts from other font vendors that were created by the rules. They worked just exactly as fine in the old apps, *and* 15 years later they still work, and the documents created with them also do work.

If you make the fonts "your way", they may work fine in InDesign CS3 but perhaps will stop working properly in InDesign CS5, because Adobe might have introduced a mathematical typesetting engine that correctly enters and formats math equations, and this engine might rely on arithmetic operators being encoded *properly*. So if the user opens an old document typeset in your font that says "5×6", the math engine will recognize this as an equation and add some extra spacing around the × sign accordingly; but if the text in the document looks like "5×6" yet actually says "5x6", then the math engine may reformat it to "5x6" because it thinks the "x" is a variable name.

Etc. etc.

The advantage of using standards is that they offer much greater long-term stability. Even if the standards change or get replaced, there is a good chance that the people who develop the new or modified standards will also develop standard ways of migrating the old structures to the new ones. If you deliberately break today's standards, it is almost certain that you will break that future migration, either of the fonts or of the documents the font was used in. Either way, your users will be penalized -- either today (those who use your font on platforms you didn't bother testing, or in a context you didn't envision or deemed "unimportant") or tomorrow.

I personally would always warn users against purchasing hack-encoded fonts. The graphic qualities of such fonts may be fine, but the functional side of the design is broken. I'm not interested in wearing a wristwatch that looks great but every now and then randomly stops for 15 seconds or jumps ahead one minute. The same goes for fonts.

A.

cuttlefish's picture

This sort of substitution really should be a setting option in a layout program, but yes, never scripted within the font itself. A little tick on the style bar to indicate you're setting math; and even then the * and / keystrokes should be substituted with multiply and divide (already standard for keying in those math operations), respectively, to avoid ambiguity between multiply and x.
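To make the distinction concrete: an application-level "math mode" would work at the *character* level, so the correct Unicode characters end up stored in the document. A minimal sketch in Python, purely hypothetical (no real layout engine works exactly like this):

```python
import re

# Keyboard operators and the real Unicode math characters they stand for.
MATH_OPERATORS = {
    "*": "\u00D7",  # MULTIPLICATION SIGN ×
    "/": "\u00F7",  # DIVISION SIGN ÷
    "-": "\u2212",  # MINUS SIGN −
}

def mathify(text: str) -> str:
    """Replace keyboard operators between digits with real math characters.

    Because this rewrites the text string itself (not the glyphs),
    the document afterwards genuinely contains the right characters.
    """
    def repl(m: re.Match) -> str:
        return m.group(1) + MATH_OPERATORS[m.group(2)] + m.group(3)
    return re.sub(r"(\d\s?)([*/-])(\s?\d)", repl, text)

print(mathify("5 * 6"))        # 5 × 6
print(mathify("10/2"))         # 10÷2
print(mathify("well-known"))   # unchanged: no digits around the hyphen
```

Note that even this simple digit-context rule avoids the "well-known" false positive that a blind font-level substitution could not.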

Nick Shinn's picture

Adam, I already capitulated.

Regarding backwards compatibility, just follow the money.
You can open a Quark 3 document on an Intel Mac, but only if you buy Quark 7. You don't need new fonts, though, thanks to the stellar philanthropy of type foundries!

Don't you think that's a bit of a double standard?

After all, Quark, InDesign and Word have incorporated faux features (bogus weight, italicization, scaling, small caps) since day one.
To paraphrase: the functional qualities of such software may be fine, but the typographic side of the design is broken.

John states that fonts shouldn't attempt to fix application shortcomings with hacks, but the converse is exactly what apps do with faux features.

John Hudson's picture

Nick: It can be useful. The alternate glyphs in the case feature, for instance, solve an application problem (manual application of baseline shift) by automatically repositioning punctuation such as parentheses.

That assumes that the effect of case feature substitutions is always equivalent to a baseline shift, but this is not so. Case feature substitutions may involve changes to the form of glyphs as well as to their vertical positioning, not to mention spacing. In fonts whose default numerals are oldstyle, it makes sense to put mappings to lining numerals in the case feature. So the case feature is not simply doing in the font what could be done at the application level: it is doing things that are glyph-centric and need to be controlled at the font level.
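A minimal sketch of such a 'case' feature in OpenType feature syntax; the .case, .osf and .lf glyph names here are assumptions for illustration, not from any particular font:

```fea
feature case {
    # reposition punctuation for all-caps settings
    sub parenleft  by parenleft.case;
    sub parenright by parenright.case;
    sub hyphen     by hyphen.case;
    # in a font whose defaults are oldstyle figures,
    # map them to lining figures as well
    sub [zero.osf one.osf two.osf] by [zero.lf one.lf two.lf];
} case;
```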

John states that fonts shouldn’t attempt to fix application shortcomings with hacks, but the converse is exactly what apps do with faux features.

And I likewise think that they should not or, at least, should not do so automatically. If, when the user clicked the italic button in Word, for example, a dialogue popped up saying

'The font family you are using does not include an italic font. Do you want Word to slant the regular font to fake an italic?'

that would be preferable to automatic slanting.

Nick Shinn's picture

That assumes that the effect of case feature substitutions is always equivalent to a baseline shift...

That wasn't my assumption, although I did express myself sloppily. Even though I didn't say "some of" the alternate glyphs, I didn't mean that *all* of the substitutions are equivalent to baseline shifts.

I was just giving an example of how a font can solve a layout application difficulty.

And really, isn't that how OpenType features work? The substitution coding is in the font.
It would in theory be possible to just name the glyphs in a font, and have the app do the substitutions. For instance, with "one.numr", "fraction", and "nine.dnom" as font glyphs, an application fraction-maker could construct 11/99.

I would imagine that might be how Adam's putative CS5 math engine would work.
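For comparison, the font-side counterpart of that idea is the usual 'frac' feature. A much-simplified sketch in OpenType feature syntax (figure classes abbreviated to two glyphs; the glyph names follow the .numr/.dnom convention mentioned above):

```fea
feature frac {
    @FIG  = [one nine];            # abbreviated; a real font lists zero through nine
    @NUMR = [one.numr nine.numr];
    @DNOM = [one.dnom nine.dnom];
    lookup FRAC_bar  { sub slash by fraction; } FRAC_bar;
    lookup FRAC_numr { sub @FIG by @NUMR; } FRAC_numr;
    lookup FRAC_dnom {
        # after the fraction bar, or after another denominator,
        # turn numerators into denominators
        sub [fraction @DNOM] @NUMR' by @DNOM;
    } FRAC_dnom;
} frac;
```

Applied to "11/99", this yields numerator one, numerator one, fraction bar, denominator nine, denominator nine, while the underlying characters remain 1, 1, /, 9, 9.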

Nick Shinn's picture

Anyway, thanks for your comments, everybody.
I'm glad the extra "superior" glyphs passed muster -- are there any others that could be added, other than these footnote symbols?

twardoch's picture

BTW, faux italicization really comes from the *extremely* early days of computing, when you had bitmap fonts, very few fonts, low-resolution printers etc. (on which you often couldn't easily tell the roman from the italic anyway). Remember, those were the days when fonts came on floppy disks and using three different fonts on a page might overload your printer's memory. The expression "electronic documents" probably didn't exist yet (neither did the PDF format).

We're 20 years into the future now. Yes, some apps still maintain faux italics for backward-compatibility reasons. But there is no need or reason to come up with NEW solutions that still follow this old way of thinking.

A.

Nick Shinn's picture

...they should not or, at least, should not do so automatically.

The same is true of "SmartyPants" and of "Typographic Quotes", which is a default in InDesign, isn't it?

Yes, some apps still maintain the faux italic for backward-compatibility reasons.

I don't think that's the main reason. It's because application engineers think their application can fill in the gaps in a font family, and they believe it's a genuine service to the user to do so. For instance, InDesign offers users small caps of any font, even those with no true small caps.

no need or reason to come up with NEW solutions that still follow this old-style way of thinking.

The need for SmartyPants: typographers hate seeing hash-mark quotes and apostrophes, and if Smarty can fix that, they don't mind the hack, even if it screws up abbreviations like '07.

But as I said, your (and others') arguments against *lying* stylistic alternates have convinced me that it would be impractical to include them in my fonts. Thank you!

paul d hunt's picture

this is really a bit of a sticky wicket, isn't it? everyone seems to be in agreement that OT features should not be used for character conversion, but even the most recently released fonts by Adobe substitute scedilla with scommaaccent in the 'locl' feature. if we were following this non-conversion rule strictly, scedilla would be replaced by scedilla.locl (a clone of scommaaccent), right? So, is there just a bit of wiggle room with this rule, or should it be followed absolutely?

twardoch's picture

Paul,

the Scedilla->Scommaaccent case is indeed special. The proper Unicode codepoint to use for Romanian is Scommaaccent (U+0218). Unfortunately, some old Unicode implementations use U+015E (Scedilla) in the Romanian locale. This means that Romanian text can be represented in two different ways, "old" (using Scedilla) and "new" (using Scommaaccent). The "locl" OpenType replacement helps migrate electronic text from the old to the new encoding. In this very specific case, the font helps fix an isolated encoding problem.

Also, the Scedilla->Scommaaccent replacement in "locl" for Romanian has become widely accepted practice by now. Both Unicode and OpenType work on that very principle: if something becomes widely accepted practice, it can be codified. So if you manage to convince enough people to follow a particular method, you may be lucky enough to get your stuff "standardized".
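For reference, the substitution is commonly written along these lines in OpenType feature syntax (a sketch: it assumes the font contains both the cedilla and comma-accent glyphs, and the Moldavian branch is added by analogy with common practice):

```fea
feature locl {
    script latn;
    language ROM exclude_dflt;  # Romanian
        sub Scedilla by Scommaaccent;
        sub scedilla by scommaaccent;
    language MOL exclude_dflt;  # Moldavian
        sub Scedilla by Scommaaccent;
        sub scedilla by scommaaccent;
} locl;
```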

A.

paul d hunt's picture

back to this topic...
would it be advisable to have some type of feature for character composition (besides ccmp) so that the user would know that, with this feature enabled, certain characters are easily converted to symbols? For example, many PC users are used to the automatic replacement of 1/2 with ½ and of (C) with ©. Would it be so bad to have a symbol-composition feature (symb, hypothetically) that could build on these practices (standardized by Microsoft) to make symbols easily accessible via simple key combinations instead of having to hunt through a glyph palette?

twardoch's picture

Paul,

a conversion of (C) to © at the glyph level is rather bad because it breaks Unicode. Any OpenType feature that breaks Unicode should be avoided unless there is a really strong argument for doing it anyway.

Adam

John Hudson's picture

For example, many PC users are used to the automatic replacement of 1/2 to ½ and of (C) to ©.

But these are character-level substitutions performed by the application. So what ends up in the text string when (C) is converted to © is ©, not (C). If this substitution were performed at the glyph level, then the text string would remain (C), which is not the copyright symbol.
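The distinction can be shown in a few lines of Python; this toy autocorrect is purely illustrative, not how any particular word processor implements it:

```python
def autocorrect(text: str) -> str:
    """Character-level replacement, as an application's autocorrect performs it.

    The stored string itself changes, so the real Unicode character ends
    up in the document. A glyph-level (font) substitution would only
    change the drawing, leaving the characters "(C)" in the text.
    """
    return text.replace("(C)", "\u00A9").replace("(R)", "\u00AE")

doc = autocorrect("(C) 2008 Example Corp.")
print(doc)                  # © 2008 Example Corp.
print("\u00A9" in doc)      # True: searching for the real character works
print("(C)" in doc)         # False: the original sequence is gone
```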

Thomas Phinney's picture

Right. This is a particularly important issue for the copyright symbol, because the c-in-parens has no legal relevance. It has to either be the copyright symbol or the word "copyright."

Disclaimer: At least, that's what I've heard from lawyers. I'm not a lawyer myself and of course you should consult an actual lawyer for impartial legal advice.

Regards,

T

dtw's picture

...in addition to which, I often type (c) to mean (c), as in lists. When I want a copyright symbol, I know how to get it. Word used to substitute one for the other with its AutoCorrect settings until I got fed up and switched it off. I wouldn't want that "feature" built into the font. Not that Word knows about OpenType features, but you get my drift...

____________________________________________________________
Ever since I chose to block pop-ups, my toaster's stopped working.

paul d hunt's picture

back to this, Miguel said:
Many of Adobe’s early Pro fonts have similar kinds of substitutions (e.g. in Warnock Pro the ornaments are alternates of lowercase Latin letters), but this is not really a good practice*. These and other substitutions (like sub [a A] by ordfeminine;) were implemented in the font to make its usage convenient from the user’s POV, but they’re not good because they replace character(s) by other character(s).

where do these replacements actually occur? i'm curious. i'm still trying to understand the workings behind the scenes. for example, if you have a font that substitutes f f i (u0066 u0066 u0069) by ffi (uFB03), what problems will arise? what will happen if you name the substituted glyph f_f_i and still assign the unicode point (uFB03)?

Miguel Sousa's picture

> where do these replacements actually occur? i’m curious. i’m still trying to understand the workings behind the scenes. for example, if you have a font that substitutes f f i (u0066 u0066 u0069) by ffi (uFB03), what problems will arise?

There are no actual replacements in e.g. InDesign. The underlying characters remain the same when you apply the features; only the displayed glyphs change. Issues may occur later when, let's say, you generate a PDF file, send it to someone, and that person tries to retrieve the text from the PDF (or perform a search). If the PDF has a ToUnicode table, the text might come out as U+FB03 instead of the original U+0066 U+0066 U+0069. If there's no ToUnicode table in the PDF, the glyph name will be parsed. According to this list, the glyph name ffi corresponds to character U+FB03, and according to these rules the glyph name f_f_i will be mapped to the characters f, f and i. This page explains where and why glyph names are used.
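The glyph-name parsing rules can be sketched in Python. This is a heavily simplified version of the Adobe glyph-naming algorithm; a real implementation would consult the full Adobe Glyph List, and the tiny AGL table here is only for illustration:

```python
# Tiny stand-in for the Adobe Glyph List (name -> character).
AGL = {"f": "f", "i": "i", "ffi": "\uFB03", "copyright": "\u00A9"}

def glyph_name_to_text(name: str) -> str:
    """Map a glyph name back to characters, sketching the AGL algorithm."""
    pieces = []
    for part in name.split("_"):          # ligature names are split on '_'
        part = part.split(".")[0]         # strip suffixes like .alt or .sc
        if part.startswith("uni") and len(part) == 7:
            pieces.append(chr(int(part[3:], 16)))   # uniXXXX names
        elif part in AGL:
            pieces.append(AGL[part])
        else:
            pieces.append("")             # unknown name: no text recovered
    return "".join(pieces)

print(glyph_name_to_text("f_f_i"))    # 'ffi' as three characters
print(glyph_name_to_text("ffi"))      # the single ligature character U+FB03
print(glyph_name_to_text("uniFB03"))  # likewise U+FB03
```

This is why the choice between naming the glyph ffi and f_f_i matters: the two names round-trip to different character sequences.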

> what will happen if you name the substituted glyph f_f_i and still assign the unicode point (uFB03)?

I think it will depend on how the glyph was inserted, which app generated the PDF, and which app reads it and accesses the PDF contents. Ideally, the glyph f_f_i should be unencoded, and a duplicate of it (named uniFB03 or ffi) should be the one to carry the encoding. But what you describe is not at all bad; it's actually what we do in our fonts. The reasoning is that we favor the ligature aspect (hence the glyph name f_f_i rather than ffi), but at the same time we don't want people to get notdefs if the text happens to contain the character U+FB03 (hence we assign the code point).

Crissov's picture

This thread is quite dated, but I'd like to comment on it nevertheless, because two important aspects are missing from all the answers that opposed Nick Shinn's idea (the second part of it, that is).

1. Not all text can be controlled by the person or entity who chooses the font.
This applies most prominently to the Web, where the reader can ultimately choose the font and a multitude of stakeholders can suggest one (or actually more than one).

Site owners, their designers, editors and authors should enforce the correct character, which will get the correct glyph as long as one of the typefaces used supports it; but browser vendors and users may use techniques to correct typographic no-nos. This could be done by scripts, e.g. Greasemonkey, that replace characters, but why not let fonts replace glyphs instead?

2. Not all of the suggestions are the same.
There are several legacy characters in Unicode that fulfill multiple functions with a unified glyph. I strongly believe it's fine for smart fonts to make these characters adapt their glyphs to the context when an appropriate one can be chosen without ambiguity.
This applies most prominently to U+0027 ', U+0022 " and U+002D -, but maybe to U+0060 ` and perhaps U+00B4 ´ or U+002A * as well.

@DIGIT = [zero one two three four five six seven eight nine];
# also the .*num variants
# (Classes such as @LETTER, @SPACE, @WORDBOUNDARY, @INITIALPUNCT and
# @FINALPUNCT are assumed to be defined elsewhere in the font.)
feature salt {

lookup prime {
  # second or inch and minute or foot
  # context: following a digit without a space in between
  # U+0027 ' quotesingle ➞ U+2032/02B9 ʹ prime / primemod / minute
  # U+0022 " quotedbl ➞ U+2033/02BA ʺ doubleprime / doubleprimemod / second
  sub @DIGIT [quotesingle quotedbl]' by [prime doubleprime]; 
} prime;

lookup apostrophe {
  # apostrophe, also Hawaiian okina etc.
  # context: between letters
  # U+0027 ' quotesingle ➞ U+2019/02BC ’ quoteright / apostrophemod
  sub @LETTER quotesingle' @LETTER by quoteright; 
  # An apostrophe may appear at the start or end of a word,
  # but it then can not be distinguished from a single quote mark reliably,
  # because looking for a matching one can only be done for very simple cases in OT.
  # In English, both look the same at the right-hand side anyway
  # and one could add lexical rules for words where initial apostrophe is common, e.g. ’n’, ’tis.
} apostrophe;

lookup quotemark {
  # default, also English
  # context open: between a space or certain punctuation and a letter or certain punctuation.
  # U+0027 ' quotesingle ➞ U+2018 ‘ quoteleft
  # U+0022 " quotedbl ➞ U+201C “ quotedblleft
  # U+0060 ` grave ➞ U+2018 ‘ quoteleft
  sub @WORDBOUNDARY [quotesingle quotedbl grave]' [@LETTER @DIGIT @INITIALPUNCT] 
   by [quoteleft quotedblleft quoteleft];
  # context close: between a letter or certain punctuation and a space or certain punctuation
  # U+0027 ' quotesingle ➞ U+2019 ’ quoteright
  # U+0022 " quotedbl ➞ U+201D ” quotedblright
  # U+00B4 ´ acute ➞ U+2019 ’ quoteright
  sub [@LETTER @DIGIT @FINALPUNCT] [quotesingle quotedbl acute]' @WORDBOUNDARY 
   by [quoteright quotedblright quoteright];
  # to do: add localized variants
} quotemark;

lookup mathoperator {
  # U+002A * ➞ U+00D7 × multiply
  # U+002D - ➞ U+2212 − minus
  sub @DIGIT        [asterisk hyphen]' by [multiply minus];
  sub @DIGIT @SPACE [asterisk hyphen]' by [multiply minus];
  sub [asterisk hyphen]'        @DIGIT by [multiply minus];
  sub [asterisk hyphen]' @SPACE @DIGIT by [multiply minus];
} mathoperator;

lookup dash {
  # U+002D - ➞ U+2013 – endash or U+2014 — emdash
  ignore sub hyphen hyphen' hyphen' hyphen';
  sub hyphen' hyphen' hyphen' by emdash;
  sub @SPACE' hyphen' hyphen' @SPACE' by emdash;
  sub hyphen' hyphen' by endash;
  sub @SPACE hyphen' @SPACE by endash;
} dash;

@SIPREFIX = [Y Z E P T G M K k H h D d m micro mu n p f a z y];# without deka
lookup litre {
  # l ➞ U+2113 ℓ litre
  # context: after a digit, perhaps separated by a space, 
  #   perhaps preceded by an SI prefix (1 or 2 letters), not followed by a letter. 
  # only lowercase, because ‘L’ is already an unambiguous variant
  ignore sub l' @LETTER;
  sub @DIGIT                  l' by litre;
  sub @DIGIT @SPACE           l' by litre;
  sub @DIGIT        @SIPREFIX l' by litre;
  sub @DIGIT @SPACE @SIPREFIX l' by litre;
  sub @DIGIT        d [a k]   l' by litre;
  sub @DIGIT @SPACE d [a k]   l' by litre;
} litre;

} salt;

Unlike the asterisk, the glyph for the lowercase letter x must not be replaced by the one for times ×, because the cross is neither a normal glyph variant (like the script el) nor is the letter a generic/unified character.
At least the first of these reasons also applies to the suggested fist replacement U+003E > ➞ U+261E ☞.

I'm not sure 'salt' is the best feature for this, though.

filip blazek's picture

Crissov, I think you should avoid text correction at the font level. What works in English may not work in other languages. There are also many cases where such font substitutions fail: while 1939–1945 is an interval with an en dash, 1939-1945 might be a product name with a hyphen; if you substitute the hyphen with an en dash, how would the user correct it? OpenType cannot solve such issues.

In my design studio, we've created an extensive, language-sensitive list of find-change entries. We use it as a script for InDesign (based on the default FindChangeByList.jsx script) and run it whenever we import text into InDesign. It is 1000× smarter than OpenType features; it includes several very complex contextual solutions.
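A few FindChangeByList-style entries can be sketched in Python (the studio's actual list is not public, so the patterns below are illustrative only). Unlike a font feature, each replacement made by such a script can be reviewed and undone in the document:

```python
import re

# Each rule: (compiled pattern, replacement). Ordering matters.
RULES = [
    # digit-hyphen-digit -> en dash (plausible only for numeric ranges)
    (re.compile(r"(?<=\d)-(?=\d)"), "\u2013"),
    # straight apostrophe between word characters -> typographic apostrophe
    (re.compile(r"(?<=\w)'(?=\w)"), "\u2019"),
    # space-hyphen-space -> spaced en dash (a convention in some languages)
    (re.compile(r" - "), " \u2013 "),
]

def clean(text: str) -> str:
    """Apply all find-change rules in order, character level, like the script."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(clean("1939-1945, don't"))   # 1939–1945, don’t
```

Note that even here the hyphen rule would wrongly convert a product code like "X100-200"; this is exactly why a reviewable script beats an irreversible font-level hack.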

I think it would be pretty easy to develop such a find-replace tool for text on websites as well. There is one similar plugin for WordPress, for instance wp-typography (http://kingdesk.com/projects/wp-typography/); maybe there are newer ones available by now.
