Rewriting text output

Rendering text in a graphical environment is … let’s call it ‘involved’. If all you want is a single line of text in a single typeface and colour, that’s pretty easy. The interface libraries will probably even word-wrap it for you.

If you want to do anything more, like mixing plain text with italics, bold, or other colours, then suddenly you’ve got dozens of hoops to jump through.

Well, I did want that, so I had to work for it.

I’d already written a renderer that managed to provide both plain and italic text output, but it was annoyingly crude. What I really needed was to be able to properly mark up source text (both explicitly and implicitly), have that markup pass all the way through to the text renderer, and have the renderer handle it gracefully.

Sure, modern GUI systems have widgets/controls that accept marked-up content and render it correctly: HTML widgets, rich-text widgets, and the like.

But they’re about as portable as Mount Everest, for the most part.

What I needed was a text renderer that could output one line of text or many, in a plain or italic typeface, in variable colours, within a rectangle of variable size… and do it all using only portable or commonly supported drawing operations, without being insanely heavyweight.

That meant I needed my own intermediate markup representation, and I needed to cache a lot of text metrics.

So, a vanilla string class wasn’t going to do the job. There are all sorts of ways I could have leveraged the basic_string template to do a lot of the work, but ultimately the simplest way ended up being to create a new string class just to carry the data through its life-cycle.

class FormattableString final
{

private:
 std::wstring _text;
 std::vector<int> _attributes;
 std::vector<CharacterExtent> _extents;
public:
 enum {ATTRIB_PLAIN=0,ATTRIB_EMPH=1,ATTRIB_FOOTNOTE=2};
...

_text holds the characters, _attributes holds the attribute bits for each character, and _extents holds each character’s actual on-screen layout dimensions, because in the end we’ll be rendering the string in small chunks.

Only three attributes are defined so far. That’s all I need today. Tomorrow is another story, but there’s plenty of room to add more.

The extent data isn’t filled in initially. The vector is constructed at the right length, but its CharacterExtent instances are default-constructed (all zeroes). We don’t need that data right away: first the string has to make its way out of the platform-independent engine library and into the platform-dependent application layer.
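A minimal sketch of how a constructor might size the three parallel vectors is below. The post doesn’t show CharacterExtent or the constructor, so the width/height fields, the constructor signature, and the accessor names are all assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout record: the post does not show its real fields.
struct CharacterExtent {
    int width = 0;   // horizontal advance in pixels
    int height = 0;  // line height in pixels
};

class FormattableString final {
public:
    enum { ATTRIB_PLAIN = 0, ATTRIB_EMPH = 1, ATTRIB_FOOTNOTE = 2 };

    // Sketch: every vector gets one entry per character. Extents are
    // default-constructed (all zeroes) and only filled in when measured.
    explicit FormattableString(std::wstring text, int attributes = ATTRIB_PLAIN)
        : _text(std::move(text)),
          _attributes(_text.size(), attributes),
          _extents(_text.size()) {}

    std::size_t size() const { return _text.size(); }
    wchar_t charAt(std::size_t i) const { return _text[i]; }
    int attributesAt(std::size_t i) const { return _attributes[i]; }
    const CharacterExtent& extentAt(std::size_t i) const { return _extents[i]; }

private:
    // Declaration order matters: _text is initialised before the vectors
    // that are sized from it.
    std::wstring _text;
    std::vector<int> _attributes;
    std::vector<CharacterExtent> _extents;
};
```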

Once the string arrives there and we’re committed to drawing it, we find the text metrics for each character, based on the currently selected fonts, and fill that data in.
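The measuring pass might look something like this sketch. Font lookup is platform-specific, so it’s abstracted as a hypothetical `MeasureFn` callback that the application layer would supply; `measureExtents` and the CharacterExtent fields are likewise illustrative names, not the post’s code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical layout record (fields assumed for illustration).
struct CharacterExtent {
    int width = 0;
    int height = 0;
};

// Hypothetical platform hook: selects the plain or italic font according to
// the attribute bits, then measures a single character.
using MeasureFn = std::function<CharacterExtent(wchar_t ch, int attributes)>;

// Fill in every character's extent once, just before drawing, so later
// wrapping and drawing can reuse the cached values.
void measureExtents(const std::wstring& text,
                    const std::vector<int>& attributes,
                    std::vector<CharacterExtent>& extents,
                    const MeasureFn& measure) {
    extents.resize(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        extents[i] = measure(text[i], attributes[i]);
    }
}
```

Measuring characters individually ignores pair kerning, which is consistent with drawing them one at a time.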

Because every character’s width is now known, word-wrapping becomes a fairly simple matter. The result is a vector of FormattableStrings, one per line, ready to be drawn on the screen one at a time.
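One plausible way to do the wrapping is a greedy pass over the cached widths: accumulate widths until the next character would overflow, then break at the last space on the line, or mid-word if there isn’t one. This is a sketch under that assumption, not the post’s implementation; `wrapLines` is a hypothetical name, and it returns index ranges, each of which could then be copied into its own FormattableString.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A wrapped line is a half-open range [first, second) of character indices.
using LineRange = std::pair<std::size_t, std::size_t>;

// Greedy word wrap over pre-measured widths (widths.size() == text.size()).
std::vector<LineRange> wrapLines(const std::wstring& text,
                                 const std::vector<int>& widths,
                                 int maxWidth) {
    const std::size_t none = std::wstring::npos;
    std::vector<LineRange> lines;
    std::size_t start = 0;         // first character of the current line
    std::size_t lastSpace = none;  // most recent space on the current line
    int lineWidth = 0;             // total width of text[start, i)

    for (std::size_t i = 0; i < text.size(); ++i) {
        // Character i would overflow: close the current line first.
        while (lineWidth + widths[i] > maxWidth && i > start) {
            if (lastSpace != none && lastSpace > start) {
                lines.emplace_back(start, lastSpace);  // drop the space itself
                start = lastSpace + 1;
            } else {
                lines.emplace_back(start, i);  // no usable space: break mid-word
                start = i;
            }
            lastSpace = none;
            // Re-measure the characters carried over to the new line.
            lineWidth = 0;
            for (std::size_t j = start; j < i; ++j) lineWidth += widths[j];
        }
        if (text[i] == L' ') lastSpace = i;
        lineWidth += widths[i];
    }
    if (start < text.size()) lines.emplace_back(start, text.size());
    return lines;
}
```

For `L"the quick brown fox"` with every width 1 and a maximum of 10, this yields the ranges [0, 9) and [10, 19), i.e. “the quick” and “brown fox”.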

While I could batch the drawing operations more efficiently, drawing one character at a time takes very little code. There’s room to optimise later, but the result was nowhere near as slow as I’d expected. In fact, it is quite zippy. But then, I’m not working with gigantic pages of text, either.
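Drawing one character at a time then reduces to a short loop that advances the pen position by each cached width. The `DrawCharFn` hook stands in for whatever the platform offers for selecting a font and colour and drawing a single glyph; it, `drawLine`, and the extent fields are assumptions for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical layout record (fields assumed for illustration).
struct CharacterExtent {
    int width = 0;
    int height = 0;
};

// Hypothetical platform hook: choose font/colour for `attributes`, then draw
// one character with its origin at (x, y).
using DrawCharFn = std::function<void(int x, int y, wchar_t ch, int attributes)>;

// Draw one wrapped line character by character, advancing by cached widths.
// Returns the x coordinate just past the last character.
int drawLine(const std::wstring& text,
             const std::vector<int>& attributes,
             const std::vector<CharacterExtent>& extents,
             int x, int y, const DrawCharFn& drawChar) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        drawChar(x, y, text[i], attributes[i]);
        x += extents[i].width;
    }
    return x;
}
```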

Good things to know

Unicode sets aside code points for private use. More specifically, it has values to which the standard will never assign characters or meanings. The main block of these, the Private Use Area, runs from \ue000 to \uf8ff, and you can go nuts with it.

That is excellent! It means you can freely use Unicode values in that range for your own internal purposes. They make excellent delimiters for separating Unicode strings (a boon when you are serialising and deserialising data), and they’re great for inserting arbitrary markup into strings. Encode your human-readable markup as pairs of private-use values, hand that along, and eventually process and strip them out.
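Here’s a small sketch of the processing-and-stripping step. The specific marker code points (U+E000 to U+E003) and the function name are hypothetical choices; the attribute values mirror the enum in FormattableString, treated as bits so that nested markup combines.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical marker assignments inside the Private Use Area.
constexpr wchar_t MARK_EMPH_ON      = L'\uE000';
constexpr wchar_t MARK_EMPH_OFF     = L'\uE001';
constexpr wchar_t MARK_FOOTNOTE_ON  = L'\uE002';
constexpr wchar_t MARK_FOOTNOTE_OFF = L'\uE003';

enum { ATTRIB_PLAIN = 0, ATTRIB_EMPH = 1, ATTRIB_FOOTNOTE = 2 };

// Strip the markers, returning the plain text plus one attribute word per
// remaining character. Markers set or clear bits in the running attribute.
std::pair<std::wstring, std::vector<int>> parseMarkup(const std::wstring& marked) {
    std::wstring text;
    std::vector<int> attributes;
    int current = ATTRIB_PLAIN;
    for (wchar_t ch : marked) {
        switch (ch) {
            case MARK_EMPH_ON:      current |= ATTRIB_EMPH; break;
            case MARK_EMPH_OFF:     current &= ~ATTRIB_EMPH; break;
            case MARK_FOOTNOTE_ON:  current |= ATTRIB_FOOTNOTE; break;
            case MARK_FOOTNOTE_OFF: current &= ~ATTRIB_FOOTNOTE; break;
            default:
                text.push_back(ch);
                attributes.push_back(current);
        }
    }
    return {std::move(text), std::move(attributes)};
}
```

Running it on `L"a\uE000bc\uE001d"` produces the text `abcd` with attributes {0, 1, 1, 0}.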

Since each one takes only a single character (well, a single wchar_t; they sit in the Basic Multilingual Plane, so even a 16-bit wchar_t holds one), they’re easy and fast to insert, remove, or replace.
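The single-character property also makes serialisation simple: one private-use character can separate fields without any escaping, as long as the fields themselves never contain it. The separator U+E010 and the function names below are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field separator taken from the Private Use Area.
constexpr wchar_t FIELD_SEP = L'\uE010';

// Join fields with the separator (assumes no field contains FIELD_SEP).
std::wstring joinFields(const std::vector<std::wstring>& fields) {
    std::wstring out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out.push_back(FIELD_SEP);
        out += fields[i];
    }
    return out;
}

// Split on every separator; empty fields are preserved.
std::vector<std::wstring> splitFields(const std::wstring& s) {
    std::vector<std::wstring> fields;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = s.find(FIELD_SEP, start);
        if (pos == std::wstring::npos) {
            fields.push_back(s.substr(start));
            return fields;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}
```

An empty vector joins to an empty string, which splits back into one empty field, so a real format would need to decide how to represent that case.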

By the time I looked up from the major rewrite of the text renderer, about five hours had gone by. But the rewritten code did everything I’d originally planned for it, plus about 50% more. As a bonus, it is almost trivially easy to add more rendering features to it.

I certainly can’t complain about the time lost in the coding vortex.