[PATCH 1/5] kernelbase/locale: Implement sortkey generation on official tables

Thu Apr 30 12:50:37 CDT 2020

Hello Zebediah,

thanks for your reply!

> So as far as I understand, the sort key algorithm writes the level 0
> weights (script and alphabetic weight) for the whole string to the sort
> key, then the level 1 weights (diacritic), and so on, right?

Yes.

> In that case, what seems potentially simpler to me is to calculate those
> weights one level at a time, rather than one character at a time.

But that would mean that we would iterate over the string multiple times, and
in the function have a branch as to which weight to write.
To me, my current approach seemed easier. I've thought quite a while about
abandoning my lists, but the results ended up being a lot more complex.

> That is, you'd end up doing something like
>
> static int get_sortkey( DWORD flags, const WCHAR *src, int srclen, char
> *dst, int dstlen )
> {
>      int used = 0;
>      for (i = 0; i < srclen; ++i)
>      {
>          used += get_main_weights(src[i], dst + used, dstlen - used);
>          if (!(flags & NORM_IGNORENONSPACE))
>              used += get_diacritic_weights(src[i], dst + used, dstlen -
> used);
>          ...
>      }
> }

This won't work, since we first need all main weights, then the others.
Also keep in mind, that we need to have some temporary buffer anyways, since
weights that are very small only get added when there is a bigger one
following it. Either that, or we need to backtrack and remove weights again.
Due to those issues, I opted for the current approach.

> As that example shows, I also think it's probably simpler to just pass
> the buffer directly to whatever functions are writing sortkey bytes into it.

As noted, it's not that easy.
Especially when you need to deal with buffers that are too short - we'd need
to mix in a stop condition but still return the needed length. Sometimes, we
just need to continue iterating despite knowing we can't copy it into the
sortkey.

> Is there a reason to use LCMapStringEx() here rather than LCMapString()?

LCMapStringA/W completely ignore the locale, and use a LCID. We'd need to fix
those functions to properly convert the locale. To start simple, I decided to
use the more low-level function and once that works, built the rest on top.

> The fact that this test is commented out never struck me as great. I'm
> pretty sure that with todo_wine added as appropriate, it could pass. A
> first patch in this series could be to do that.

Probably. I decided to leave it alone for now, and comment it in once it
works, but sure, I can change that.

> Are these comments useful?

Eh, I used them for a quick visual separation. I can remove them if they're
too much noise though.

> I get the impression that typedefs have largely fallen out of favour.

I always use them when I can, since they make the code shorter and I don't
think having to add the struct keyword adds much. I've seen both in wine code,
but given a policy I'll stick to that.

> The fact that exactly one of these integer cases has a symbolic constant
> attached seems less than ideal.

True, I'll add constants for the others as well.

Regards,
Fabian Maurer