[PATCH v3 1/2] kernelbase/locale: Implement comparison on top of official unicode weight tables

Wed Mar 4 16:03:45 CST 2020

Hello Alexandre,

> I don't see any language support, there's just one big sortkey
> table. Yes, that's what the current code is doing too, but if we are
> rewriting it, we should get the architecture right.

Yeah, there's no language support yet. I just noted how it's done, but for the
first patch it's not implemented yet.

> I mean when multiple chars map to one sortkey. The COMPRESSION sections
> in the Microsoft table.

Well, I haven't implemented that yet, but it can be done.

> > - Linguistic mappings: Not sure what you mean, sorry
>
> NORM_LINGUISTIC_CASING and the like.

I see, same answer then.

> > Question: How should I prove it works? I can't possibly add all of that in
> > the first draft.
>
> The usual way is to add a bunch of tests with todo_wine, and then send a
> patch series with each patch removing the corresponding todos.

I know, but for this patchset that doesn't prove it can be done. It would
only be proven once I submit the patches that remove the todos, no? Or do you
want me to submit everything at once?

> >> We only have tests for a very small number of strings, that's clearly
> >> not proper coverage. Some way of systematically generating test strings
> >> should be considered.
> >
> > Like, random strings from a known seed? I intentionally didn't do that,
> > because of performance concerns.
>
> Not necessarily random, but some interesting data. For instance the
> normalization tests can run the entire test suite from unicode.org, you
> may be able to find something similar. Or build your own somehow.

Well, I added what I consider to be interesting: a few test cases for each bit
of code I implemented, to get as complete coverage as possible. I'm not sure
what you'd consider interesting data, or where I'd find it. According to the
algorithm I implemented, I already cover the corner cases. What more is
needed? I could certainly add a bunch of random strings though.
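
For reference, a rough sketch of what "random strings from a known seed" could
look like. The names (next_random, random_string) and the tiny alphabet are
made up for illustration and are not part of the patch; real tests would
presumably use WCHAR strings rather than char.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-seed generator (a plain LCG), so that every run
 * produces the same sequence regardless of the platform's rand(). */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16; /* the high bits of an LCG are the better ones */
}

/* Fills buf with a NUL-terminated string of 0..max_len characters drawn
 * from alphabet. buf must have room for max_len + 1 chars.
 * Returns the length of the generated string. */
static size_t random_string(uint32_t *state, const char *alphabet,
                            size_t alphabet_len, char *buf, size_t max_len)
{
    size_t i, len = next_random(state) % (max_len + 1);

    for (i = 0; i < len; i++)
        buf[i] = alphabet[next_random(state) % alphabet_len];
    buf[len] = 0;
    return len;
}
```

Since the seed is fixed, a failure can be reproduced exactly, and the number of
generated strings can be kept small to limit the runtime.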

> >> Also testing sort keys directly, like you did in
> >> the first try (but without depending on the exact values).
> >
> > I've that planned, yes. Do you want that in the first version already?
>
> The tests should come before the code, or at the same time.

That's how I planned it - but not in the first version. As I said, I planned
to split everything into manageable sizes. I could add all possible tests in
one huge patch, but there's no real benefit to that.
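
A rough sketch of how a sort key test could avoid depending on exact values:
check that comparing two keys bytewise gives the same order as comparing the
strings. toy_compare and toy_sortkey are hypothetical stand-ins (ASCII,
case-insensitive) so that the sketch runs on its own; in a real Wine test they
would presumably be CompareStringW and LCMapStringW with LCMAP_SORTKEY.

```c
#include <stddef.h>
#include <string.h>

static int toy_lower(int c) { return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c; }

/* Hypothetical stand-in for the comparison under test. Returns -1, 0 or 1. */
static int toy_compare(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = toy_lower((unsigned char)*a), cb = toy_lower((unsigned char)*b);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (!ca) return 0;
    }
}

/* Hypothetical stand-in for sort key generation: the lower-cased bytes plus
 * the terminator. Returns the key length, or 0 if the buffer is too small. */
static size_t toy_sortkey(const char *s, unsigned char *key, size_t size)
{
    size_t i, len = strlen(s);

    if (len + 1 > size) return 0;
    for (i = 0; i <= len; i++) key[i] = (unsigned char)toy_lower((unsigned char)s[i]);
    return len + 1;
}

/* Bytewise key comparison, normalized to -1, 0 or 1. */
static int key_compare(const unsigned char *ka, size_t la,
                       const unsigned char *kb, size_t lb)
{
    int r = memcmp(ka, kb, la < lb ? la : lb);

    if (r) return (r > 0) - (r < 0);
    return (la > lb) - (la < lb);
}

/* Checks every ordered pair of strings and returns how many pairs sort
 * differently by key than by direct comparison. No key bytes are hardcoded. */
static int count_mismatches(const char *const *strings, size_t count)
{
    unsigned char ka[64], kb[64];
    size_t i, j, la, lb;
    int mismatches = 0;

    for (i = 0; i < count; i++)
        for (j = 0; j < count; j++)
        {
            la = toy_sortkey(strings[i], ka, sizeof(ka));
            lb = toy_sortkey(strings[j], kb, sizeof(kb));
            if (!la || !lb) { mismatches++; continue; }
            if (toy_compare(strings[i], strings[j]) != key_compare(ka, la, kb, lb))
                mismatches++;
        }
    return mismatches;
}
```

The check is quadratic in the number of strings, so the list has to stay
reasonably small, but it only relies on the relative order of the keys.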

In short, I think the main problem here is that I want to split the
implementation into multiple patches. I planned to add functionality one by
one, each piece covered by enough tests to give the code near-full coverage.
Is that a bad approach? If you want proof that the functionality can be added,
please tell me how.

Regards,
Fabian Maurer