[PATCH v3 1/2] kernelbase/locale: Implement comparison on top of official unicode weight tables
Fabian Maurer
dark.shadow4 at web.de
Wed Mar 4 15:18:00 CST 2020
Hello Alexandre,
> Multi-language support, Japanese, Korean, multi-char sequences,
> surrogates, linguistic mappings, etc.
>
> There are a million things that need to be supported for proper
> sorting. You don't have to implement them all, but it should be clear
> from your approach that they can be added. Which in practice means you
> need to at least prototype most of them.
Well, they can be added, it's just that I left them out for the initial
versions...
Short breakdown:
- Multi-language: The character is looked up in the current language, with the
default as a fallback. Currently, only the default is implemented.
- Japanese: Main reason why I did all of this. Special case, but supported by
the tables.
- Korean: Handled under Jamo. Special case, but supported by the tables.
Currently not properly implemented by me because it's a lot of work.
- Multi-char sequences: You mean when a single codepoint is encoded as more
than one WCHAR? That is supported; Windows seems to treat each WCHAR separately.
- Surrogates: Windows seems to treat each WCHAR on its own.
- Linguistic mappings: Not sure what you mean, sorry
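The language lookup with a default fallback described in the list above could be sketched roughly as follows. This is only an illustration: the struct layout, the names `weight_entry`, `weight_table`, `find_entry` and `lookup_weight`, and the choice of returning 0 for a character without any weight are all assumptions, not taken from the patch or from the Windows tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical weight entry; the real table layout is not shown in this thread. */
struct weight_entry
{
    uint16_t ch;      /* one UTF-16 code unit (stands in for WCHAR) */
    uint32_t weight;  /* packed sort weight */
};

struct weight_table
{
    const struct weight_entry *entries; /* must be sorted by ch */
    size_t count;
};

/* Binary search; returns NULL when ch has no entry in this table. */
static const struct weight_entry *find_entry(const struct weight_table *table, uint16_t ch)
{
    size_t lo = 0, hi;

    assert(table != NULL);
    hi = table->count;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (table->entries[mid].ch == ch) return &table->entries[mid];
        if (table->entries[mid].ch < ch) lo = mid + 1;
        else hi = mid;
    }
    return NULL;
}

/* Try the language-specific table first, then fall back to the default one.
 * lang may be NULL when no language-specific table exists. */
static uint32_t lookup_weight(const struct weight_table *lang,
                              const struct weight_table *def, uint16_t ch)
{
    const struct weight_entry *entry = lang ? find_entry(lang, ch) : NULL;

    if (!entry) entry = find_entry(def, ch);
    return entry ? entry->weight : 0; /* 0 = "no weight", an assumption */
}
```

Because every WCHAR is looked up on its own, the two halves of a surrogate pair would simply go through `lookup_weight` one after the other in this sketch.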
Question: How should I prove it works? I can't possibly add all of that in the
first draft.
> For instance you do 10 memory allocations before even starting to
> compare anything. That's clearly not cheap.
I understand. But for a dynamically sized sortkey I need dynamic buffers.
Maybe I could put the initial buffers on the stack?
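One way the "initial buffers on the stack" idea could look is a buffer that starts in inline storage and only touches the heap once it overflows. This is a minimal sketch, not the patch's code: `key_buffer`, its helper functions, and the 256-byte `INLINE_CAPACITY` are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_CAPACITY 256  /* arbitrary size picked for illustration */

/* Hypothetical growable byte buffer that starts out in inline storage.
 * Note: it points into itself, so it must not be copied by value. */
struct key_buffer
{
    unsigned char *data;     /* points at inline_data or at a heap block */
    size_t len;
    size_t capacity;
    unsigned char inline_data[INLINE_CAPACITY];
};

static void key_buffer_init(struct key_buffer *buf)
{
    buf->data = buf->inline_data;
    buf->len = 0;
    buf->capacity = INLINE_CAPACITY;
}

/* Append one byte; returns 0 on allocation failure, 1 on success. */
static int key_buffer_append(struct key_buffer *buf, unsigned char byte)
{
    assert(buf->len <= buf->capacity);
    if (buf->len == buf->capacity)
    {
        size_t new_capacity = buf->capacity * 2;
        unsigned char *new_data;

        if (buf->data == buf->inline_data)
        {
            /* first overflow: move the inline contents to the heap */
            new_data = malloc(new_capacity);
            if (new_data) memcpy(new_data, buf->inline_data, buf->len);
        }
        else new_data = realloc(buf->data, new_capacity);

        if (!new_data) return 0;
        buf->data = new_data;
        buf->capacity = new_capacity;
    }
    buf->data[buf->len++] = byte;
    return 1;
}

/* Release heap storage, if any, and reset to the empty inline state. */
static void key_buffer_free(struct key_buffer *buf)
{
    if (buf->data != buf->inline_data) free(buf->data);
    key_buffer_init(buf);
}
```

With a `struct key_buffer` declared as a local variable, short strings would never allocate at all; only keys longer than the inline capacity pay for a `malloc`.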
> We only have tests for a very small number of strings, that's clearly
> not proper coverage. Some way of systematically generating test strings
> should be considered.
Like, random strings from a known seed? I intentionally didn't do that,
because of performance concerns.
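For reference, generating strings from a known seed might look something like the sketch below. It uses xorshift32 instead of `rand()` so that the sequence is the same on every platform. The function names and the use of `uint16_t` in place of WCHAR are assumptions for illustration; how many strings to generate would be a knob to keep the test run time acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: tiny deterministic PRNG, so every run sees the same strings. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;

    assert(x != 0); /* xorshift32 gets stuck at zero, so the seed must be nonzero */
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill out[] with len code units from the range [first, last] and
 * NUL-terminate it. uint16_t stands in for WCHAR; out must have room
 * for len + 1 elements. */
static void random_string(uint32_t *state, uint16_t first, uint16_t last,
                          uint16_t *out, size_t len)
{
    uint32_t span;
    size_t i;

    assert(first <= last);
    span = (uint32_t)last - first + 1;
    for (i = 0; i < len; i++)
        out[i] = (uint16_t)(first + next_random(state) % span);
    out[len] = 0;
}
```

Calling `random_string` with the range 0x3041 to 0x3096, for example, would produce hiragana-only strings, and the same seed always yields the same strings, so a failure stays reproducible.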
> Also testing sort keys directly, like you did in
> the first try (but without depending on the exact values).
I have that planned, yes. Do you want that in the first version already?
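Testing sort keys without depending on their exact values could mean checking only that the bytewise order of two keys agrees with the comparison result for the same two strings. The sketch below shows that check in isolation. The helper names are hypothetical; the three result values mirror CSTR_LESS_THAN, CSTR_EQUAL and CSTR_GREATER_THAN, and in a real test the keys would presumably come from LCMapStringW with LCMAP_SORTKEY and the result from CompareStringW.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same values as CSTR_LESS_THAN / CSTR_EQUAL / CSTR_GREATER_THAN. */
enum { RESULT_LESS = 1, RESULT_EQUAL = 2, RESULT_GREATER = 3 };

/* Order two sort keys bytewise; a key that is a strict prefix sorts first. */
static int compare_sort_keys(const unsigned char *a, size_t a_len,
                             const unsigned char *b, size_t b_len)
{
    size_t min_len = a_len < b_len ? a_len : b_len;
    int diff = min_len ? memcmp(a, b, min_len) : 0;

    if (diff < 0) return RESULT_LESS;
    if (diff > 0) return RESULT_GREATER;
    if (a_len < b_len) return RESULT_LESS;
    if (a_len > b_len) return RESULT_GREATER;
    return RESULT_EQUAL;
}

/* Nonzero when the key ordering agrees with a CompareString-style result. */
static int keys_match_compare(const unsigned char *a, size_t a_len,
                              const unsigned char *b, size_t b_len,
                              int compare_result)
{
    assert(compare_result >= RESULT_LESS && compare_result <= RESULT_GREATER);
    return compare_sort_keys(a, a_len, b, b_len) == compare_result;
}
```

A test built this way only fails when key order and string comparison disagree, so it would keep passing if the exact key bytes change between Windows versions.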
> When there are differences between Windows versions we want to use the
> latest, since that's the one that will continue to work in the
> future. In this case it means using the most recent table.
Okay then. If that's important, I can change the table.
> Note that we most likely want to use a Windows-compatible NLS file, like
> we are now using for codepage or normalization tables. I can work on
> that part.
I have to admit, I don't know what you mean by that. I don't know about NLS
files.
Regards,
Fabian Maurer
More information about the wine-devel
mailing list