[PATCH 1/2] regedit: Overhaul Unicode line processing

Tue Nov 22 07:00:16 CST 2016

On Tuesday, 22 November 2016 11:37 PM, Alexandre Julliard wrote:
>Hugh McMaster writes:
>
>> On Tuesday, 22 November 2016 6:38 AM, Alexandre Julliard wrote:
>>> Hugh McMaster writes:
>>> 
>>> > +        p = strpbrkW(line, line_endings);
>>> > +        if (!p)
>>> > +        {
>>> > +            HeapFree(GetProcessHeap(), 0, buf);
>>> > +            return NULL;
>>> 
>>> If there are no line endings you need to read more data instead of
>>> failing.
>>
>> By the time p == NULL, we are already at the end of the buffer.
>> And even if we weren't, the buffer would be increased and more data
>> read in, on the second (and subsequent) loops.
>
>There won't be a subsequent loop since you returned failure.

This is a marginal case at best. The first line would need to be use 4095 bytes
and not have a newline in it at all for this to fail. Still, I take your point in
handling this aspect correctly.

>> This does assume, of course, that the .reg file format is valid. I'm not
>> sure if .reg files are valid if they end without a newline sequence.
>> I'll look into this and write some tests to check.

>The function should of course be able to handle a file that doesn't end
>in a newline, but the problem above would happen on any line that does
>not fit into the existing buffer.

Windows allows the final line in a .reg file to end without a newline on
all versions except XP.

Our current implementation is broken in this regard.

>> The patch splits the buffer contents into lines. But we do end up
>> reading the entire file into the buffer because 
>> REG_VAL_BUF_SIZE is set to 4096 bytes. We could decrease
>> this to 1024 bytes if you want. But, in many cases, I expect
>> the same behaviour to occur.

>The initial size won't make a difference, the problem is that you always
>append data without removing it, so the buffer grows to the entire file
>size.  It shouldn't need to grow bigger than the longest line.

Continually moving the remaining characters to the start of the buffer
hits perfomance. That's why I took the 'read into memory'
approach as line parsing is faster in RAM.

But fair enough, I'll rework the patch. Still, I'm interested to know why
you dislike reading the file into memory first.

Thanks for the review.