richedit: (0/9) Patches to fix proper CR/LF encoding for 1.0 emulation

Mon Apr 28 11:44:55 CDT 2008

The following series of patches aims to implement a permanent fix for the
issue highlighted by the todo_wine tests for riched32: the proper
preservation and retrieval of arbitrary sequences of \r and \n characters,
and their interpretation as line breaks. The tests in question are in
dlls/riched32/tests/editor.c, in the function test_WM_SETTEXT().

To recap:
Richedit 2.0 and higher standardize the rule for line breaks: \r\r\n is a
space, and any of \r, \n, or \r\n is a line break. For text retrieval, a
line break is converted back to \r unless a CRLF flag is set for the message
or WM_GETTEXT is used, in which case a line break is returned as \r\n.
Richedit 1.0 considers \r{0,N}\n, or a \r NOT followed by a \n, to be a line
break. The important issue is that richedit 1.0 remembers the arbitrary
sequence that defined the line break, and returns it verbatim on any
operation that retrieves text. Also, the line-break sequence of arbitrary
length is honored when calculating character offsets. The implementation of
richedit supplied by Wine deviates from this by treating all line breaks as
\r\n with offset 2, regardless of the original character sequence that was
supplied. One bug that results from this deviation is bug #5968, in which an
application inserts \r\r\n in 1.0 emulation and expects a delta of 3, but
finds one of 4 (now 2) instead.
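
As a rough illustration of the 1.0 rule recapped above, here is a minimal
sketch in C. The function name re10_linebreak_len is hypothetical and does
not come from the patches. It also assumes that a run of \r characters which
is not closed by a \n counts as separate one-character breaks, which is only
one reading of the rule.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the patches: returns the length of
 * the richedit 1.0 line break that starts at s[0], or 0 if s does not
 * start with a line break. */
size_t re10_linebreak_len(const char *s)
{
    size_t n = 0;

    /* Count the leading \r characters. */
    while (s[n] == '\r')
        n++;

    /* \r{0,N}\n: any run of \r closed by a \n is one single break. */
    if (s[n] == '\n')
        return n + 1;

    /* Assumption: a \r that is not part of such a run is a break on its
     * own, so only the first \r is consumed here. */
    return n > 0 ? 1 : 0;
}
```

With this reading, the \r\r\n sequence from bug #5968 yields a length of 3,
which matches the delta the application expects.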

The strategy:
These patches first add a pair of new fields to the ME_Run structure. These
two fields contain the number of \r characters and the number of \n
characters in the line break. They are only set when the MERF_ENDPARA flag
is set, and are undefined otherwise. For richedit 2.0, the values are always
set to (1,0): one \r character and no \n character. This replicates current
behavior. For normal sequences in richedit 1.0 emulation, they are set to
(1,1): one \r character and one \n character. All of this is implemented in
ME_InsertTextFromCursor(), which fills in the MERF_ENDPARA runs in a way
that replicates current behavior. This also requires storing the number of
characters in undo and redo, so they can be correctly replayed.
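
To make the (1,0) and (1,1) defaults concrete, here is a small sketch in C.
The struct and the names nCR, nLF and default_endpara_counts are assumptions
for illustration only; in the patches the two counters live inside ME_Run.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two new ME_Run fields. The values are only
 * meaningful for a run that carries the MERF_ENDPARA flag. */
struct endpara_counts {
    int nCR;  /* number of \r characters in the paragraph end */
    int nLF;  /* number of \n characters in the paragraph end */
};

/* Returns the counts that replicate the current behavior:
 * (1,0) for richedit 2.0 and (1,1) for richedit 1.0 emulation. */
struct endpara_counts default_endpara_counts(bool emulate_version_10)
{
    struct endpara_counts c;

    c.nCR = 1;
    c.nLF = emulate_version_10 ? 1 : 0;
    return c;
}
```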

For retrieving text in 2.0 mode, line breaks are considered as one \r
character, as before. For 1.0 mode, the actual values stored are used to
reconstruct the character sequence.
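
The retrieval step could look roughly like the following sketch in C. The
function write_endpara and its parameters are hypothetical. The sketch also
ignores the 2.0 cases that return \r\n (a CRLF flag, or WM_GETTEXT) and only
shows the plain \r case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: writes the text of one paragraph end into buf,
 * which holds cap bytes including the terminating NUL, and returns the
 * number of characters written (not counting the NUL). */
size_t write_endpara(char *buf, size_t cap, int nCR, int nLF,
                     bool emulate_version_10)
{
    size_t len = 0;
    int i;

    if (cap == 0)
        return 0;

    /* 2.0 mode: a line break is always one \r, whatever was stored. */
    if (!emulate_version_10) {
        nCR = 1;
        nLF = 0;
    }

    /* 1.0 mode: rebuild the original sequence from the stored counts. */
    for (i = 0; i < nCR && len + 1 < cap; i++)
        buf[len++] = '\r';
    for (i = 0; i < nLF && len + 1 < cap; i++)
        buf[len++] = '\n';

    buf[len] = '\0';
    return len;
}
```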

Then, the patches convert as much as possible of the offset calculation
around paragraph ends to rely on the values stored in the MERF_ENDPARA run,
instead of relying on the bEmulateVersion10 flag. This allows most of the
code to stop caring whether 1.0 emulation is active when calculating
offsets, which should reduce the number of special cases. At no point up to
here are tests supposed to break or change their todo_wine status.
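
A minimal sketch of what such an offset calculation might look like in C.
The names endpara_len and next_para_offset are hypothetical. The point is
only that the length comes from the stored counts, with no check of the
emulation mode.

```c
/* Hypothetical helper: length in characters of a paragraph end, taken
 * from the stored counts alone. */
int endpara_len(int nCR, int nLF)
{
    return nCR + nLF;
}

/* Hypothetical helper: offset of the first character after a paragraph
 * that starts at para_start and holds text_len characters of text before
 * its paragraph end. */
int next_para_offset(int para_start, int text_len, int nCR, int nLF)
{
    return para_start + text_len + endpara_len(nCR, nLF);
}
```

With counts of (1,0) or (1,1) this gives the offsets 1 and 2 of the current
behavior, and with (2,1) it gives the delta of 3 expected in bug #5968.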

Finally, ME_InsertTextFromCursor() is again modified to actually honor the
line break sequences and encode them in the MERF_ENDPARA run, using the
run-splitting functionality if necessary. Alongside this,
ME_DeleteTextAtCursor() is altered to make use of the offsets and the
run-joining to reconstruct line breaks properly. At this point, most, if not
all, of the riched32 tests should succeed.
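
The encoding step might be sketched in C as below. The function
encode_linebreak is hypothetical and is not the code in
ME_InsertTextFromCursor(). It assumes the same reading of the 1.0 rule as in
the recap: a run of \r closed by \n is one break, and any other \r is a
break of its own.

```c
#include <stddef.h>

/* Hypothetical helper: if s starts with a richedit 1.0 line break, stores
 * its \r and \n counts in *nCR and *nLF and returns the number of
 * characters consumed. Returns 0 and stores (0,0) otherwise. */
size_t encode_linebreak(const char *s, int *nCR, int *nLF)
{
    size_t n = 0;

    while (s[n] == '\r')
        n++;

    if (s[n] == '\n') {
        /* \r{0,N}\n: keep every \r of the run plus the closing \n. */
        *nCR = (int)n;
        *nLF = 1;
        return n + 1;
    }

    if (n > 0) {
        /* Assumption: a \r not closed by \n is a one-character break. */
        *nCR = 1;
        *nLF = 0;
        return 1;
    }

    *nCR = 0;
    *nLF = 0;
    return 0;
}
```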

-- 
perl -e '$x=2.4;print sprintf("%.0f + %.0f = %.0f\n",$x,$x,$x+$x);'