[PATCH] jscript: Ignore BOM mark in next_token. (try 5)

Nikolay Sivov bunglehead at gmail.com
Thu Oct 2 06:06:18 CDT 2014


On 10/2/2014 14:48, Jacek Caban wrote:
> On 10/02/14 08:29, Nikolay Sivov wrote:
>>>    static BOOL skip_spaces(parser_ctx_t *ctx)
>>>    {
>>> -    while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
>>> +    while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) {
>>>            if(is_endline(*ctx->ptr++))
>>>                ctx->nl = TRUE;
>>>        }
>> This looks correct according to ECMA-252 section 7.2 - all of the
>> following are whitespace:
>>
>> - tab and vertical tab, 0x9 and 0xb
>> - form feed 0xc
>> - space 0x20
>> - NBSP 0xa0
>> - UTF-16 BOM 0xfeff
>> - any other Unicode "space separator"
>>
>> Hopefully isspaceW() covers everything but the BOM. What worries me is
>> that isspaceW() itself is used in numerous places in code on its own.
>> So probably we need more tests to cover more cases where space
>> separators could be used, and later have our own is_space() call that
>> will conform to the standard.
> FWIW, ECMA-262 (which I usually use for jscript development) doesn't
> mention the UTF-16 BOM as white space.
Sorry, 252 was a typo of course. It does mention it here:
http://www.ecma-international.org/ecma-262/5.1/#sec-7.2
>   Anyway, I agree that it would be
> interesting to see if it's considered white space in other places as
> well. (I'm also fine with the patch in current form, but an extended
> version would be obviously better).
Sure, I'm not saying it's wrong either, just pointing out a potential 
direction for improvement.
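
For illustration, a minimal sketch of what such an is_space() helper could
look like, following the section 7.2 list quoted above. The names js_wchar_t
and is_js_whitespace are hypothetical, not existing jscript code, and the
hard-coded "space separator" code points are an assumption that depends on
the Unicode version in use:

```c
#include <stdbool.h>

/* Hypothetical stand-in for Wine's WCHAR so the sketch compiles on its own. */
typedef unsigned short js_wchar_t;

/* Sketch of a WhiteSpace check per ECMA-262 5.1 section 7.2.
 * Line terminators (section 7.3) are deliberately not covered here. */
static bool is_js_whitespace(js_wchar_t c)
{
    switch(c) {
    case 0x0009: /* tab */
    case 0x000B: /* vertical tab */
    case 0x000C: /* form feed */
    case 0x0020: /* space */
    case 0x00A0: /* no-break space */
    case 0xFEFF: /* byte order mark */
        return true;
    /* Other "space separator" (Zs) code points. This list is an
     * assumption and may differ between Unicode versions. */
    case 0x1680:
    case 0x202F:
    case 0x205F:
    case 0x3000:
        return true;
    default:
        /* U+2000..U+200A: en quad through hair space */
        return c >= 0x2000 && c <= 0x200A;
    }
}
```

Since line terminators are left out here, the loop condition in
skip_spaces() would need to accept either this check or is_endline().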
>
> Jacek
>
>



