[PATCH] jscript: Ignore BOM mark in next_token. (try 5)

Thu Oct 2 01:29:42 CDT 2014

>   static BOOL skip_spaces(parser_ctx_t *ctx)
>   {
> -    while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
> +    while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) {
>           if(is_endline(*ctx->ptr++))
>               ctx->nl = TRUE;
>       }
This looks correct according to ECMA-262 section 7.2, which treats all of
the following as whitespace:

- tab 0x9 and vertical tab 0xb
- form feed 0xc
- space 0x20
- NBSP 0xa0
- UTF-16 BOM 0xfeff
- any other Unicode "space separator"

Hopefully isspaceW() covers everything but the BOM. What worries me is
that isspaceW() is also called directly in numerous other places in the
code. So we probably need more tests covering the other places where
space separators can appear, and later our own is_space() helper that
conforms to the standard.
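
To make the idea of a standard-conforming helper concrete, here is a rough sketch of what it could look like. The name js_is_whitespace() and the use of uint16_t instead of WCHAR are my assumptions, not part of the patch, and the list of Unicode "space separator" (Zs) code points reflects current Unicode data rather than anything isspaceW() is known to do. Line terminators (section 7.3) are left out on purpose, since skip_spaces() tracks them separately via is_endline().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns true if c is a
 * WhiteSpace character per ECMA-262 section 7.2. Line terminators
 * (LF, CR, U+2028, U+2029) are intentionally not matched here. */
static bool js_is_whitespace(uint16_t c)
{
    switch(c) {
    case 0x0009: /* tab */
    case 0x000B: /* vertical tab */
    case 0x000C: /* form feed */
    case 0x0020: /* space */
    case 0x00A0: /* no-break space */
    case 0xFEFF: /* byte order mark */
        return true;
    /* Remaining Unicode "space separator" (Zs) code points */
    case 0x1680:
    case 0x202F:
    case 0x205F:
    case 0x3000:
        return true;
    default:
        /* U+2000..U+200A: en quad through hair space */
        return c >= 0x2000 && c <= 0x200A;
    }
}
```

A caller such as skip_spaces() would then test js_is_whitespace(*ctx->ptr) || is_endline(*ctx->ptr) instead of relying on isspaceW() plus a special case for the BOM. Whether that is the right split is something the extra tests should decide.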