[PATCH v3 5/9] winedbg: Escape special characters in GDB packet reply.

Wed Nov 17 02:05:27 CST 2021

On 11/17/21 08:36, Jinoh Kang wrote:
>>> +static void packet_reply_add_raw(struct gdb_context* gdbctx, const void* data, size_t len)
>>> +{
>>> +    const unsigned char *ptr = data, *sptr, *end = ptr + len;
>>> +    size_t seg_len;
>>> +    int is_end;
>>> +
>>
>> I'm not sure what "seg_" is supposed to mean? How about just "len", and "tmp" instead of sptr?
> 
> "seg" stands for segments (substrings), separated by special characters to be
> escaped.  Ditto for "s" in "sptr".
> 
> That said, I agree "tmp" would work just as fine.  Since "len" is already taken,
> maybe something like "tmplen" would do?
> 
>>
>>> +    while (ptr != end)
>>> +    {
>>> +        for (sptr = ptr; sptr != end && !is_gdb_special_char(*sptr); sptr++)
>>> +            ;
>>> +
>>> +        seg_len = sptr - ptr;
>>> +        is_end = sptr == end;
>>> +
>>> +        packet_reply_grow(gdbctx, is_end ? seg_len : seg_len + 2);
>>> +        memcpy(&gdbctx->out_buf[gdbctx->out_len], ptr, seg_len);
>>> +        gdbctx->out_len += seg_len;
>>> +        ptr += seg_len;
>>> +
>>> +        if (is_end)
>>> +            break;
>>> +
>>> +        gdbctx->out_buf[gdbctx->out_len++] = 0x7D;
>>> +        gdbctx->out_buf[gdbctx->out_len++] = 0x20 ^ *ptr++;
>>> +    }
>>> +}
>>> +
>>
>> What about something like:
> 
> Thanks a lot for taking time to improve it.  A few comments, though:
> 
>>
>>      const unsigned char *begin = data, *end = begin + len, *ptr = begin;
> 
> I'm not sure the name "begin" is clear enough, due to the asymmetry of semantics
> of "begin" and "end".
> 
> - "begin" denotes the start of the subsequence we are currently working on.
> - "end" denotes the end of the entire input data.
> 
> Instead of "begin", how about "segment", "part", or "curr"?  Alternatives
> include "fragment" and "curr_part", but they don't sound as attractive.
> 

Sure, "tmp" or "curr" should be generic enough to not worry about the 
semantics anymore.

>>
>>      while (ptr != end)
>>      {
>>          while (ptr != end && !is_gdb_special_char(*ptr)) ptr++;
>>
>>          packet_reply_grow(gdbctx, ptr - begin + 2);
> 
> I see that the extra "is_end" variable doesn't help with readability.
> 
> If we're rewriting for clarity, how about calling "packet_reply_grow" twice
> instead of unconditionally growing the buffer for the extra two bytes of escape
> sequence?  The reasons are:
> 
> 1. Since it always allocates two more bytes at the end of the data,
>     "packet_reply_add_raw" ends up eating extra 2B of space for each call (2*n).
> 
> 2. It should be more often the case of "ptr == end" than "is_gdb_special_char()".
>     Special characters are rare, and it's better to optimise for the more likely
>     path.
> 

Works for me too, although I don't think the extra bytes would change 
anything in general as the allocated areas are probably rounded anyway.

>>          memcpy(&gdbctx->out_buf[gdbctx->out_len], begin, ptr - begin);
>>          gdbctx->out_len += ptr - begin;
>>          if (ptr == end) break;
>>
>>          gdbctx->out_buf[gdbctx->out_len++] = 0x7D;
>>          gdbctx->out_buf[gdbctx->out_len++] = 0x20 ^ *ptr++;
>>          begin = ptr;
> 
> I suppose this assignment could be moved to the beginning of the loop body...
> 
>>      }
>>
>> Could do with a len variable too if you feel like it's better.
> 
> ...unless we're using tmp_len instead of "ptr - begin", of course.
> 
>>
>>>    static inline void packet_reply_add(struct gdb_context* gdbctx, const char* str)
>>>    {
>>> -    int len = strlen(str);
>>> -    packet_reply_grow(gdbctx, len);
>>> -    memcpy(&gdbctx->out_buf[gdbctx->out_len], str, len);
>>> -    gdbctx->out_len += len;
>>> +    packet_reply_add_raw(gdbctx, str, strlen(str));
>>>    }
>>>      static void packet_reply_open(struct gdb_context* gdbctx)
>>>    {
>>>        assert(gdbctx->out_curr_packet == -1);
>>> -    packet_reply_add(gdbctx, "$");
>>> +    packet_reply_grow(gdbctx, 1);
>>> +    gdbctx->out_buf[gdbctx->out_len++] = '$';
>>>        gdbctx->out_curr_packet = gdbctx->out_len;
>>>    }
>>>    @@ -773,7 +826,8 @@ static void packet_reply_close(struct gdb_context* gdbctx)
>>>        int plen;
>>>          plen = gdbctx->out_len - gdbctx->out_curr_packet;
>>> -    packet_reply_add(gdbctx, "#");
>>> +    packet_reply_grow(gdbctx, 1);
>>> +    gdbctx->out_buf[gdbctx->out_len++] = '#';
>>>        cksum = checksum(&gdbctx->out_buf[gdbctx->out_curr_packet], plen);
>>>        packet_reply_hex_to(gdbctx, &cksum, 1);
>>>        gdbctx->out_curr_packet = -1;
>>> @@ -812,8 +866,6 @@ static enum packet_return packet_reply(struct gdb_context* gdbctx, const char* p
>>>    {
>>>        packet_reply_open(gdbctx);
>>>    -    assert(strchr(packet, '$') == NULL && strchr(packet, '#') == NULL);
>>> -
>>>        packet_reply_add(gdbctx, packet);
>>>          packet_reply_close(gdbctx);
>>>
>>
>> After this change packet_reply_add helper seems a little useless, I suggest maybe:>
>> * Keep packet_reply_add like it is, and use it everywhere some static strings are added, including in open/close here, where we know we don't need escaping.
> My intention was to repurpose packet_reply_add to be used *only* for emitting
> payload data in GDB packets, and not sending arbitrary data to the socket.
> Otherwise, someone else might end up using the wrong function and run into
> hard-to-debug protocol errors.  By repurposing it, the code clearly draws
> the line between the (high-level) GDB protocol and the underlying raw serial
> transport.
> 
> Also, the overhead of scanning short string literals should not be significant.
> 

Yeah I wasn't really worried about that, more about the mostly empty 
helper that's only calling the _raw version.

Maybe we can have just a single "packet_reply_add" helper which always 
escape the data, and not call it for the '$' and '#' cases?
-- 
Rémi Bernon <rbernon at codeweavers.com>