[PATCH v3 5/9] winedbg: Escape special characters in GDB packet reply.
Rémi Bernon
rbernon at codeweavers.com
Wed Nov 17 02:05:27 CST 2021
On 11/17/21 08:36, Jinoh Kang wrote:
>>> +static void packet_reply_add_raw(struct gdb_context* gdbctx, const void* data, size_t len)
>>> +{
>>> + const unsigned char *ptr = data, *sptr, *end = ptr + len;
>>> + size_t seg_len;
>>> + int is_end;
>>> +
>>
>> I'm not sure what "seg_" is supposed to mean? How about just "len", and "tmp" instead of sptr?
>
> "seg" stands for segments (substrings), separated by special characters to be
> escaped. Ditto for "s" in "sptr".
>
> That said, I agree "tmp" would work just as fine. Since "len" is already taken,
> maybe something like "tmplen" would do?
>
>>
>>> + while (ptr != end)
>>> + {
>>> + for (sptr = ptr; sptr != end && !is_gdb_special_char(*sptr); sptr++)
>>> + ;
>>> +
>>> + seg_len = sptr - ptr;
>>> + is_end = sptr == end;
>>> +
>>> + packet_reply_grow(gdbctx, is_end ? seg_len : seg_len + 2);
>>> + memcpy(&gdbctx->out_buf[gdbctx->out_len], ptr, seg_len);
>>> + gdbctx->out_len += seg_len;
>>> + ptr += seg_len;
>>> +
>>> + if (is_end)
>>> + break;
>>> +
>>> + gdbctx->out_buf[gdbctx->out_len++] = 0x7D;
>>> + gdbctx->out_buf[gdbctx->out_len++] = 0x20 ^ *ptr++;
>>> + }
>>> +}
>>> +
>>
>> What about something like:
>
> Thanks a lot for taking time to improve it. A few comments, though:
>
>>
>> const unsigned char *begin = data, *end = begin + len, *ptr = begin;
>
> I'm not sure the name "begin" is clear enough, due to the asymmetry of semantics
> of "begin" and "end".
>
> - "begin" denotes the start of the subsequence we are currently working on.
> - "end" denotes the end of the entire input data.
>
> Instead of "begin", how about "segment", "part", or "curr"? Alternatives
> include "fragment" and "curr_part", but they don't sound as attractive.
>
Sure, "tmp" or "curr" should be generic enough to not worry about the
semantics anymore.
>>
>> while (ptr != end)
>> {
>> while (ptr != end && !is_gdb_special_char(*ptr)) ptr++;
>>
>> packet_reply_grow(gdbctx, ptr - begin + 2);
>
> I see that the extra "is_end" variable doesn't help with readability.
>
> If we're rewriting for clarity, how about calling "packet_reply_grow" twice
> instead of unconditionally growing the buffer for the extra two bytes of escape
> sequence? The reasons are:
>
> 1. Since it always allocates two more bytes at the end of the data,
> "packet_reply_add_raw" ends up eating extra 2B of space for each call (2*n).
>
> 2. It should be more often the case of "ptr == end" than "is_gdb_special_char()".
> Special characters are rare, and it's better to optimise for the more likely
> path.
>
Works for me too, although I don't think the extra bytes would change
anything in general as the allocated areas are probably rounded anyway.
>> memcpy(&gdbctx->out_buf[gdbctx->out_len], begin, ptr - begin);
>> gdbctx->out_len += ptr - begin;
>> if (ptr == end) break;
>>
>> gdbctx->out_buf[gdbctx->out_len++] = 0x7D;
>> gdbctx->out_buf[gdbctx->out_len++] = 0x20 ^ *ptr++;
>> begin = ptr;
>
> I suppose this assignment could be moved to the beginning of the loop body...
>
>> }
>>
>> Could do with a len variable too if you feel like it's better.
>
> ...unless we're using tmp_len instead of "ptr - begin", of course.
>
>>
>>> static inline void packet_reply_add(struct gdb_context* gdbctx, const char* str)
>>> {
>>> - int len = strlen(str);
>>> - packet_reply_grow(gdbctx, len);
>>> - memcpy(&gdbctx->out_buf[gdbctx->out_len], str, len);
>>> - gdbctx->out_len += len;
>>> + packet_reply_add_raw(gdbctx, str, strlen(str));
>>> }
>>> static void packet_reply_open(struct gdb_context* gdbctx)
>>> {
>>> assert(gdbctx->out_curr_packet == -1);
>>> - packet_reply_add(gdbctx, "$");
>>> + packet_reply_grow(gdbctx, 1);
>>> + gdbctx->out_buf[gdbctx->out_len++] = '$';
>>> gdbctx->out_curr_packet = gdbctx->out_len;
>>> }
>>> @@ -773,7 +826,8 @@ static void packet_reply_close(struct gdb_context* gdbctx)
>>> int plen;
>>> plen = gdbctx->out_len - gdbctx->out_curr_packet;
>>> - packet_reply_add(gdbctx, "#");
>>> + packet_reply_grow(gdbctx, 1);
>>> + gdbctx->out_buf[gdbctx->out_len++] = '#';
>>> cksum = checksum(&gdbctx->out_buf[gdbctx->out_curr_packet], plen);
>>> packet_reply_hex_to(gdbctx, &cksum, 1);
>>> gdbctx->out_curr_packet = -1;
>>> @@ -812,8 +866,6 @@ static enum packet_return packet_reply(struct gdb_context* gdbctx, const char* p
>>> {
>>> packet_reply_open(gdbctx);
>>> - assert(strchr(packet, '$') == NULL && strchr(packet, '#') == NULL);
>>> -
>>> packet_reply_add(gdbctx, packet);
>>> packet_reply_close(gdbctx);
>>>
>>
>> After this change packet_reply_add helper seems a little useless, I suggest maybe:>
>> * Keep packet_reply_add like it is, and use it everywhere some static strings are added, including in open/close here, where we know we don't need escaping.
> My intention was to repurpose packet_reply_add to be used *only* for emitting
> payload data in GDB packets, and not sending arbitrary data to the socket.
> Otherwise, someone else might end up using the wrong function and run into
> hard-to-debug protocol errors. By repurposing it, the code clearly draws
> the line between the (high-level) GDB protocol and the underlying raw serial
> transport.
>
> Also, the overhead of scanning short string literals should not be significant.
>
Yeah I wasn't really worried about that, more about the mostly empty
helper that's only calling the _raw version.
Maybe we can have just a single "packet_reply_add" helper which always
escape the data, and not call it for the '$' and '#' cases?
--
Rémi Bernon <rbernon at codeweavers.com>
More information about the wine-devel
mailing list