[PATCH 1/5] hhctrl.ocx: Add HTML to Unicode parsing capability.
Jacek Caban
jacek at codeweavers.com
Sun Jun 10 12:19:34 CDT 2012
On 6/8/12 11:18 PM, Erich E. Hoover wrote:
> On Fri, Jun 8, 2012 at 8:17 AM, Jacek Caban<jacek at codeweavers.com> wrote:
>> ...
>> I don't know any helper API for that. Writing decoder for HTML-encoded
>> characters sounds like a good solution.
> How does something like the attached sound?
>
A few comments:
You definitely don't need a new header file for just one function
declaration. Even the implementation probably doesn't need a separate
file (it's <200 lines of code that is unlikely to grow).
+#include "hhctrl.h"
+#include <mshtml.h>
Probably left from the previous patch?
+ spc = strchr(amp, ' ');
+ if(spc && spc < sem)
+ break; /* cannot have a space between the ampersand and the semicolon */
This should not be needed (see the sscanf comment below).
+ /* Convert the characters prior to the HTML encoded character */
+ wlen = MultiByteToWideChar(CP_ACP, 0, h, len, NULL, 0);
+ MultiByteToWideChar(CP_ACP, 0, h, len, w, wlen);
One call should be enough. You can just pass the remaining space in the
output buffer as the output length.
+ if(amp[0] != '#')
+ {
+ for(i=0;i<sizeof(html_encoded_symbols)/sizeof(html_encoded_symbols[0]);i++)
+ {
+ const char *encoded_symbol = html_encoded_symbols[i].html_code;
+
+ if(strncmp(encoded_symbol, amp, len) == 0)
+ {
+ symbol = html_encoded_symbols[i].ascii_symbol;
+ break;
+ }
+ }
+ }
Binary search sounds like a good choice here (although just a FIXME
comment would be fine for the patch).
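A rough sketch of what the binary search could look like, using the
standard bsearch(). It assumes the table is kept sorted by html_code in
strcmp() order. The struct layout, the handful of entries, and the names
entity_key, compare_entity and lookup_entity are made up for
illustration; only the field names come from the quoted patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table layout; field names follow the quoted patch. */
struct html_encoded_symbol
{
    const char *html_code;
    char ascii_symbol;
};

/* Must stay sorted by html_code in strcmp() order for bsearch() to work.
 * Only a few illustrative entries, not the full list. */
static const struct html_encoded_symbol html_encoded_symbols[] =
{
    {"amp",  '&'},
    {"gt",   '>'},
    {"lt",   '<'},
    {"nbsp", ' '},
    {"quot", '"'},
};

/* The name in the input is not NUL-terminated, so carry its length along. */
struct entity_key
{
    const char *name;
    size_t len;
};

static int compare_entity(const void *key, const void *elem)
{
    const struct entity_key *k = key;
    const struct html_encoded_symbol *e = elem;
    int ret = strncmp(k->name, e->html_code, k->len);

    if (ret) return ret;
    /* The first len chars match: equal only if the table entry ends here
     * too, otherwise the key is a proper prefix and sorts before it. */
    return e->html_code[k->len] ? -1 : 0;
}

/* Returns the decoded character, or 0 if the name is unknown. */
static char lookup_entity(const char *name, size_t len)
{
    struct entity_key key = { name, len };
    const struct html_encoded_symbol *found;

    assert(name != NULL);
    found = bsearch(&key, html_encoded_symbols,
                    sizeof(html_encoded_symbols)/sizeof(html_encoded_symbols[0]),
                    sizeof(html_encoded_symbols[0]), compare_entity);
    return found ? found->ascii_symbol : 0;
}
```

Here name would point just past the ampersand and len would be the
distance to the semicolon. Comparing the full length means a partial
name such as "am" is not treated as a match for "amp".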
+ {
+ int tmp;
+
+ sscanf(amp, "%d", &tmp);
+ symbol = tmp;
+ }
This will decode "&#123xxx;" as 123 instead of treating it as an invalid
char. If you get it right, the earlier check for space won't be needed.
strtol is probably a better tool for this.
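Something along these lines is what I have in mind for the strtol
variant. It is only a sketch: parse_numeric_entity is a hypothetical
helper name, the -1 error return is an arbitrary choice, and the
hexadecimal form is left out. The point is that the end pointer lets you
check that the digits are followed directly by the semicolon.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: s points just past "&#". Returns the decoded
 * value, or -1 if the reference is malformed. */
static long parse_numeric_entity(const char *s)
{
    char *end;
    long value;

    assert(s != NULL);

    /* strtol() skips leading whitespace and accepts a sign; reject both
     * by requiring the first character to be a digit. */
    if (*s < '0' || *s > '9')
        return -1;

    errno = 0;
    value = strtol(s, &end, 10);

    /* Anything other than ';' right after the digits (junk, a space)
     * makes the reference invalid, as does an out-of-range value. */
    if (errno == ERANGE || *end != ';')
        return -1;

    return value;
}
```

With this, "123;" gives 123, while input with trailing junk or an
embedded space after the digits is rejected, so the separate strchr
check for a space is no longer needed.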
+ wlen = MultiByteToWideChar(CP_ACP, 0, &symbol, 1, NULL, 0);
+ MultiByteToWideChar(CP_ACP, 0, &symbol, 1, w, wlen);
Same here, two calls are not needed.
Cheers,
Jacek