[PATCH 1/5] hhctrl.ocx: Add HTML to Unicode parsing capability.

Sun Jun 10 12:19:34 CDT 2012

On 6/8/12 11:18 PM, Erich E. Hoover wrote:
> On Fri, Jun 8, 2012 at 8:17 AM, Jacek Caban<jacek at codeweavers.com>  wrote:
>> ...
>> I don't know any helper API for that. Writing decoder for HTML-encoded
>> characters sounds like a good solution.
> How does something like the attached sound?
>

A few comments:

You definitely don't need a new header file for just one function
declaration. Even the implementation probably doesn't need a separate
file (it's fewer than 200 lines of code, which are unlikely to grow).

+#include "hhctrl.h"
+#include <mshtml.h>

Probably left from the previous patch?

+        spc = strchr(amp, ' ');
+        if(spc && spc < sem)
+            break; /* cannot have a space between the ampersand and the semicolon */

This should not be needed (see below).

+        /* Convert the characters prior to the HTML encoded character */
+        wlen = MultiByteToWideChar(CP_ACP, 0, h, len, NULL, 0);
+        MultiByteToWideChar(CP_ACP, 0, h, len, w, wlen);

One call should be enough. You may just pass the remaining space in the
output buffer as its length.
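A rough sketch of the single-call approach, converting straight into the output buffer with the remaining space as the length. This is not code from the patch: append_converted() is a hypothetical helper, and the non-Windows MultiByteToWideChar() is a stand-in that only handles 7-bit ASCII so the sketch builds outside Windows/Wine. Note the guard for zero remaining space: passing 0 as the output length turns the call into a size query instead of a conversion.

```c
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Hypothetical stand-in so the sketch builds off Windows: it mimics the
 * MultiByteToWideChar() contract for 7-bit ASCII input only.
 * A source length of -1 (NUL-terminated input) is not handled here. */
#define CP_ACP 0
static int MultiByteToWideChar(unsigned int cp, unsigned long flags,
                               const char *src, int srclen,
                               wchar_t *dst, int dstlen)
{
    int i;

    (void)cp;
    (void)flags;
    if (srclen <= 0) return 0;      /* the real API fails for a length of 0 */
    if (dstlen == 0) return srclen; /* size query: nothing is written */
    if (dstlen < srclen) return 0;  /* output buffer too small */
    for (i = 0; i < srclen; i++)
        dst[i] = (unsigned char)src[i];
    return srclen;
}
#endif

/* Convert len bytes of src directly into out[*pos .. size), using a single
 * MultiByteToWideChar() call with the remaining space as the output length.
 * Returns 1 on success (advancing *pos), 0 on failure (*pos unchanged). */
static int append_converted(const char *src, int len,
                            wchar_t *out, int size, int *pos)
{
    int remaining = size - *pos;
    int n;

    if (len == 0) return 1;       /* nothing to convert; the API rejects 0 */
    if (remaining <= 0) return 0; /* 0 would make the call a size query */
    n = MultiByteToWideChar(CP_ACP, 0, src, len, out + *pos, remaining);
    if (n == 0) return 0;         /* buffer too small or conversion error */
    *pos += n;
    return 1;
}
```

Called once for the text preceding each '&' and once for each decoded symbol, a helper like this removes the separate NULL/0 size query before every conversion.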

+        if(amp[0] != '#')
+        {
+            for(i=0;i<sizeof(html_encoded_symbols)/sizeof(html_encoded_symbols[0]);i++)
+            {
+                const char *encoded_symbol = html_encoded_symbols[i].html_code;
+
+                if(strncmp(encoded_symbol, amp, len) == 0)
+                {
+                    symbol = html_encoded_symbols[i].ascii_symbol;
+                    break;
+                }
+            }
+        }

Binary search sounds like a good choice here (although just a FIXME
comment would be fine for this patch).
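A possible shape for the binary search, using bsearch() from <stdlib.h>. The table layout, entity_key and lookup_entity() are assumptions for illustration, not the patch's html_encoded_symbols; the one hard requirement is that the table stays sorted by name. The comparison also checks that the entry name ends where the key does, so "am" does not match "amp".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entity table. bsearch() requires it to be sorted by name
 * in strcmp() order. */
struct html_entity {
    const char *name; /* entity name without the '&' and ';' */
    char symbol;
};

static const struct html_entity entities[] = {
    {"amp",  '&'},
    {"apos", '\''},
    {"gt",   '>'},
    {"lt",   '<'},
    {"quot", '"'},
};

/* The key points into the input text, so it is not NUL-terminated. */
struct entity_key {
    const char *name;
    size_t len;
};

static int compare_entity(const void *k, const void *e)
{
    const struct entity_key *key = k;
    const struct html_entity *entity = e;
    int cmp = strncmp(key->name, entity->name, key->len);

    if (cmp) return cmp;
    /* Equal over key->len bytes: it only matches if the name ends there too;
     * otherwise the key is a prefix of the name and sorts before it. */
    return entity->name[key->len] ? -1 : 0;
}

/* Look up the name between '&' and ';'. Returns 0 for unknown entities. */
static char lookup_entity(const char *name, size_t len)
{
    struct entity_key key = { name, len };
    const struct html_entity *found =
        bsearch(&key, entities, sizeof(entities) / sizeof(entities[0]),
                sizeof(entities[0]), compare_entity);

    return found ? found->symbol : 0;
}
```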

+        {
+            int tmp;
+
+            sscanf(amp, "%d", &tmp);
+            symbol = tmp;
+        }

This will decode "&#123xxx;" as 123 instead of rejecting it as an
invalid character. If you get it right, the earlier check for a space
won't be needed. strtol is probably a better tool for this.
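One way to do it with strtol, as a sketch: decode_numeric_entity() is a hypothetical name and expects a pointer just past "&#". Requiring a leading digit rejects the whitespace and sign that strtol would otherwise accept, and checking that the end pointer lands on ';' rejects "&#123xxx;", which is what makes the separate space check unnecessary. Hexadecimal references such as "&#x7B;" are not handled.

```c
#include <errno.h>
#include <stdlib.h>

/* Decode the digits of a numeric reference such as "&#123;".
 * p points just past "&#". Returns the code point, or -1 if the digits
 * are not immediately followed by ';' (e.g. "123xxx;" or "12 3;"). */
static long decode_numeric_entity(const char *p)
{
    char *end;
    long value;

    /* strtol would skip leading whitespace and accept a sign; refuse both */
    if (*p < '0' || *p > '9') return -1;

    errno = 0;
    value = strtol(p, &end, 10);
    if (errno == ERANGE || *end != ';') return -1;
    if (value > 0x10FFFF) return -1; /* beyond the Unicode code point range */
    return value;
}
```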

+            wlen = MultiByteToWideChar(CP_ACP, 0, &symbol, 1, NULL, 0);
+            MultiByteToWideChar(CP_ACP, 0, &symbol, 1, w, wlen);

Same here, two calls are not needed.

Cheers,
Jacek