[PATCH v2 2/4] vbscript: Move the script context heap to the script dispatch object.

Thu Oct 24 15:22:47 CDT 2019

On 10/24/19 3:32 PM, Gabriel Ivăncescu wrote:
> Hi Jacek,
>
> On 10/23/19 9:16 PM, Jacek Caban wrote:
>> Hi Gabriel,
>>
>> On 10/21/19 1:40 PM, Gabriel Ivăncescu wrote:
>>> The heap stores dynamic global variables added during the execution
>>> phase. We'll need to reference them in the TypeInfo, and so we have to
>>> reference count the heap itself, and the simplest way is to use the 
>>> script
>>> dispatch object to store it.
>>
>>
>> It seems to me that you concentrated a bit too much at avoiding 
>> copying the data rather than on root of the problem. It's possible to 
>> avoid copying in different ways. The thing is that having to 
>> represent the collection of global members in different ways for 
>> similar tasks probably means that representation of the collection is 
>> not right. Actually, we know that current global scope representation 
>> is not great. A simple list was nice to have things done and working, 
>> but it's surely not optimal. I don't expect you to rewrite it all, 
>> but it will be eventually rewritten and it would be good to know that 
>> we're changing things in the right direction instead of working on 
>> something that will need to be rewritten for future improvements.
>>
>
> Now that I've looked deeper into the vbscript implementation, I 
> actually think the current one is wrong and don't quite understand how 
> it even works, honestly.
>
> It looks to me as the compilation phase that creates the vbscode_t 
> that is then appened to the ctx->code_list adds functions and 
> variables, and they are immediately linked to the context's 
> global_funcs/vars respectively. This happens even *if* the code in 
> question has "pending_exec" so it's not even executed immediately. 
> Shouldn't those funcs/vars only be added to the global context list 
> when they are being executed?

Maybe, I'm not sure. It's not obviously wrong. Testing that would be nice.

>
> Furthermore, the funcs/vars during compilation are added to the 
> vbscode_t's heap and stored there, while vars added during execution 
> are added to the context's heap.
>
> release_script will go through all global_vars to release their 
> variant data, but in destroy_script, the code_list is released (and 
> all vbscode_t heaps) *before* the script is released. Doesn't this 
> mean some variables in the linked list will have already been freed 
> (if they were part of some vbscode_t) and thus we're accessing freed 
> memory when releasing the vars?
>
> Overall I'm just slightly confused as to how it even works right now, 
> so I've no idea how to improve it.

Well, it definitely could use a clean up. Note that before 
destroy_script() we always set the state to closed in the destructor, so 
in practice the release_script() is always called before 
destroy_script() and there is no problem. We should eventually get rid 
of destroy_script(). If we want script_ctx_t ref-counted, it could be a 
part of release_script() and we could have separated detach_script() 
that would be used for closing script engine.

>
> Here's my thought process on how to proceed, but I could be completely 
> wrong.
>
> First, store the global_vars/funcs in the script dispatch object, 
> instead of the context (just like jscript does). Furthermore, when a 
> code in the code_list is being executed, move it to the script 
> dispatch object (for ref counting) and remove it from the context's 
> list. (we'll essentially just have another list in the script dispatch 
> object for code that has been executed).
>
> This will allow to release the script by decreasing its state, 
> releasing the code that was executed already, while *keeping* the code 
> that is still pending, which I believe was the intent of the whole 
> thing. Again, I could be misunderstanding this.

I'm not sure. We don't really need the whole vbscode_t in script 
dispatch. For the variables, we just need just a name. For function, 
it's a name, public flag and argument count. The rest could be left in 
vbscode_t and be freed with it on script engine close. We could allocate 
that subset of data, for example, in a dedicated heap pool like we 
currently do for dynamic variables. We could do that in the compiler 
already.

>
> Secondly, link the compiled funcs/vars to the script dispatch object 
> during the execution phase of that vbscode_t, instead of during 
> compilation as we have now.

If tests confirm that, sure. If you don't want to deal with it, just 
leave it as is.

>
> Does this seem a good way forward?
>
> With this, the TypeInfo will just grab a reference to the script 
> dispatch (which it has to anyway due making use of the ident_map)

I'd rather get rid of ident_map, see bellow.

> and possibly create arrays for vars/funcs like a snapshot, so they 
> have the indices for the TypeInfo at the moment it is retrieved.

Once we have arrays that are don't change existing data, but only grow, 
then and all we need to capture the state is current element count.

>>
>> I don't know exactly how it should look like. It's even possible that 
>> we will have to do some copies to capture the state after all. If I 
>> was looking at it, I'd start with more experimentations. Your tests 
>> seem like a good start. It would be interesting to see how things 
>> change when you add more global members later. You tested that adding 
>> new global variables doesn't change the existing ITypeInfo. How does 
>> the new one look like? Does it append new variables after previous 
>> ones? What if we added a new "dynamic" (non-explicit) variable? What 
>> if we mix it all? How about functions? What do DISPIDs look like? Is 
>> it reasonable to try to replicate native DISPIDs? Once we know more 
>> answers, we may decide how the collection should look like.
>>
>
> After more fiddling here's some bits of info:
>
> The TypeInfo is frozen at the moment it is retrieved. Whatever happens 
> to the script dispatch after that should not affect it. To simplify 
> the explanation I will talk about "variables" but the same applies to 
> functions as well.
>
> If some variables are added and a new TypeInfo is retrieved, the new 
> TypeInfo will contain those new variables, and their indices will be 
> after the previous ones (i.e. they're appended).
>
> I don't think it's worth replicating DISPIDs, because those are 
> opaque. AFAIK no application should hard-code DISPIDs since they may 
> also change between Windows versions (from experiments it does seem 
> Windows tends to simply order the DISPIDs as they are found in the 
> script, though, but separates vars/funcs).
>
> However, the *indices* of the variables should probably be the same as 
> on Windows. This is because you can obtain variables by index using 
> GetVarDesc (again, same for functions). But, when it adds code, 
> explicit variables are added first, then non-explicit variables. If 
> more code is added, it will be appended to this, so they *can* be 
> mixed. (e.g. first code adds 2 explicit vars, then 1 implicit var, 
> second code adds 1 explicit var and 2 implicit, it will be: 
> e->e->i->e->i->i).

Cool, thanks for testing. It sounds like replacing the existing lists 
with arrays like that is what we need.

To get rid of ident_map, I'd suggest to change how DISPIDs work. DISPIDs 
could be just indexes in variable or function arrays, with a flag on one 
of high bits indicating which one is it.

Thanks,

Jacek