Hm, it doesn't seem to be so simple.
Each page directory contains an edit-log file that records all the changes to that page.

Running grep -i spam over the edit-log files yields fewer than 400 hits.

Maybe we should look for deleted pages?
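
Something along these lines might do it. This is only a rough sketch, and it rests on assumptions I have not verified against our install: that each page directory holds a 'current' file naming the latest revision, that a page counts as deleted when that revision file is missing from 'revisions/', and that the last line of edit-log is the delete entry carrying the comment. The function names are made up for illustration. It only lists candidates and removes nothing.

```python
import os

def is_deleted(page_dir):
    # Assumption: 'current' names the latest revision, and the page is
    # deleted when no file of that name exists under 'revisions/'.
    try:
        with open(os.path.join(page_dir, "current")) as f:
            rev = f.read().strip()
    except OSError:
        return True  # no 'current' file at all: treat as not live
    return not os.path.isfile(os.path.join(page_dir, "revisions", rev))

def last_log_line(page_dir):
    # Return the last non-empty line of edit-log, or "" if there is none.
    try:
        with open(os.path.join(page_dir, "edit-log"),
                  encoding="utf-8", errors="replace") as f:
            lines = [line for line in f if line.strip()]
    except OSError:
        return ""
    return lines[-1] if lines else ""

def spam_deleted_pages(pages_dir):
    # List page directories that look deleted and whose final edit-log
    # entry mentions 'spam' (case-insensitive).
    hits = []
    for name in sorted(os.listdir(pages_dir)):
        page_dir = os.path.join(pages_dir, name)
        if not os.path.isdir(page_dir):
            continue
        if is_deleted(page_dir) and "spam" in last_log_line(page_dir).lower():
            hits.append(name)
    return hits
```

Running spam_deleted_pages() on the pages directory would give a list we can review before deleting anything. Deleted pages whose comment does not mention spam are left out, so they would still need a separate look.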


On 01/14/2013 01:35 PM, Francois Gouget wrote:
> On Mon, 14 Jan 2013, Dimi Paun wrote:
>> MoinMoin creates a dir for every page. I simply got the list
>> by listing these directories. (This is the problem -- there is a
>> limit of 2^15 subdirectories, and this is what we were hitting
>> a few days ago).
>> Does that answer the question?
> It feels like your methodology is flawed.
> Let's take the "Jaw crusher from joyal jc001j" page as an example.
> As far as I can tell that page has already been deleted. MoinMoin knows
> that. So you should not need us humans to waste time going through
> 20000+ rows of the spreadsheet to tell you that the directory
> corresponds to a deleted page.
> So is your problem that you want to preserve non-spam deleted pages?
> Can't a script go through these directories, notice that the page has
> been deleted, that the delete comment contains the word 'spam' and then
> delete the directory?

