CVS Statistics for academic research

Wed Apr 2 03:23:16 CST 2003

Hi all,
  this message was originally sent to Alexandre Julliard; I am resending
it to the list at his suggestion.

I am a Wine user and a regular mailing list lurker. Besides that, however,
I earn my living as an academic econometrician (halfway between an
economist and a statistician) and I was considering the idea of doing some
econometric research on open source development.

At least initially, I would focus on Wine, for a number of reasons, not
least the fact that the Wine project is largely unknown in the economics
community. My problem is that the data I need for my analysis can, in
theory, be extracted from the mailing list archives, but this
definitely exceeds my VERY limited Perl skills.

The data I'd need would, ideally, be in the form of a CSV file, a typical
record of which would be

DATE,PN,PLA,PLD,PCN,CN,CLA,CLD

where

PN = number of patches received on day DATE
PLA = number of code lines added in patches
PLD = number of code lines deleted in patches
PCN = number of patch contributors (i.e. number of coders submitting
patches on that particular day)
CN = CVS commits
CLA = number of lines added in CVS
CLD = number of lines deleted in CVS
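
To make the record layout above concrete, here is a minimal sketch of how
such rows might be assembled once the patches and commits have been pulled
out of the archives. It is in Python rather than Perl, and it rests on
several assumptions that are not part of the request itself: that each
patch is available as a (date, author, unified diff text) tuple, that each
CVS commit is available as a (date, unified diff text) tuple, and that
dates are ISO strings such as 2003-04-01. The function names
(count_diff_lines, daily_records, to_csv) are hypothetical. Extracting the
patches from the mailing list and the diffs from CVS is not covered.

```python
import csv
import io
from collections import defaultdict

# Column order taken from the record layout: DATE,PN,PLA,PLD,PCN,CN,CLA,CLD
FIELDS = ["DATE", "PN", "PLA", "PLD", "PCN", "CN", "CLA", "CLD"]


def count_diff_lines(diff_text):
    """Return (added, deleted) line counts for one unified diff.

    Simplification: any line starting with '+++' or '---' is treated as a
    file header and skipped, so a changed line whose own content begins
    with '++' or '--' would be missed.
    """
    added = deleted = 0
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            deleted += 1
    return added, deleted


def daily_records(patches, commits):
    """Aggregate patches and commits into one record per day.

    patches: iterable of (date, author, diff_text)   (assumed input shape)
    commits: iterable of (date, diff_text)           (assumed input shape)
    """
    rows = defaultdict(lambda: dict.fromkeys(FIELDS[1:], 0))
    authors = defaultdict(set)

    for date, author, diff_text in patches:
        added, deleted = count_diff_lines(diff_text)
        rows[date]["PN"] += 1
        rows[date]["PLA"] += added
        rows[date]["PLD"] += deleted
        authors[date].add(author)

    for date, diff_text in commits:
        added, deleted = count_diff_lines(diff_text)
        rows[date]["CN"] += 1
        rows[date]["CLA"] += added
        rows[date]["CLD"] += deleted

    # PCN counts distinct patch submitters per day, not patches.
    for date in rows:
        rows[date]["PCN"] = len(authors[date])

    return [{"DATE": date, **rows[date]} for date in sorted(rows)]


def to_csv(records):
    """Render the records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Days with commits but no patches (or the reverse) simply get zeros in the
other columns; days with neither are absent from the output and would need
to be filled in separately if a regular daily series is wanted.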

Of course, the longer the time span the data cover, the larger my sample,
the happier I am.

I understand this isn't trivial. Is there anyone out there willing to help
me? I know some Perl, so maybe all I need is to get started somehow.

Thanks in advance, and keep up the good work!

----------------------------------------------------------

Riccardo `Jack' Lucchetti
Dipartimento di Economia
Università di Ancona

jack at dea.unian.it
http://www.econ.unian.it/lucchetti