FAQ update
Francois Gouget
fgouget at free.fr
Fri Sep 5 19:28:42 CDT 2003
On 22 Aug 2003, Jeremy Newman wrote:
[...]
> - No <html><head><body> tags. Just the content. ie, everything that
> would be between the <body> tags.
I have a Perl script that does that part. Should be easy to extend to
also extract the title...
Here goes as a starting point. Maybe I'll work a bit more on it
tomorrow but if anyone feels like hacking on it, feel free!
#!/usr/bin/perl -w
use strict;
use File::Copy;
my $filename=$ARGV[0];
print " $filename\n";
#FIXME:assuming that because there is a .bak file, this is what we want is
#probably flawed. Or is it???
if (! -e "$filename.bak")
{
if (!copy("$filename","$filename.bak"))
{
print STDERR "error: unable to make a backup of $filename:\n";
print STDERR " $!\n";
return;
}
}
if (!open(FILEI,"$filename.bak"))
{
print STDERR "error: unable to open $filename.bak for reading:\n";
print STDERR " $!\n";
return;
}
if (!open(FILEO,">$filename"))
{
print STDERR "error: unable to open $filename for writing:\n";
print STDERR " $!\n";
return;
}
my $line;
while ($line=<FILEI>)
{
if ($line =~ s/<body[^>]*>//i)
{
print "matched <body>: $line";
last;
}
elsif ($line =~ s/<body[^>]*$//i)
{
print "matched <body: $line";
while ($line=<FILEI>)
{
print "looking for > $line";
if ($line =~ s/^[^>]*>//i)
{
last;
}
}
last;
}
}
print FILEO $line;
while ($line=<FILEI>)
{
if ($line =~ s/<\/body//i)
{
print FILEO $line;
last;
}
print FILEO $line;
}
close FILEI;
close FILEO;
exit 0;
--
Francois Gouget fgouget at free.fr http://fgouget.free.fr/
"Lotto: A tax on people who are bad at math." -- unknown
"Windows: Microsoft's tax on computer illiterates." -- WE7U
More information about the wine-devel
mailing list