Documentation of Parallel and Serial port configuration?

Fri Oct 7 15:33:02 CDT 2005

> > > we can probably do better than inb() / outb().
> > You can't do any better than that. [It] is the only one that makes sense
> > (when you run things on ia32).
> ... and when you're not on an ia32 platform with a superIO chip?

That's a moot point. Then you have to emulate ia32 to run Windows programs 
anyway.

> > > Advantages of using ppdev over simple inb() / outb() are:
> > >
> > >   should support [*] cross-architecture (arm, alpha, powerpc, ...)
> >
> > That'd be good for winelib only or wine-with-emulator (bochs? qemu?).
>
> Yup, both.  A ported application (via winelib or qemu) should work under
> any Linux architecture.  Unfortunately, it would be a Linux-specific
> solution; *-BSDs have their own interface.
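
For reference, the ppdev route being discussed looks roughly like this from 
user space. This is only a sketch: the function name ppdev_write_data is made 
up for illustration, and the device path (typically something like 
/dev/parport0) is an assumption about the local setup. The ioctls PPCLAIM, 
PPWDATA and PPRELEASE come from <linux/ppdev.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ppdev.h>

/* Write one byte to the data lines of a parallel port through ppdev.
 * Hypothetical helper; returns 0 on success, -1 on any failure. */
int ppdev_write_data(const char *device, unsigned char value)
{
    int fd;
    int rc = -1;

    assert(device != NULL);

    fd = open(device, O_RDWR);
    if (fd < 0)
        return -1;                      /* no such device, or no permission */

    if (ioctl(fd, PPCLAIM) == 0) {      /* claim the port for this process */
        if (ioctl(fd, PPWDATA, &value) == 0)
            rc = 0;                     /* byte is now on the data lines */
        ioctl(fd, PPRELEASE);           /* let other users have the port */
    }

    close(fd);
    return rc;
}
```

Note that every byte written this way costs at least one ioctl, which is the 
overhead argued about further down.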

If you have access to the source it's simpler to write a native device driver. 
The only raison d'etre for running legacy Windows apps that bit-bang things 
via direct port I/O is that there's no documentation for the hardware.

> > >   should support [*] some esoteric devices (USB-parallel converters,
> > > ...)
> >
> > At a huge performance penalty ;)
>
> But it would work, 's my point.  The performance of parallel-over-USB is a
> separate issue.
>
> Legacy devices (such as parallel ports) are gradually being phased out.  So
> writing code that requires a SuperIO chip is not the best idea.

Sure. But then you're talking about the driver code, not user code.

I think this discussion needs some clarification, as there are distinct use 
cases:

Case 1. Running legacy ia32 Windows code that does port I/O on ia32.
Case 2. Running such legacy code on non-ia32.
Case 3. Porting such apps over to winelib when the source code is available.

For (1) ioperm() is the only reasonable solution. The hardware is there for you 
to do the job, so why bother emulating existing functionality?
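
A minimal sketch of what case (1) amounts to on Linux/ia32, assuming glibc's 
<sys/io.h>. The helper name lpt_write_data is made up, and 0x378 in the usage 
note is only the conventional LPT1 base address, not something to rely on. 
ioperm() needs root (CAP_SYS_RAWIO), so expect -1 when run unprivileged. On 
anything that isn't Linux on x86 the sketch just reports ENOSYS.

```c
#include <assert.h>
#include <errno.h>

#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#include <sys/io.h>
#define HAVE_DIRECT_PORT_IO 1
#else
#define HAVE_DIRECT_PORT_IO 0
#endif

/* Write one byte to the data register of a parallel port at 'base'.
 * Hypothetical helper; returns 0 on success, -1 with errno set on failure. */
int lpt_write_data(unsigned short base, unsigned char value)
{
    /* Sanity check: on a PC, ports below 0x100 belong to motherboard
     * devices, so a parallel port base is never down there. */
    assert(base >= 0x100);

#if HAVE_DIRECT_PORT_IO
    /* Ask the kernel for access to the data, status and control registers. */
    if (ioperm(base, 3, 1) != 0)
        return -1;

    outb(value, base);          /* a single OUT instruction, no trap involved */

    ioperm(base, 3, 0);         /* drop the permission again */
    return 0;
#else
    (void)value;
    errno = ENOSYS;             /* no direct port I/O on this platform */
    return -1;
#endif
}
```

Usage would be something like lpt_write_data(0x378, 0x55). A real program 
would call ioperm() once at startup rather than around every access.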

For (2) you are doing instruction-by-instruction emulation, or at least block 
translation anyway, i.e. wine running with an ia32 emulator. So it doesn't 
really matter what you do, as there's no ia32 and likely no direct access to 
the 'ports' on target hardware. If the target hardware is a serial/parallel 
port, the motherboard chipset is likely to implement it in a wholly different 
way (or not at all!). The only chance for 1:1 plug-in compatibility is when 
target hardware is a PCI card. I don't know how many direct-I/O win32 apps 
there are for PCI cards that are relevant enough to be moved to more 
up-to-date non-ia32 hardware.

For (3), the simplest thing to do is to tear off the driver code and port it 
to the platform's driver infrastructure. Using winelib for any sort of 
emulation there is a waste of resources.

> > AFAIK, the overhead stems from the fact that instead of a machine
> > instruction you have to:
> > - process an exception in the kernel, which then signals SIGSEGV to the
> > process
> > - invoke the signal handler
> > - determine what's up and disassemble the instruction at CS:EIP
> > - invoke a function/syscall based on the disassembled instruction
> >
> > If this isn't dog slow, I don't know what is. I wasn't entirely clear,
> > the syscall is the least of our worries in fact :)
>
> I think you may be confusing some other activity (maybe an invalid memory
> access?). 

How's an invalid memory access different from a legacy win32 application 
trying to do direct port IO without sufficient privileges?

1. You have an IN or OUT opcode in the code. When ioperms are not set, the CPU 
raises an exception.
2. That exception results in SIGSEGV being signalled to the process. 
3. The signal handler has to determine what type of exception it is.  I 
don't recall whether the exception data has information about the type of 
exception, but in any event the opcode has to be disassembled just to get the 
port address and data operands. 
4. Then in all likelihood the operands are not immediate constants, so the 
relevant registers have to be interrogated from the saved below-signal 
process context. 
5. A port-to-filehandle look up has to be performed, in order to determine 
which open device has to handle that port I/O.
6. An ioctl has to be invoked on that handle.
7. Kernel's ioctl machinery is invoked, and the ioctl percolates to the driver 
code. Remember that ioctl requests can be handled by different layers of the 
driver stack, so it has literally to go the slowest path, through all the 
intermediate layers down to the driver. That's a bunch of switch statements 
to go through.
8. An actual port I/O is done by the driver, in case of 1:1 mapping, or a 
request is put to the bus subsystem if a legacy device (like a parallel port) 
is being handled via a usb/firewire "converter".

This code path is on the order of thousands of instructions long. If you have 
a legacy ia32 app running under wine on ia32 hardware, there's no point in 
doing all that in place of a simple port I/O.
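
To make steps 3 and 4 concrete, here is a rough sketch of the decoding a 
signal handler would have to do before it can even think about the ioctl. 
The names decode_port_io and struct port_access are made up for illustration 
and are not taken from wine. It assumes a 32-bit code segment, handles only 
the eight plain IN/OUT opcodes plus the 0x66 operand-size prefix, and ignores 
the string forms (INS/OUTS) and REP. The saved DX value is passed in as a 
parameter; in a real handler it would be pulled out of the signal context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one port I/O instruction (hypothetical type). */
struct port_access {
    bool     is_write;  /* true for OUT, false for IN */
    uint16_t port;      /* I/O port address */
    unsigned width;     /* access size in bytes: 1, 2 or 4 */
    size_t   length;    /* instruction length, to advance EIP past it */
};

/* Decode an IN/OUT instruction at code[0..len). 'dx' is the saved DX
 * register of the faulting context. Returns false if the bytes are not
 * a plain IN/OUT instruction. */
bool decode_port_io(const uint8_t *code, size_t len, uint16_t dx,
                    struct port_access *out)
{
    size_t i = 0;
    bool opsize16 = false;
    uint8_t op;

    assert(code != NULL && out != NULL);

    /* Optional operand-size prefix: 32-bit operands become 16-bit. */
    if (i < len && code[i] == 0x66) {
        opsize16 = true;
        i++;
    }
    if (i >= len)
        return false;

    /* IN/OUT opcodes are 0xE4..0xE7 (imm8 port) and 0xEC..0xEF (port in DX). */
    op = code[i++];
    if ((op & 0xF4) != 0xE4)
        return false;

    out->is_write = (op & 0x02) != 0;                   /* bit 1: OUT vs IN */
    out->width = (op & 0x01) ? (opsize16 ? 2u : 4u) : 1u;

    if (op & 0x08) {
        out->port = dx;                                 /* port comes from DX */
    } else {
        if (i >= len)
            return false;                               /* truncated imm8 */
        out->port = code[i++];                          /* port is an immediate */
    }

    out->length = i;
    return true;
}
```

And this is only the cheap part of the path: the port-to-handle lookup, the 
ioctl and the trip through the driver stack all come after it.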

> A syscall is pretty simple.  

Of course. But it's not "just about a syscall". And besides, even a syscall 
that has just a "ret" on the kernel side is still equivalent to say 40-50 
"average" instructions.

> > At the cost of slowing things down. For devices that bit bang data (like
> > programmers), this makes things unacceptably slow.
>
> I can't say I share that experience (about being unacceptably slow, that
> is). A 40-60% increase in overhead for a single instruction would be
> definitely noticeable, 

You're talking about a slowdown of three orders of magnitude, not a 40-60% increase.

> but only if this is the bottleneck in the program.  

When you do bit-banging, that's all that counts. The least a bit-banging 
application needs to do is to have two port accesses per bit (to flip the 
clock back and forth). Suppose you need to move 10 kilobytes in that fashion. 
That's 160 thousand trips through steps 1-8 described above. That'd be 
some 160 million equivalent instructions, assuming that the 1-8 path takes as 
many as 1000 "average" instructions. How do 160 million instructions 
compare to, say, 50 thousand?
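
The arithmetic behind those figures, as a small sketch. The function names 
are made up, "10 kilobytes" is read as 10,000 bytes, and the 1000 
instructions per trapped access is the same rough assumption as above, not a 
measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Number of port accesses needed to bit-bang 'bytes' bytes, given how
 * many port accesses each bit takes (2 = clock flipped back and forth). */
uint64_t bitbang_port_accesses(uint64_t bytes, unsigned accesses_per_bit)
{
    assert(accesses_per_bit > 0);
    return bytes * 8u * accesses_per_bit;
}

/* Total "equivalent instructions" spent on port I/O, given the assumed
 * cost of one port access in instructions. */
uint64_t bitbang_instruction_cost(uint64_t bytes, unsigned accesses_per_bit,
                                  unsigned instr_per_access)
{
    return bitbang_port_accesses(bytes, accesses_per_bit) * instr_per_access;
}
```

With 10,000 bytes and two accesses per bit, bitbang_port_accesses() gives 
160,000. At 1000 instructions per access that is 160,000,000 instructions for 
the trapped path, against 160,000 if each access is a single IN or OUT.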

Putting that in real-life terms, if you're bit-banging to a programmable logic 
device that's on a PCI card, when you interleave bit-banging code with parse 
logic that works on a JEDEC file, you can easily get 5 bits/us on modern 
hardware.

Now, I'm not saying that there are any useful legacy Windows apps that do just 
that and that would benefit from wine being fast in that respect.

What I'm saying is that for case (1) from the first list above, the ioperm() 
solution is the only one that makes sense. For other cases, you very likely 
have to do some hardware emulation anyway, and then it's not general enough 
to go into wine anyway. Maybe emulating the good old parallel port would 
be sensible, but that's about as far as one can sensibly go.

> The worst case would be something driving the parallel port as a
> square-wave generator: you'd get the full 40-60% drop in performance
> (assuming all the above numbers).

Last time I checked, I could do about 5 million 16-bit writes per second onto 
an I/O port on a 5V, 33MHz PCI card sitting in my otherwise idle and not very 
new PIII system. Maybe I was just very lucky, then, but I don't buy the 
argument that executing steps 1-8 above can be done 5 million times a second.

The real question is: what applications are out there that need "hardware" 
support in wine? One likely class is applications with USB minidrivers. Those 
should be easy to handle via libusb, after implementing the necessary WDM 
functionality in userspace (in wine). Presumably some code for that can be 
borrowed from ndiswrapper and any similar projects.

Cheers, Kuba