Proposed interface for file operations by the file dialog code (dlls\commdlg\filedlg*.c)

Troy Rollo wine at troy.rollo.name
Mon Aug 15 21:55:35 CDT 2005


References:
	<http://www.winehq.org/hypermail/wine-patches/2005/08/0265.html>
	<http://www.winehq.org/hypermail/wine-devel/2005/08/0286.html>

Background:

	Work is currently proceeding on a branched version to create additional APIs
	for WINE that use UNIX path names rather than Windows ones. This is useful
	for Winelib apps and seeks to make them look more like they are native apps,
	thereby addressing some of the complaints that Winelib apps are somehow of
	lesser status than ports using other APIs. After some discussion, it was
	decided that this would remain a branch until at least such time as the
	implementation was proven to work in real-world situations.

	Part of this involves producing file dialog APIs that operate appropriately
	in this context. That includes taking UNIX path names on input, producing
	UNIX path names on output and in callbacks to the application, and browsing a
	hierarchy that does not include Windows-isms such as drive letters.

	Making the modifications directly in the existing code would produce a set
	of differences that would cause significant headaches for branch maintenance
	and for any future merge-back to WineHQ. The objective is to reduce the
	differences so as to improve compatibility between the branches.

	The patch referenced above took a minimal-change approach to this problem by
	implementing an interface that mostly implemented the small operations made
	by the existing code, without even putting in local stubs (hence the
	inconsistent calling conventions in the interface).

	The general principle I have used is that path names should, as far as
	possible, be opaque so that the file dialog code itself never examines their
	contents directly, but rather calls functions in the interface to extract or
	locate particular portions of the path, to modify or concatenate paths or to
	make use of the paths.

The minimalist interface:

	typedef struct
	{
		UINT	code_page;
		WCHAR	sep_char;

		HRESULT WINAPI (*get_top_folder)	(IShellFolder **);
		LPITEMIDLIST	(*get_pidl_from_name)(IShellFolder *, LPWSTR);
		BOOL		(*get_display_name)	(LPCITEMIDLIST, LPWSTR);
		IShellFolder* 	(*get_folder_from_pidl)(LPITEMIDLIST);

		BOOL	WINAPI	(*change_directory)	(LPCWSTR);
		UINT	WINAPI	(*get_directory)	(UINT, LPWSTR);

		void		(*qualify_path)		(LPWSTR, LPCWSTR);
		void		(*complete_path)	(LPWSTR);
		BOOL		(*has_invalid_char)	(LPCWSTR);
		LPWSTR	WINAPI	(*find_next_component)	(LPCWSTR);
		LPWSTR	WINAPI	(*find_file_name)	(LPCWSTR);
		LPWSTR	WINAPI	(*find_extension)	(LPCWSTR);
		BOOL	WINAPI	(*file_exists)		(LPCWSTR);
		BOOL	WINAPI	(*is_directory)		(LPCWSTR);
		LPWSTR	WINAPI	(*add_dir_sep)		(LPWSTR);
		DWORD	WINAPI	(*get_full_path)	(LPCWSTR, DWORD, LPWSTR, LPWSTR*);

	} FileDlgFileOps;

Minimalist interface vs ideal interface:

	On IRC this morning Alexandre said he would prefer a well-designed interface
	to the minimalist approach, hence this discussion.

	Since the interface is entirely internal to commdlg, I will use cdecl calling
	conventions.

Detailed discussion follows.

The code_page member was put there because the UNIX file name APIs may not use 
the same code page as other A entry points. WINE uses CP_ACP (as does Windows 
- although CP_THREAD_ACP is subject to further investigation) for most 
purposes, but CP_UNIXCP for UNIX path names when translating them to UTF16. 
Often CP_UNIXCP will be something like UTF8, or it may be ISO8859-1 in 
situations where Windows would use CP1252. It may be that a Winelib 
application should have CP_ACP set to be the UNIX code page, but it may not 
(and unless it does something special, will not).

So places where A->W or W->A conversions happen on path names need to make 
sure they use the right code page in the context. The minimalist approach was 
to make this a data member, but the ideal approach would be to have functions 
which performed the appropriate conversions. Looking through the existing 
code, the conversions are performed in some contexts where the output is 
allocated, and others where the output is a fixed size buffer. If we want to 
have just one conversion function for each direction, this would give us:

	CHAR *filename_wtoa(WCHAR const *in, CHAR *out, int bufrange);
	WCHAR *filename_atow(CHAR const *in, WCHAR *out, int bufrange);

	"in" is the input buffer.
	"out" is the output buffer (NULL if we want the method to allocate).
	"bufrange" is the number of elements pointed to by "out".

	The return value is a pointer to the file name on success, and is either the
	value of "out", or an allocated buffer where "out" is NULL. On failure the
	return value is NULL.
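
	As an illustration of the calling contract only, here is a minimal sketch of
	filename_wtoa. It is not taken from the patch. The typedefs are stand-ins so
	that it builds without the Wine headers, and the standard C wcsrtombs()
	(which uses the current C locale) is an assumption standing in for a real
	conversion with the file name code page such as CP_UNIXCP.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Stand-ins so the sketch builds without the Wine headers (assumption). */
typedef wchar_t WCHAR;
typedef char CHAR;

/* Sketch of filename_wtoa. wcsrtombs() stands in for the real conversion
 * with the file name code page (e.g. CP_UNIXCP). */
CHAR *filename_wtoa(WCHAR const *in, CHAR *out, int bufrange)
{
    mbstate_t state;
    WCHAR const *src = in;
    CHAR *allocated = NULL;
    size_t needed;

    /* First pass: measure the converted length, excluding the NUL. */
    memset(&state, 0, sizeof(state));
    needed = wcsrtombs(NULL, &src, 0, &state);
    if (needed == (size_t)-1)
        return NULL;                    /* unconvertible character */
    needed++;                           /* room for the terminating NUL */

    if (!out)
    {
        /* "out" is NULL: the function allocates the result. */
        out = allocated = malloc(needed);
        if (!out)
            return NULL;
    }
    else if (bufrange < 0 || (size_t)bufrange < needed)
        return NULL;                    /* fixed size buffer is too small */

    /* Second pass: perform the conversion into "out". */
    src = in;
    memset(&state, 0, sizeof(state));
    if (wcsrtombs(out, &src, needed, &state) == (size_t)-1)
    {
        free(allocated);                /* free(NULL) is a no-op */
        return NULL;
    }
    return out;
}
```

	A caller with a stack buffer passes the buffer and its element count; a
	caller that wants an allocated result passes NULL for "out" and frees the
	return value itself.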

The sep_char is perhaps the rudest part of the minimalist interface since it 
does not treat the path names as opaque. It is used in the following 
contexts: The handling of CDM_GETFILEPATH, where it is used to paste the file 
name and directory name together; and in FILEDLG95_InitControls, where it is 
used to determine if the input file name has a path component.

With an ideal interface the CDM_GETFILEPATH handling would be changed to use a 
general path qualification function. Determining if the input file name has a 
path component could be handled in one of two ways: with a method for 
querying this; or by searching for the start of the file name component and 
testing if that is the start of the string [i.e. find_file_name(input) != 
input].

I prefer the latter method since it means one less entry in the interface, but 
if the boolean function were preferred it might appear as:

	BOOL	has_path(WCHAR const *filename);
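
	The two options can be compared with a small sketch. It assumes a UNIX path
	flavour in which '/' is the only separator; the typedefs are stand-ins for
	the Wine types, the find_file_name body is a hypothetical implementation,
	and has_path takes a non-const WCHAR pointer here only to match
	find_file_name.

```c
#include <wchar.h>

/* Stand-ins so the sketch builds without the Wine headers (assumption). */
typedef wchar_t WCHAR;
typedef int BOOL;

/* Hypothetical UNIX-path find_file_name: points just past the last '/',
 * or at the start of the string when there is no '/'. */
WCHAR *find_file_name(WCHAR *filename)
{
    WCHAR *slash = wcsrchr(filename, L'/');
    return slash ? slash + 1 : filename;
}

/* The boolean query expressed through find_file_name: there is a path
 * component exactly when the file name does not start the string. */
BOOL has_path(WCHAR *filename)
{
    return find_file_name(filename) != filename;
}
```

	Since has_path reduces to a one-line comparison, the dialog code could do
	the comparison inline and the interface would not need the extra entry.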

get_top_folder exists because the Windows path versions of the dialog use the 
Desktop folder as their top level, but a UNIX path version should arguably 
use the UNIX root as its top level. Operations retrieving the top level 
folder (SHGetDesktopFolder) appear in many places. I would prefer to return 
the pointer though, hence:

	IShellFolder *get_top_folder(void);

The next three functions in the minimal interface (get_pidl_from_name; 
get_display_name; and get_folder_from_pidl) are functions that are already 
implemented for Windows path names and are used to handle conversions between 
item ID lists and path names. Unless somebody thinks the existing 
implementations are in need of reworking, I don't see any reason not to 
include them as is in the interface.

The next two functions (change_directory; and get_directory) are currently 
direct calls to SetCurrentDirectoryW and GetCurrentDirectoryW in the default 
implementation of the interface. In the UNIX path versions there is some 
difficulty in how these should be handled for reasons that are too complex to 
go into here, but by having the interface at least a start can be made on 
figuring out how to deal with these. GetCurrentDirectoryW is usually called 
in contexts where the buffer is allocated (albeit wastefully), but in one 
case is called on a stack buffer. I am inclined to have the stack buffer 
replaced by an allocated one, hence:

	BOOL	set_directory(WCHAR const *dir);
	WCHAR *get_directory(void);

Next comes qualify_path, which is used to generate a fully qualified and 
canonical (no '/../' sequences) path name given a directory name and a 
(possibly already fully qualified) path name. The minimalist declaration is 
based on the way this operation was implemented in FILEDLG95_OnOpen, but in 
accordance with my preference for allocating string buffers I would prefer:

	WCHAR *qualify_path(WCHAR const *path, WCHAR const *dir);

	It may be that if dir == NULL the function would use the current directory.
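
	For the UNIX path case, the operation could look roughly like the sketch
	below. It is an assumption, not the implementation from the patch: '/' is
	the only separator, "dir" must be an absolute path whenever "path" is
	relative, and the dir == NULL case mentioned above is not handled. The
	typedef is a stand-in for the Wine type.

```c
#include <stdlib.h>
#include <wchar.h>

typedef wchar_t WCHAR;  /* stand-in for the Wine type (assumption) */

/* Sketch of an allocating qualify_path for UNIX paths. Returns a newly
 * allocated canonical path with no "." or ".." components. */
WCHAR *qualify_path(WCHAR const *path, WCHAR const *dir)
{
    /* An already qualified path ignores "dir" entirely. */
    size_t dirlen = (path[0] == L'/') ? 0 : wcslen(dir);
    size_t len = 0;
    int pass;
    /* Worst case: '/' + dir + '/' + path + NUL. */
    WCHAR *out = malloc((dirlen + wcslen(path) + 3) * sizeof(WCHAR));

    if (!out)
        return NULL;

    /* Pass 0 copies the components of "dir", pass 1 those of "path". */
    for (pass = dirlen ? 0 : 1; pass < 2; pass++)
    {
        WCHAR const *src = pass ? path : dir;

        while (*src)
        {
            size_t n;

            while (*src == L'/')
                src++;                      /* skip separators */
            n = wcscspn(src, L"/");         /* length of this component */
            if (n == 0)
                break;                      /* trailing separator */

            if (n == 2 && src[0] == L'.' && src[1] == L'.')
            {
                /* ".." drops the last component already written. */
                while (len > 0 && out[--len] != L'/')
                    ;
            }
            else if (!(n == 1 && src[0] == L'.'))
            {
                /* Ordinary component ("." is simply skipped). */
                out[len++] = L'/';
                wmemcpy(out + len, src, n);
                len += n;
            }
            src += n;
        }
    }
    if (len == 0)
        out[len++] = L'/';                  /* everything cancelled: root */
    out[len] = L'\0';
    return out;
}
```

	With this sketch, qualify_path(L"../other/./a.txt", L"/home/user") yields
	"/home/other/a.txt", and the caller frees the result.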

complete_path is only used in one place - in FILEDLG95_OnOpen where the 
routine walks through its path elements, and tacks on a trailing backslash to 
paths like "c:" (which will be the first component of the path for 
"c:\windows\system.ini"). find_next_component (currently set to 
PathFindNextComponentW) is only used in that same context, so perhaps the 
better solution is to combine these two with:

	WCHAR *next_component(WCHAR const *path, WCHAR const *last);

	"last" is the most recent return value (or NULL for the first call).
	"path" is the input path.

	The return value is an allocated string containing the next path component,
	so you would get "c:\", "windows", "system.ini" as return values.

has_invalid_char is used to test if the path name contains any invalid 
characters. The current default implementation is wrong, but reflects what 
was already there. The general rule is that Win32 paths (at least in the file 
dialog) should not contain '/', ':' (except as a drive letter (*)), '<', '>', 
and '|'. IIRC, wild cards are forbidden in file names, but the filedlg code 
does not treat them as invalid because they are valid for entry into the edit 
box of the file dialog. Under UNIX, there are no invalid file name characters 
- any string is a valid file name although it may be a relative file name. I 
would keep this function but rename it:

	BOOL valid_file_name(WCHAR const *filename);

(*) can the file dialogs address stream names under NT? Is it meaningful to do 
the same under Wine since UNIX has no concept of file sub-streams?
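
A minimal sketch of the Win32 rule as stated above might look like this. It is 
an illustration only, not the current default implementation: the typedefs 
are stand-ins for the Wine types, ':' is accepted only directly after a 
leading drive letter, and wild cards are deliberately let through.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-ins so the sketch builds without the Wine headers (assumption). */
typedef wchar_t WCHAR;
typedef int BOOL;

/* Sketch of valid_file_name for Win32 paths: rejects '/', '<', '>', '|'
 * and any ':' that is not the drive letter separator. */
BOOL valid_file_name(WCHAR const *filename)
{
    size_t i;

    for (i = 0; filename[i]; i++)
    {
        WCHAR c = filename[i];

        if (c == L'/' || c == L'<' || c == L'>' || c == L'|')
            return 0;

        if (c == L':')
        {
            WCHAR d = filename[0];
            BOOL is_letter = (d >= L'a' && d <= L'z') ||
                             (d >= L'A' && d <= L'Z');

            /* Only "x:" at the very start of the path is acceptable. */
            if (i != 1 || !is_letter)
                return 0;
        }
    }
    return 1;   /* wild cards ('*', '?') are intentionally accepted */
}
```

A UNIX path implementation of the same entry would simply return TRUE for 
every input.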

find_file_name and find_extension find the final path component, and the 
extension (if any) in the file name. These are fairly pure functions for 
handling otherwise opaque path names, so they would remain as is but without 
the WINAPI calling convention:

	WCHAR *find_file_name(WCHAR *filename);
	WCHAR *find_extension(WCHAR *filename);

Next come file_exists and is_directory. These could be replaced by a single 
routine that requests the type of the file and returns -1 if the file does 
not exist:

	int get_file_type(WCHAR const *filename);

	Returns (with optional symbolic constants):

		-1: no such file
		0: ordinary file
		1: directory

This would also simplify some other code where is_directory is called 
immediately after file_exists.
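
As a sketch of what a UNIX path implementation could look like, the following 
uses POSIX stat(). The constant names are hypothetical (the text above only 
fixes the numeric values), the typedef is a stand-in for the Wine type, and 
wcstombs() in the current C locale stands in for the real file name code page 
conversion.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <wchar.h>

typedef wchar_t WCHAR;  /* stand-in for the Wine type (assumption) */

/* Hypothetical symbolic constants for the three return values. */
enum
{
    FDLG_TYPE_NONE = -1,    /* no such file */
    FDLG_TYPE_FILE = 0,     /* ordinary file */
    FDLG_TYPE_DIR  = 1      /* directory */
};

/* Sketch of get_file_type for UNIX paths, built on POSIX stat(). */
int get_file_type(WCHAR const *filename)
{
    char narrow[4096];
    struct stat st;
    size_t n = wcstombs(narrow, filename, sizeof(narrow));

    /* Conversion failure or truncation is reported as "no such file". */
    if (n == (size_t)-1 || n == sizeof(narrow))
        return FDLG_TYPE_NONE;

    if (stat(narrow, &st) != 0)
        return FDLG_TYPE_NONE;

    return S_ISDIR(st.st_mode) ? FDLG_TYPE_DIR : FDLG_TYPE_FILE;
}
```

A caller that previously called file_exists and then is_directory would make 
one call and switch on the result.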

add_dir_sep is used when pasting a path name and a wildcard string. If there 
are no objections, qualify_path could be used for such situations, thereby 
avoiding the need for a separate add_dir_sep.

get_full_path is currently a call to GetFullPathNameW. It is used in 3 places 
in FILEDLG95_InitControls. Each time it is used to do three things: convert 
any 8.3 filenames to the long path name version; extract the file name 
portion (if any) of the resulting long path; and extract the directory name 
portion, storing the two portions in separate locations. I would simplify the 
call, allocating the result:

	WCHAR *get_full_path(WCHAR const *path);

The file name component would then be discovered by a call to find_file_name.

Obvious candidates for changes:

1. find_extension could be implemented using find_file_name in a way that 
obviates the need for a separate find_extension.
2. Might find_file_name be implemented in terms of next_component? This likely 
depends on the behaviour of next_component with a path like "f:\windows\" - 
if it only returns "f:\" and "windows" then it would not be suitable, but if 
it returns "f:\", "windows" and "" then it would.
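
To illustrate candidate 1, here is a rough sketch of an extension search 
layered on find_file_name. The find_file_name body is a hypothetical UNIX 
path implementation, the typedef is a stand-in for the Wine type, and the 
choice to return a pointer to the terminating NUL when there is no extension 
is an assumption modelled on PathFindExtensionW.

```c
#include <wchar.h>

typedef wchar_t WCHAR;  /* stand-in for the Wine type (assumption) */

/* Hypothetical UNIX-path find_file_name, as the interface would supply. */
WCHAR *find_file_name(WCHAR *filename)
{
    WCHAR *slash = wcsrchr(filename, L'/');
    return slash ? slash + 1 : filename;
}

/* Extension search that needs no knowledge of the path flavour: it only
 * looks inside the final component located by find_file_name, so a '.'
 * in a directory name is never mistaken for an extension.
 * Returns a pointer to the last '.', or to the terminating NUL. */
WCHAR *find_extension(WCHAR *filename)
{
    WCHAR *name = find_file_name(filename);
    WCHAR *dot = wcsrchr(name, L'.');
    return dot ? dot : name + wcslen(name);
}
```

Because the second function only depends on find_file_name, it could live in 
the dialog code itself rather than in the interface.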

Proposed interface:

	typedef struct
	{
		CHAR *(*filename_wtoa)(WCHAR const *in, CHAR *out, int bufrange);
		WCHAR *(*filename_atow)(CHAR const *in, WCHAR *out, int bufrange);

		IShellFolder *(*get_top_folder)(void);
		LPITEMIDLIST (*get_pidl_from_name)(IShellFolder *, LPWSTR);
		BOOL (*get_display_name)(LPCITEMIDLIST, LPWSTR);
		IShellFolder *(*get_folder_from_pidl)(LPITEMIDLIST);

		BOOL (*set_directory)(WCHAR const *dir);
		WCHAR *(*get_directory)(void);
		int (*get_file_type)(WCHAR const *filename);

		WCHAR *(*qualify_path)(WCHAR const *path, WCHAR const *dir);
		WCHAR *(*get_full_path)(WCHAR const *filename);

		WCHAR *(*next_component)(WCHAR const *path, WCHAR const *last);
		WCHAR *(*find_file_name)(WCHAR *filename);
		WCHAR *(*find_extension)(WCHAR *filename);

		BOOL (*valid_file_name)(WCHAR const *filename);
	} FileDlgFileOps;
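
	To show how the dialog code would go through such a table without looking
	inside path names itself, here is a reduced sketch. It copies only two
	members of the proposed structure; the unix_* functions, the unix_ops table
	and the file_part_if_valid helper are hypothetical names, and the typedefs
	are stand-ins for the Wine types.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-ins so the sketch builds without the Wine headers (assumption). */
typedef wchar_t WCHAR;
typedef int BOOL;

/* Reduced copy of the proposed table: only two of its members. */
typedef struct
{
    WCHAR *(*find_file_name)(WCHAR *filename);
    BOOL (*valid_file_name)(WCHAR const *filename);
} FileDlgFileOps;

/* Hypothetical UNIX path implementations of the two entries. */
static WCHAR *unix_find_file_name(WCHAR *filename)
{
    WCHAR *slash = wcsrchr(filename, L'/');
    return slash ? slash + 1 : filename;
}

static BOOL unix_valid_file_name(WCHAR const *filename)
{
    (void)filename;
    return 1;   /* under UNIX any string is a valid file name */
}

static const FileDlgFileOps unix_ops =
{
    unix_find_file_name,
    unix_valid_file_name
};

/* Dialog-side helper: treats the path as opaque and asks the table for
 * everything. Returns the file name part, or NULL for an invalid name. */
WCHAR *file_part_if_valid(FileDlgFileOps const *ops, WCHAR *input)
{
    if (!ops->valid_file_name(input))
        return NULL;
    return ops->find_file_name(input);
}
```

	A Windows path table would supply different functions for the same two
	slots, and file_part_if_valid would not change.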


