[PATCH 0/6] Extended file stat system call

Andreas Dilger adilger at dilger.ca
Fri Apr 27 14:31:07 CDT 2012


On 2012-04-27, at 7:13 AM, Dave Chinner wrote:
> Have a look at fs/xfs/xfs_dinode.h. There's a bunch of flags defined
> at the bottom of the file.
> 
> Stuff like the "nodefrag", "nodump", and "prealloc" bits seem fairly
> generic - they are for indicating that files are to be avoided for
> defrag or backup purposes, the prealloc bit indicates that fallocate
> has been used to reserve space on the inode (finding files that space
> can be punched out of safely), and so on.

There is already the FS_NODUMP_FL in the standard FS_IOC_GETFLAGS ioctl
and I expect this to be in statxat() also.  In ext4 there was also an
EXT4_EOFBLOCKS_FL added for inodes with fallocate'd data beyond EOF,
but Eric thought it was a pain to maintain and it has been deprecated
in ext4 and e2fsprogs recently.

> Currently these things are queried and manipulated by ioctls
> (XFS_IOC_FSX[GS]ETATTR) along with extent size hints, project
> quotas, etc. but I think there's some wider use for many of the
> flags, which is why I was asking is there's any thought to this sort
> of flag being exposed by the VFS.
> 
> Historically the flags exposed by the VFS are those used by extN - I
> see little reason why we should favour one filesystem's flags over
> any others in an extended stat interface if they are generically
> useful....

Sure, they started as ext4 flags because the "lsattr" and "chattr"
tools were using this ioctl/flags, but have become more generic in
recent years.  FS_NOTAIL_FL was added for Reiserfs, and FS_NOCOW_FL
was added for another filesystem (maybe Btrfs?).  I'm not against
adding more flags here that are generically useful, and recommended
that statxat() have a 64-bit st_ioc_flags, since there are already
22 FS_*_FL flags defined today.

>> Either you can add some of them to the ioc flags (which may be
>> impractical, I grant you) or we'd have to add an arbitrary fs-type
>> specific field and specify the host fs (the provision of which might
>> not be a bad idea in and of itself) to tell userspace how to interpret them.
> 
> Well, that's the complexity, isn't it. I have no good answer to
> that...
> 
>>> Along the same lines, filesytsems can have different allocation
>>> constraints to IO the filesystem block size - ext4 with it's
>>> bigalloc hack, XFS with it's per-inode extent size hints and the
>>> realtime device, etc. Then there's optimal IO characteristics
>>> (e.g. geometery hints like stripe unit/stripe width for the
>>> allocation policy of that given file) that applications could
>>> use if they were present rather than having to expose them
>>> through ioctls that nobody even knows about...
>> 
>> Yeah...  Not representable by one number.  You'd have to unset a
>> flag to say you were providing this information.
>> 
>> However, providing a whole bunch of hints about I/O characteristics
>> is probably beyond this syscall - especially if it isn't constant
>> over the length of a file.  That's specialist knowledge that most
>> applications don't need to know.  
>> Having a generic way to retrieve it, though, may be a good idea.
> 
> We're continually talking about applications giving us usage hints
> on what IO they are going to do so the storage can optimise the IO.
> IO is still a GIGO problem, though, and the idea of geometry hints
> is to enable us to tell the application to do well formed IO. i.e.
> less garbage.
> 
> XFS has ioctls to expose filesystem geometry, optimal IO sizes, the
> alignment limits for direct IO, etc, and they are very useful to
> applications that care about high performance IO. A lot of this can
> be distilled down to a simple set of geometries, and generally
> speaking they don't change mid way through a file....
> 
>> OTOH, there's plenty of uncommitted space, so if we can condense
>> the hints down to something small, we could perhaps add it later -
>> but from your paragraph above, it doesn't sound like it'll be small.
> 
> Allocation block size, minimum sane IO size (to avoid page cache RMW
> cycles or DIO zeroing), minimum prefered IO size (e.g. stripe unit),
> optimal IO size for bandwidth (e.g. stripe width). I don't think
> there's much more than that which will be really usable by
> applications.

I think this is a minimal set that makes sense, and is manageable for
both the interface and for users.  Even if it isn't 100% correct for
every file of every filesystem, it still makes sense for many systems.
I'd suggest st_frsize (like BSD statvfs() f_frsize) would be the
minimum fragment or page size, st_iosize (BSD f_iosize) could be
the optimal IO size, and "st_stripesize" for the minimum preferred RAID/chunk size.

One could argue that "st_blksize" is used for the "optimal IO size"
on Linux today, but this is an overloaded term.  It _appears_ to
represent the filesystem blocksize, which it usually is not, and on
BSD st_bsize means the minimum blocksize and has a confusingly
similar name.  Since any application using this API needs to do some
extra coding already, we may as well give the structure members good
names that are not ambiguous.

>>> Perhaps also exposing the project ID for quota purposes, like we
>>> do UID and GID. That way we wouldn't need a filesystem specific
>>> ioctl to read it....
>> 
>> Is this an XFS only thing?  If so, can it be generalised?
> 
> Right now it is, but there's been patches in the past to introduce
> project quotas to ext4. That didn't go far because it was done in a
> way that was semantically different to XFS (for no reason that I
> could understand) and nobody wanted two different sets of semantics
> for the "same" feature.  The most common use of project quotas is to
> implement sub-tree quotas, which is probably of more interest to
> btrfs folks as it is an exact match for per-subvolume quotas.
> 
> So, yes, I do see it as something generically useful - it's a
> feature that a lot of people use XFS specifically for....

I'd agree.  There was the tree quota project for ext4, and I've also
heard this is available in other filesystems.

Cheers, Andreas








More information about the wine-devel mailing list