[PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector

Ricardo Neri ricardo.neri-calderon at linux.intel.com
Thu Jun 15 13:37:51 CDT 2017


On Tue, 2017-05-30 at 12:35 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:08AM -0700, Ricardo Neri wrote:
> > When computing a linear address and segmentation is used, we need to know
> > the base address of the segment involved in the computation. In most of
> > the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> > However, it may be possible that a user space program defines its own
> > segments via a local descriptor table. In such a case, the segment base
> > address may not be zero .Thus, the segment base address is needed to
> > calculate correctly the linear address.
> > 
> > The segment selector to be used when computing a linear address is
> > determined by either any of segment override prefixes in the
> > instruction or inferred from the registers involved in the computation of
> > the effective address; in that order. Also, there are cases when the
> > overrides shall be ignored (code segments are always selected by the CS
> > segment register; string instructions always use the ES segment register
> > along with the EDI register).
> > 
> > For clarity, this process can be split into two steps: resolving the
> > relevant segment register to use and, once known, read its value to
> > obtain the segment selector.
> > 
> > The method to obtain the segment selector depends on several factors. In
> > 32-bit builds, segment selectors are saved into the pt_regs structure
> > when switching to kernel mode. The same is also true for virtual-8086
> > mode. In 64-bit builds, segmentation is mostly ignored, except when
> > running a program in 32-bit legacy mode. In this case, CS and SS can be
> > obtained from pt_regs. DS, ES, FS and GS can be read directly from
> > the respective segment registers.
> > 
> > Lastly, the only two segment registers that are not ignored in long mode
> > are FS and GS. In these two cases, base addresses are obtained from the
> > respective MSRs.
> > 
> > Cc: Dave Hansen <dave.hansen at linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder at gmail.com>
> > Cc: Colin Ian King <colin.king at canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes at gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren at intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme at redhat.com>
> > Cc: Masami Hiramatsu <mhiramat at kernel.org>
> > Cc: Adrian Hunter <adrian.hunter at intel.com>
> > Cc: Kees Cook <keescook at chromium.org>
> > Cc: Thomas Garnier <thgarnie at google.com>
> > Cc: Peter Zijlstra <peterz at infradead.org>
> > Cc: Borislav Petkov <bp at suse.de>
> > Cc: Dmitry Vyukov <dvyukov at google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar at intel.com>
> > Cc: x86 at kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon at linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 256 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 256 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 1634762..0a496f4 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -9,6 +9,7 @@
> >  #include <asm/inat.h>
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> > +#include <asm/vm86.h>
> >  
> >  enum reg_type {
> >  	REG_TYPE_RM = 0,
> > @@ -33,6 +34,17 @@ enum string_instruction {
> >  	SCASW_SCASD	= 0xaf,
> >  };
> >  
> > +enum segment_register {
> > +	SEG_REG_INVAL = -1,
> > +	SEG_REG_IGNORE = 0,
> > +	SEG_REG_CS = 0x23,
> > +	SEG_REG_SS = 0x36,
> > +	SEG_REG_DS = 0x3e,
> > +	SEG_REG_ES = 0x26,
> > +	SEG_REG_FS = 0x64,
> > +	SEG_REG_GS = 0x65,
> > +};
> 
> Yuck, didn't we talk about this already?

I am sorry Borislav. I thought you agreed that I could use the values of
the segment override prefixes to identify the segment registers [1].
> 
> Those are segment override prefixes so call them as such.
> 
> #define SEG_OVR_PFX_CS	0x23
> #define SEG_OVR_PFX_SS	0x36
> ...
> 
> and we already have those!
> 
> arch/x86/include/asm/inat.h:
> ...
> #define INAT_PFX_CS     5       /* 0x2E */
> #define INAT_PFX_DS     6       /* 0x3E */
> #define INAT_PFX_ES     7       /* 0x26 */
> #define INAT_PFX_FS     8       /* 0x64 */
> #define INAT_PFX_GS     9       /* 0x65 */
> #define INAT_PFX_SS     10      /* 0x36 */
> 
> well, kinda, they're numbers there and not the actual prefix values.

These numbers can 'translated' to the actual value of the prefixes via
inat_get_opcode_attribute(). In my next version I am planning to use
these function and reuse the aforementioned definitions.

> 
> And then there's:
> 
> arch/x86/kernel/uprobes.c::is_prefix_bad() which looks at some of those.
> 
> Please add your defines to inat.h

Will do.

> and make that function is_prefix_bad()
> use them instead of naked numbers. We need to pay attention to all those
> different things needing to look at insn opcodes and not let them go
> unwieldy by each defining and duplicating stuff.

I have implemented this change and will be part of my next version.
> 
> >  /**
> >   * is_string_instruction - Determine if instruction is a string instruction
> >   * @insn:	Instruction structure containing the opcode
> > @@ -83,6 +95,250 @@ static bool is_string_instruction(struct insn *insn)
> >  	}
> >  }
> >  
> > +/**
> > + * resolve_seg_register() - obtain segment register
> 
> That function is still returning the segment override prefix and we use
> *that* to determine the segment register.

Once I add new definitions for the segment registers and reuse the
existing definitions of the segment override prefixes this problem will
be fixed.

> 
> > + * @insn:	Instruction structure with segment override prefixes
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + * @regoff:	Operand offset, in pt_regs, used to deterimine segment register
> > + *
> > + * The segment register to which an effective address refers depends on
> > + * a) whether segment override prefixes must be ignored: always use CS when
> > + * the register is (R|E)IP; always use ES when operand register is (E)DI with
> > + * string instructions as defined in the Intel documentation. b) If segment
> > + * overrides prefixes are used in the instruction instruction prefixes. C) Use
> > + * the default segment register associated with the operand register.
> > + *
> > + * The operand register, regoff, is represented as the offset from the base of
> > + * pt_regs. Also, regoff can be -EDOM for cases in which registers are not
> > + * used as operands (e.g., displacement-only memory addressing).
> > + *
> > + * This function returns the segment register as value from an enumeration
> > + * as per the conditions described above. Please note that this function
> > + * does not return the value in the segment register (i.e., the segment
> > + * selector). The segment selector needs to be obtained using
> > + * get_segment_selector() and passing the segment register resolved by
> > + * this function.
> > + *
> > + * Return: Enumerated segment register to use, among CS, SS, DS, ES, FS, GS,
> > + * ignore (in 64-bit mode as applicable), or -EINVAL in case of error.
> > + */
> > +static enum segment_register resolve_seg_register(struct insn *insn,
> > +						  struct pt_regs *regs,
> > +						  int regoff)
> > +{
> > +	int i;
> > +	int sel_overrides = 0;
> > +	int seg_register = SEG_REG_IGNORE;
> > +
> > +	if (!insn)
> > +		return SEG_REG_INVAL;
> > +
> > +	/* First handle cases when segment override prefixes must be ignored */
> > +	if (regoff == offsetof(struct pt_regs, ip)) {
> > +		if (user_64bit_mode(regs))
> > +			return SEG_REG_IGNORE;
> > +		else
> > +			return SEG_REG_CS;
> > +		return SEG_REG_CS;
> 
> Simplify:
> 
> 		if (user_64bit_mode(regs))
> 			return SEG_REG_IGNORE;
> 
> 		return SEG_REG_CS;

Will do.
> 
> > +	}
> > +
> > +	/*
> > +	 * If the (E)DI register is used with string instructions, the ES
> > +	 * segment register is always used.
> > +	 */
> > +	if ((regoff == offsetof(struct pt_regs, di)) &&
> > +	    is_string_instruction(insn)) {
> > +		if (user_64bit_mode(regs))
> > +			return SEG_REG_IGNORE;
> > +		else
> > +			return SEG_REG_ES;
> > +		return SEG_REG_CS;
> 
> What is that second return actually supposed to do?

This is not correct and I will remove it. Actually, will never run due
to the if/else above it. Thanks for noticing it.
> 
> > +	}
> > +
> > +	/* Then check if we have segment overrides prefixes*/
> 
> Missing space and fullstop: "... overrides prefixes. */"

Will fix.

> 
> > +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> > +		switch (insn->prefixes.bytes[i]) {
> > +		case SEG_REG_CS:
> > +			seg_register = SEG_REG_CS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_SS:
> > +			seg_register = SEG_REG_SS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_DS:
> > +			seg_register = SEG_REG_DS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_ES:
> > +			seg_register = SEG_REG_ES;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_FS:
> > +			seg_register = SEG_REG_FS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_GS:
> > +			seg_register = SEG_REG_GS;
> > +			sel_overrides++;
> > +			break;
> > +		default:
> > +			return SEG_REG_INVAL;
> 
> So SEG_REG_NONE or so? It is not invalid if it is not a segment override
> prefix.

Right, we can have more prefixes. We should need a default action as we
are only looking for the segment override prefixes, as you mention.

> 
> > +	/*
> > +	 * Having more than one segment override prefix leads to undefined
> > +	 * behavior. If this is the case, return with error.
> > +	 */
> > +	if (sel_overrides > 1)
> > +		return SEG_REG_INVAL;
> 
> Yuck, wrapping of -E value in a SEG_REG enum. Just return -EINVAL here
> and make the function return an int, not that ugly enum.

Will do.
> 
> And the return convention should be straight-forward: default segment if
> no prefix or ignored, -EINVAL if error and the actual override prefix if
> present.

Wouldn't this be ending up mixing the actual segment register and
segment register overrides? I plan to have a function that parses the
segment override prefixes and returns SEG_REG_CS/DS/ES/FS/GS or
SEG_REG_IGNORE for long mode or SEG_REG_DEFAULT when the default segment
register needs to be used. A separate function will determine what such
default segment register is. Does this make sense?

> 
> Also, that test should be *after* the user_64bit_mode() because in long
> mode, segment overrides get ignored. IOW, those three if-tests around here
> should be combined into a single one, i.e., something like this:
> 
> 	if (64-bit) {
> 		if (!FS || !GS)
> 			ignore
> 		else
> 			return seg_override_pfx;	<--- Yes, that variable should be called seg_override_pfx to denote what it is.

Perhaps it can return what I have described above?

> 	} else if (sel_overrides > 1)
> 		-EINVAL
> 	else if (sel_overrides)
> 		return seg_override_pfx;
> 

Will re-do these tests are you mention.
> > +
> > +	if (sel_overrides == 1) {
> > +		/*
> > +		 * If in long mode all segment registers but FS and GS are
> > +		 * ignored.
> > +		 */
> > +		if (user_64bit_mode(regs) && !(seg_register == SEG_REG_FS ||
> > +					       seg_register == SEG_REG_GS))
> > +			return SEG_REG_IGNORE;
> > +
> > +		return seg_register;
> > +	}
> > +
> > +	/* In long mode, all segment registers except FS and GS are ignored */
> > +	if (user_64bit_mode(regs))
> > +		return SEG_REG_IGNORE;
> > +
> > +	/*
> > +	 * Lastly, if no segment overrides were found, determine the default
> > +	 * segment register as described in the Intel documentation: SS for
> > +	 * (E)SP or (E)BP. DS for all data references, AX, CX and DX are not
> > +	 * valid register operands in 16-bit address encodings.
> > +	 * -EDOM is reserved to identify for cases in which no register is used
> > +	 * the default segment register (displacement-only addressing). The
> > +	 * default segment register used in these cases is DS.
> > +	 */
> > +
> > +	switch (regoff) {
> > +	case offsetof(struct pt_regs, ax):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, cx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, dx):
> > +		if (insn && insn->addr_bytes == 2)
> > +			return SEG_REG_INVAL;
> > +	case offsetof(struct pt_regs, di):
> > +		/* fall through */
> > +	case -EDOM:
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, bx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, si):
> > +		return SEG_REG_DS;
> > +	case offsetof(struct pt_regs, bp):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, sp):
> > +		return SEG_REG_SS;
> > +	case offsetof(struct pt_regs, ip):
> > +		return SEG_REG_CS;
> > +	default:
> > +		return SEG_REG_INVAL;
> > +	}
> 
> So group all the fall through cases together so that you don't have this
> dense block of code with "/* fall through */" on every other line.

Will do.

Thanks and BR,
Ricardo




More information about the wine-devel mailing list