Expression parser and floats/ints

mhisted

09 May, 2012 08:01 PM

Hi Chris,

Are functions like round() and int() available in the MWorks expression parser? I want to do some casting between floating point values and boolean (i.e. valueOfOneOrTwo = (rand(1) > 0.5) + 1).

I'm wondering whether I have to worry about floating point errors on comparisons like this -- i.e. the old "0.1 == 1.0/10 is false on some machines" issue. With a round/int/floor function I could do this explicitly.

Thank you,
Mark

  1. Support Staff 1 Posted by Christopher Sta... on 10 May, 2012 02:10 PM

    Hi Mark,

    The expression parser supports C-style type casts of the form (type)value. The supported types are bool; signed integer types char, short, int (aka integer), and long; unsigned integer types byte, word, dword, and qword; floating-point types float and double; and string.

    You can use casts to truncate a float to an integer. For example, (int)1.9 evaluates to 1.
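
    As a rough illustration of what truncation means here, the following Python sketch contrasts it with flooring and rounding. It assumes the (int) cast behaves like a C cast and truncates toward zero; only the (int)1.9 case is confirmed above, so the negative-number behavior is an assumption. `c_style_int_cast` is a hypothetical helper, not an MWorks function.

```python
import math


def c_style_int_cast(value: float) -> int:
    """Emulate a C-style (int) cast by dropping the fractional part."""
    return math.trunc(value)


print(c_style_int_cast(1.9))    # 1, matching the (int)1.9 example
print(c_style_int_cast(-1.9))   # -1, truncation toward zero
print(math.floor(-1.9))         # -2, flooring differs for negative values
print(round(1.9))               # 2, rounding to nearest for comparison
```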

    Cheers,
    Chris

  2. 2 Posted by Mark Histed on 10 May, 2012 02:49 PM

    Thanks, Chris. I figured out parts of this from the online version of the STX parser:
    https://idlebox.net/2007/stx-exparser/online.htt

    One final question - the parser clearly maintains a type for all numbers. I found that the server will exit on an assertion if you try to add a number to an uncasted value of type boolean. Do you use a float/double internally for all scalar numeric consts or does MWorks use a mix of ints/floats?

    Thank you
    Mark

  3. Support Staff 3 Posted by Christopher Sta... on 10 May, 2012 03:45 PM

    > One final question - the parser clearly maintains a type for all numbers. I found that the server will exit on an assertion if you try to add a number to an uncasted value of type boolean. Do you use a float/double internally for all scalar numeric consts or does MWorks use a mix of ints/floats?

    Yeah, the expression parser (via the stx::AnyScalar class) forbids arithmetic operations involving booleans. However, if the boolean value is coming from an MWorks variable, then the expression parser sees it as an integer, so arithmetic is allowed.

    For example, if bool_var is an MWorks variable of type "boolean" with a default value of "true", then bool_var + 2 is valid and evaluates to 3 (see attached example). However, (bool)bool_var + 2 will fail to parse and produce a "No binary operators are allowed on bool values" error. Whether this distinction is wise or helpful is another question.
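
    The distinction can be modeled with a toy evaluator in Python. This is only a sketch of the rule as described in this post: `Scalar` and `read_variable` are hypothetical names, not the real stx::AnyScalar or MWorks code, and the real parser rejects the expression at parse time rather than raising at evaluation time.

```python
class Scalar:
    """Toy stand-in for a typed scalar; NOT the real stx::AnyScalar."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Mirror the rule described above: no arithmetic on bool-typed scalars.
        if isinstance(self.value, bool) or isinstance(other.value, bool):
            raise TypeError("No binary operators are allowed on bool values")
        return Scalar(self.value + other.value)


def read_variable(stored_value):
    """Hypothetical lookup step: a boolean variable reaches the parser as an integer."""
    if isinstance(stored_value, bool):
        return Scalar(int(stored_value))
    return Scalar(stored_value)


bool_var = True  # variable of type "boolean" with default value "true"

# bool_var + 2: the variable is seen as integer 1, so addition is allowed.
print((read_variable(bool_var) + Scalar(2)).value)  # 3

# (bool)bool_var + 2: the explicit cast produces a bool-typed scalar, so it is rejected.
try:
    Scalar(bool(bool_var)) + Scalar(2)
except TypeError as err:
    print(err)
```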

    Internally, both the expression parser and MWorks (via the mw::Datum class) maintain type info for all values. In particular, both distinguish between booleans, integers, and floats.

    Cheers,
    Chris

  4. 4 Posted by Mark Histed on 10 May, 2012 06:54 PM

    Great - this is good to know; for now I'm going to be using explicit casts.

    > Internally, both the expression parser and MWorks (via the `mw::Datum` class) maintain type info for all values. In particular, both distinguish between booleans, integers, and floats.

    But didn't you say that variables in the XML ignore whether they are type="integer" or type="float"?
    This is important if expressions like intVariable/2 don't automatically cast the int to a float.
    Mark

  5. Support Staff 5 Posted by Christopher Sta... on 11 May, 2012 01:43 PM

    > But didn't you say that variables in the XML ignore whether they are type="integer" or type="float"? This is important if expressions like intVariable/2 don't automatically cast the int to a float.

    Variables themselves don't care and will happily store values of any type. However, the "type" parameter does matter to the XML parser, which uses it to decide how the text in the "default_value" parameter is converted to a Datum instance. So if intVariable has type="integer", then intVariable/2 will indeed result in integer (i.e. truncating) division.
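
    A small Python sketch of that flow, under stated assumptions: `parse_default` and `divide` are hypothetical helpers that imitate the described behavior (the declared type picks the conversion for the default text, and int/int division truncates as in C). They are not the actual XML parser or expression evaluator.

```python
def parse_default(text: str, declared_type: str):
    """Hypothetical: convert default_value text according to the declared type."""
    converters = {
        "integer": int,
        "float": float,
        "boolean": lambda s: s.strip().lower() == "true",
        "string": str,
    }
    return converters[declared_type](text)


def divide(a, b):
    """C-style division: truncating when both operands are ints, else floating point."""
    if isinstance(a, int) and isinstance(b, int):
        return int(a / b)  # truncate toward zero (fine for small values in a sketch)
    return a / b


int_variable = parse_default("5", "integer")
float_variable = parse_default("5", "float")

print(divide(int_variable, 2))    # 2
print(divide(float_variable, 2))  # 2.5
```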

    Chris

  6. 6 Posted by Mark Histed on 11 May, 2012 07:17 PM

    Hi Chris,
    Do you think you could document in more detail how types are assigned and converted by MWorks?
    You seem to be saying the parser respects the supplied type. Can the variable type then change based on e.g. "assignment" actions? If so, how does MWorks detect the type of a literal, or of an expression? What about on save and restore of variables?
    What about passing to the MATLAB and Python bridges?

    Sorry - I now realize this is more complicated than I had assumed and people that write XML probably need to know the type assignment and casting rules. No rush on this, my current code works.

    Thank you
    Mark

  7. 7 Posted by Dave Cox on 09 Nov, 2012 03:26 PM

    Hello,

    Just wanted to revive this thread for discussion since one of my students is getting bitten by type issues like this.

    The problem arises when you assign a value, such as "0" to a variable that is marked as a float. This action "demotes" the variable to an integer, which isn't necessarily a problem, but it can have weird consequences (e.g. if it is participating elsewhere in a division... suddenly what was a float division is now a truncating integer division). Very confusing and sometimes hard to reproduce, especially if that "0" gets entered from the client. Simple test case attached.
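
    The effect can be imitated with a toy model in Python. This is a sketch only, not the attached test case: `LooseVariable` and `divide` are hypothetical, and the values are chosen so the change in the division result is visible.

```python
class LooseVariable:
    """Toy model: the declared type only shapes the default value."""

    def __init__(self, declared_type, default_text):
        convert = {"integer": int, "float": float}[declared_type]
        self.value = convert(default_text)

    def assign(self, new_value):
        # No coercion: the variable takes on whatever type it is handed.
        self.value = new_value


def divide(a, b):
    """C-style division: truncating when both operands are ints."""
    if isinstance(a, int) and isinstance(b, int):
        return int(a / b)
    return a / b


x = LooseVariable("float", "3")
print(divide(x.value, 2))      # 1.5, float division

x.assign(3)                    # e.g. an integer literal typed into the client
print(type(x.value).__name__)  # int, the variable has been "demoted"
print(divide(x.value, 2))      # 1, now a truncating integer division
```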

    If those variable "type" fields are going to be binding in some contexts, perhaps we should make them binding in all contexts? This would simply require enforcing types whenever an assignment is made, and should be a fairly surgical change. I'm not sure if we're deriving any benefit from having numerical values be loosely typed like this.

    Thoughts?
    - Dave

  8. 8 Posted by Mark Histed on 09 Nov, 2012 06:47 PM

    Aha. This may describe what has bitten me in the past.

    I'm in favor of specifying the variable type in the XML and keeping it the same throughout the lifetime of the variable.

    Mark

  9. Support Staff 9 Posted by Christopher Sta... on 09 Nov, 2012 08:26 PM

    Hi Dave & Mark,

    > If those variable "type" fields are going to be binding in some contexts, perhaps we should make them binding in all contexts?

    I'd like to point out again that the "type" field matters only when parsing the default value of the variable. The declared type of should_be_a_float_variable in Dave's example could be any of "integer", "float", "boolean", or "string", and the output of the experiment would be identical (seriously, try it), because the value of the variable is changed before it's ever used.

    As I recall, the reason why the "type" field is needed is that it was impossible to give a variable a default value of string type without it, e.g.

    <variable tag="name" default_value="Chris" type="string" ...
    

    Thinking about it now, it seems like this shouldn't be necessary, since the expression parser is fully capable of recognizing string literals (see attached example). But for some reason I opted to rely on the "type" parameter, and if I thought about it long enough I'd probably remember why.

    Anyway, regarding the specific issues at hand, I think there's a simpler solution. The two examples of unexpected/confusing behavior cited in this thread (i.e. (bool)bool_var + 2 raising an exception, and 1/5 evaluating to 0) are both the result of design decisions in the STX expression evaluator. In my opinion, neither behavior is very useful. I don't see any danger in treating boolean true and false as integer 1 and 0 (and off the top of my head, I can't think of any programming language that doesn't do that), so it seems pointless to disallow it. And I assume that most MWorks users would expect 1/5 to evaluate to 0.2; in the unlikely case that someone really wants truncating division, they can get it by casting the result of floating-point division to an integer.

    So, why not just change the expression evaluator to eliminate those behaviors? That is, allow boolean true and false in arithmetic expressions, and change the division operation to always return a floating-point result (as is the case with division in Python 3). That would resolve these issues without requiring any changes to MWorks' XML parser or the Variable and Datum classes.
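
    Since Python 3 already follows these rules, a few lines of Python give a feel for the proposed semantics. This shows Python's behavior only; it is not output from a modified MWorks expression evaluator.

```python
bool_sum = True + 2        # booleans act as 1 and 0 in arithmetic
quotient = 1 / 5           # division always yields a float
truncated = int(7 / 2)     # explicit cast recovers truncating division

print(bool_sum)    # 3
print(quotient)    # 0.2
print(truncated)   # 3
```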

    What do you think? Is there some disadvantage to this approach that I'm not seeing?

    Chris

  10. 10 Posted by Dave Cox on 09 Nov, 2012 10:35 PM

    This would be okay by me. Integer division is nothing but trouble in 99% of cases.

    The bool thing is fine by me too. Incidentally, there are plenty of languages that don't allow booleans in arithmetic expressions (e.g. Scala), and there are principled reasons to want this behavior. However, we were just inheriting it from STX -- it wasn't a principled decision -- and I agree that most users will be most familiar with languages where true == 1 and false == 0.

    - Dave

  11. 11 Posted by Mark Histed on 09 Nov, 2012 11:28 PM

    If you're going to implicitly promote integers to floats for division,
    there are some edge cases.

    In particular, expressions like
    3/3 == 1 may not be true (depending on the base-2 floating point
    representation of 2)

    More subtly:
    b = 3/3
    and later in your code:
    b==1 may not be true

    This basically rules out any use of logical comparisons for numeric values
    in MWorks, as you won't know in general whether numeric values are floats
    or ints at the time of comparison. Maybe you'll need to add a
    'withintol(a,b)' floating point comparison.

    Matlab does something similar - treats everything as a float. But to get
    around the comparison issue they proactively detect integer representations
    in floating point. (They call them 'flints' internally). I believe they
    special case logical operations for this.

    I'm not sure if there are other subtle issues beyond comparison that we are
    missing.

    My predisposition would be to keep both integers and float as first class
    fixed types specified in the XML initialization code, and make users deal
    with the differences between int and float math, handling casts themselves,
    with no implicit conversions at assignment time. (I think my bug was due
    to entering integers into the client window).

    If you want to implicitly cast up that's also fine with me but I'd suggest
    adding a new logical comparison operator.

    Mark

  12. 12 Posted by Mark Histed on 09 Nov, 2012 11:34 PM

    > This basically rules out any use of logical comparisons for numeric values
    > in MWorks, as you won't know in general whether numeric values are floats
    > or ints at the time of comparison. Maybe you'll need to add a
    > 'withintol(a,b)' floating point comparison.

    This is a little strong. I haven't worked it through completely in my
    head; maybe most comparisons are done on variables that never are set
    through math expressions. And maybe the implicit conversion to float is no
    harder to understand than the current situation and documentation on it
    will take care of this.
    You guys should decide what you prefer.

    Mark

  13. 13 Posted by Dave Cox on 10 Nov, 2012 02:30 AM

    Machine representations of numbers are never going to make everyone happy all of the time. Either we expect a fractional value and don't get it, or we expect a specific comparison to work and we don't get it. As Chris notes, there are examples of languages that make various choices along this spectrum, so there is no obvious consensus on the one "right" answer. The best we can do is strive for consistency and maximal clarity.

    We can and must document whichever path we choose to achieve better clarity, but fundamentally, I think the options are:

    1) (old behavior) All numeric values are really floats. "1/2" results in "0.5", but "1/2" isn't guaranteed to be the same as "0.5" (but it often is). Best practice: all users need to know to be careful with "==". A "compare-with-tol" would be a useful tool for advanced users.

    2) (current behavior) The "type" field only applies to the default value, which is potentially confusing, but could be maybe finessed with better labeling in the editor (for those who use it) and documentation (for those who don't). Beyond that, Python 2.x rules basically apply.

    3) (my original suggestion from today) the "type" field is binding in all contexts (does not just apply to the default value). In this scenario, setting a "float" value to "1" would be the same as setting it to "1.0". You can still get in trouble by setting a "float" to "1/2", since this is an integer division (result would be "0.0"). This would basically be something like C/C++ rules.

    4) (Chris's suggestion) the "type" field could remain non-binding or be removed (see #2 for issues / ways to improve), but division would always result in floats. Integer division is no longer possible (good riddance I say), though "30/10" might not exactly evaluate to "3". This is basically Python 3.x rules.
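
    To make the contrast between options 3 and 4 concrete, here is a toy Python model. It is a sketch under stated assumptions: `BoundVariable` and `c_divide` are hypothetical names that imitate a binding "type" field and C-style division, not existing MWorks code.

```python
class BoundVariable:
    """Toy model of option 3: the declared type is enforced on every assignment."""

    _converters = {"integer": int, "float": float}

    def __init__(self, declared_type, default_value):
        self._convert = self._converters[declared_type]
        self.value = self._convert(default_value)

    def assign(self, new_value):
        # Coerce to the declared type instead of adopting the incoming type.
        self.value = self._convert(new_value)


def c_divide(a, b):
    """C-style division: truncating when both operands are ints."""
    if isinstance(a, int) and isinstance(b, int):
        return int(a / b)
    return a / b


f = BoundVariable("float", 0.0)

f.assign(1)
print(f.value)            # 1.0, same as assigning 1.0

f.assign(c_divide(1, 2))  # the integer division happens before the assignment
print(f.value)            # 0.0

print(1 / 2)              # 0.5, what option 4 (always-float division) would give
```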

    Are there any other options I'm missing?

    - Dave

  14. 14 Posted by Mark Histed on 11 Nov, 2012 10:38 PM

    My main request is that it should be possible to assume whether a variable is a float or an int at any point in the XML code. So I'm in favor of your (1) or (3) below, or (4), as long as all variables are considered floats. I'd prefer to not have the assignment code guess the right type. Chris points out that type-guessing can cause problems with division and suggests making all division float division. I'd raised the point that type-guessing may also cause problems with comparisons.

    I'll agree with anything you guys decide as long as it's documented.

    I see it as two decisions
    (a) is the 'type' field binding, or does variable assignment code try to guess the right type (or are all variables floats)?
    (b) Does all division result in a float, or does int division exist?

    Mark

  15. 15 Posted by Mark Histed on 11 Nov, 2012 10:55 PM

    I just gave a quick look at several different languages' rules for type-guessing on assignment and division.
    Python3 does what you say - all division is float division but numbers are 'duck-typed'. And there hasn't been a comparison outcry; I think largely because sums, differences, and products of mixed ints and integer-valued floats can be compared to ints safely (in all languages; excepting overflows).
    If it works for Python3 it's fine with me. So I support Chris's suggestion from Friday.
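
    A quick Python 3 check of which comparisons are safe, plus a sketch of the tolerance comparison floated earlier in the thread. `withintol` is hypothetical (it does not exist in MWorks) and is written here as a thin wrapper around Python's `math.isclose`; the default tolerance is an arbitrary choice.

```python
import math


def withintol(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Hypothetical tolerance comparison built on math.isclose."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# Small integer-valued results are exact in IEEE 754 doubles, so == is safe.
print(3 / 3 == 1)         # True
print(30 / 10 == 3)       # True
print(2.0 * 3 + 1 == 7)   # True

# Fractions with no exact base-2 representation are where == breaks down.
print(0.1 + 0.2 == 0.3)           # False
print(withintol(0.1 + 0.2, 0.3))  # True
```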

    Mark

  16. 16 Posted by Dave Cox on 11 Nov, 2012 11:06 PM

    I'm still okay with that suggestion as well. However, I think we should additionally do some improved labeling/documentation around the "type" field, or just remove it altogether.

    - Dave

  17. Christopher Stawarz closed this discussion on 25 Feb, 2013 04:17 PM.

  18. Christopher Stawarz closed this discussion on 06 Oct, 2014 04:20 PM.
