You mention integer ranges as a specific type you hope gets handled. But how does something like that work? It feels somewhere between the general type and the specialization type? Would this just be a higher level in the integer section of the specialization lattice? Or would you want it out of the lattice structure completely? Because it would kinda conflict with integer unions, when thinking about it lattice-y
(Admittedly I'm in the same boat: I don't know the math, but stumbled into implementing my own type system recently.)
Every _BitInt(N) is a subtype of _BitInt(N+1). For practical purposes though, you'd usually make each intN a subtype of intN*2, up to some top integer type which supports arbitrary precision.
int8 <= int16 <= int32 <= int64 <= bigint

For this mini subtype hierarchy, you can literally just number the types in increasing order and use `<=`, rather than bitmasks.

To include unsigned integers, the same rule applies, but each `unsigned _BitInt(N)` is also a subtype of `signed _BitInt(N+1)`. It might make sense to number the unsigned integers one higher than their signed variants of the same width to simplify testing for compatibility. A specific bit, such as 1000b, could indicate a signed integer.
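A minimal Python sketch of that numbering-plus-sign-bit scheme (the constants, helper names, and exact bit layout below are made up for illustration, not taken from the post):

```python
# Illustrative encoding: the low bits hold a width rank, and one
# dedicated bit flags "signed", as suggested above.
SIGNED = 0b1000
U8, U16, U32, U64 = 0, 1, 2, 3
I8, I16, I32, I64 = SIGNED | 0, SIGNED | 1, SIGNED | 2, SIGNED | 3

def rank(t: int) -> int:
    return t & ~SIGNED          # width rank: 8-bit -> 0, 16-bit -> 1, ...

def is_subtype(a: int, b: int) -> bool:
    if (a & SIGNED) == (b & SIGNED):
        return rank(a) <= rank(b)          # same signedness: wider is fine
    if not (a & SIGNED) and (b & SIGNED):
        return rank(a) < rank(b)           # unsigned N fits in signed N*2
    return False                           # signed never fits in unsigned

assert is_subtype(U8, I16)       # unsigned 8-bit widens into signed 16-bit
assert not is_subtype(I8, U16)   # a signed value may be negative
```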
This is approximately how I handle the numerical tower in my interpreter (extended with some ad-hoc rules to also support rationals, reals, complex numbers and quaternions).

This seems similar to the way CUE type lattices work. https://cuelang.org/docs/concept/the-logic-of-cue/#the-value...
No parametric polymorphism aka generic types?
There's little value in parametric polymorphism at this level, as it requires parameters (not only type parameters, but also value parameters, since otherwise the type parameters are dangling).
It's better to think of these types as different. Maybe "primitive types" or "shapes".
It's in the context of a Python JIT, where we're looking for a different kind of type information.
JIT internals are very much not my area, so this might be an ignorant question, but... why doesn't the parametric issue come up in this context?

The author starts with the example of a value which could be a List or a String, where the code calls for its `len`. But one could also need to reason about `x: list[list[str]] | list[str]`, where we'll do `y = len(x[0]) if len(x) > 0 else -1`, which requires the system to be able to represent both the type of x and the type of x[0], right?
You typically don't represent that information in the type of the object itself but rather as separate information for the type of the values (ie fields). So maybe it's "a list, and also the values are lists" or maybe it's "a list, and also the first value is a string, and the other values are integers"... and maybe it's "a list, and the values are strings, and also it has a field named 'foo' that's an integer for some reason".
You can still have the same information (depending on the actual system), just represented differently - parametric polymorphism is a bit too rigid here because not everything fits neatly into generic schemas.
And another bit of info: it's much harder to reliably track that, because as soon as the list (for example) escapes, its elements could have any type and it could be any size. PyPy has storage strategies and speculative typing (with guards and deoptimization) that help with this kind of thing.
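A rough sketch of what "the type of the object plus separate facts about its values" could look like (names and structure made up for illustration; this is not PyPy's actual representation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TypeInfo:
    shape: str                          # the object's own type, e.g. "list"
    elem: Optional["TypeInfo"] = None   # current belief about the elements

# x believed to be a list whose elements are lists of strings
x_info = TypeInfo("list", TypeInfo("list", TypeInfo("str")))

# The type of x[0] is just the recorded element fact; no generics needed.
x0_info = x_info.elem

# Once the list escapes, the element fact is no longer reliable: either
# widen it away like this, or keep it speculatively behind a guard that
# deoptimizes when the assumption is violated.
def widen_on_escape(info: TypeInfo) -> TypeInfo:
    return TypeInfo(info.shape, elem=None)
```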
Julia does this for parametric types, too.
wow, I guess great minds really do think alike. I did almost the same exact thing a few years ago, but eventually gave up as my type hierarchy grew too complex.
You could probably represent a lot more complex relations with similar strategies by adding one or two cleanup instructions to union/intersection operations, but whenever I've tried to do it, my head gets dizzy from all the possibilities. And so far I've been unable to find software that can assist in generating such functions.
One possible trick for unions is to use Bloom filters, provided you have some way of hashing types. If you create a Bloom filter `Bloom(t1, t2)`, it's the same as computing `Bloom(t1) | Bloom(t2)`, where `|` is just bitwise OR.
Obviously not perfect, as it can produce false positives, but with a filter of sufficient size the false-positive rate stays low, and it's still more space-efficient than keeping a bit per type in a large type hierarchy.
Intersections can also be done, but with a potentially higher false-positive rate: any type present in both sets has all of its bits set in both filters, so `Bloom(t1) & Bloom(t2)` over-approximates the filter of the intersection and the membership test stays conservative.
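A toy Python sketch of the trick (filter width, hash count, and all names below are arbitrary choices, not from the post):

```python
import hashlib

M, K = 64, 3  # filter width in bits, hash functions per type

def bloom(type_name: str) -> int:
    # Set K pseudo-random bits derived from the type's name.
    mask = 0
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{type_name}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "little") % M)
    return mask

def union(a: int, b: int) -> int:
    return a | b                  # Bloom(t1, t2) == Bloom(t1) | Bloom(t2)

def intersect(a: int, b: int) -> int:
    return a & b                  # over-approximates the true intersection

def may_contain(filt: int, type_name: str) -> bool:
    bits = bloom(type_name)
    return filt & bits == bits    # all bits present -> "maybe in the set"

ints_or_strs = union(bloom("int"), bloom("str"))
assert may_contain(ints_or_strs, "int")   # no false negatives
# may_contain(ints_or_strs, "float") is usually False, but could be a
# false positive -- that's the accepted trade-off.
```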
I don't think space is going to be the first issue that causes this to fall over. It will probably be the human mind's ability to conceptualize the space.
OT, but I'm very much looking forward to some language supporting a pushout (path-independent) lattice of theories (types, operators, laws), so that abstraction has math-like locality, composability and robustness.
Looks almost exactly like what JavaScriptCore calls SpeculatedType.
Excellent, thanks for the pointer. Mentioned!
This is also how Erlang's Dialyzer works.
Have a link to the implementation?
Come on man just do duck typing. You are killing me with this stuff.
It all reads so technical and long and mathematically academic but it’s just a bit mask.
I absolutely hate writing python now because I have to reason about a “list of list of errors” type defined by a teenager and they get mad at me if I don’t then define a new “list of list of errors, or none” type when I manipulate it. You guys are now employed by VSCode to make those hints look beautiful. VSCode is your boss now and your job is to make VSCode happy.
I predict that in five years we will retvrn to duck typing in the same way that we are now retvrning to server side rendering and running on bare metal. Looking forward to the viral “you don’t need compound types” post on here. “Amazing - you can write code which does normal business tasks without learning or ever thinking about homotopy type theory”
Yes I get it if we are writing embedded code or navigation systems or graphics or whatever, please help yourself from the types bucket. Go ahead and define a dictionary of lists where one list can contain strings but all the other lists either contain 8-bit integers or None. But the academic cachet of insanely complex composable type systems bleeds through into a web server that renders a little more than “hello world” and it ruins my life
Duck typing requires the primitives of the properties to also be typed. This is a building block you must build in order to get to duck typing.
I'm also unsure how you would represent a "list of list of errors" succinctly without calling it just that. A list of a list of errors. An `Error[][]`. How would "duck typing" as you describe it make this type somehow less complex?
This is not about surface level typing. This is about compiler internals.
I added a couple of sentences in the intro to clarify.
I don't think there will be a movement away from type hints in python any time soon, they're too useful as guardrails for an LLM. But even without using an LLM I'm definitely faster with type hints enforced because I can just sort of feel around with autocomplete. Sure, the transition sucks, but once you're there the benefits start stacking up pretty fast.
>filtered by Python types
lmaooo sub-2σ golem detected