You mention integer ranges as a specific type you hope gets handled. But how does something like that work? It feels somewhere between the general type and the specialization type? Would this just be a higher level in the integer section of the specialization lattice? Or would you want it out of the lattice structure completely? Because it would kinda conflict with integer unions, when thinking about it lattice-y
(Admittedly I'm in the same boat: I don't know the math, but stumbled into implementing my own type system recently.)
Every _BitInt(N) is a subtype of _BitInt(N+1). For practical purposes though, you'd usually make each intN a subtype of intN*2, up to some top integer type which supports arbitrary precision.
int8 <= int16 <= int32 <= int64 <= bigint

For this mini subtype hierarchy, you can literally just number the types in increasing order and use `<=`, rather than bitmasks.

To include unsigned integers, the same rule applies, but each `unsigned _BitInt(N)` is also a subtype of `signed _BitInt(N+1)`. It might make sense to number the unsigned integers one higher than their signed variants of the same width to simplify testing for compatibility. A specific bit, such as 1000b, could indicate a signed integer.
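A minimal Python sketch of that numbering-plus-sign-bit scheme (the constants, helper names, and exact bit layout below are made up for illustration, not taken from the post):

```python
# Illustrative encoding: the low bits hold a width rank, and one
# dedicated bit flags "signed", as suggested above.
SIGNED = 0b1000
U8, U16, U32, U64 = 0, 1, 2, 3
I8, I16, I32, I64 = SIGNED | 0, SIGNED | 1, SIGNED | 2, SIGNED | 3

def rank(t: int) -> int:
    return t & ~SIGNED          # width rank: 8-bit -> 0, 16-bit -> 1, ...

def is_subtype(a: int, b: int) -> bool:
    if (a & SIGNED) == (b & SIGNED):
        return rank(a) <= rank(b)          # same signedness: wider is fine
    if not (a & SIGNED) and (b & SIGNED):
        return rank(a) < rank(b)           # unsigned N fits in signed N*2
    return False                           # signed never fits in unsigned

assert is_subtype(U8, I16)       # unsigned 8-bit widens into signed 16-bit
assert not is_subtype(I8, U16)   # a signed value may be negative
```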
This is approximately how I handle the numerical tower in my interpreter (extended with some ad-hoc rules to also support rationals, reals, complex numbers and quaternions).

This seems similar to the way CUE type lattices work. https://cuelang.org/docs/concept/the-logic-of-cue/#the-value...
No parametric polymorphism aka generic types?
There's little value in parametric polymorphism at this level, as it requires parameters (not only type parameters, but also value parameters, since otherwise the type parameters are dangling).
It's better to think of these types as different. Maybe "primitive types" or "shapes".
It's in the context of a Python JIT, where we're looking for a different kind of type information.
JIT internals are very much not my area, so this might be an ignorant question, but... why doesn't the parametric issue come up in this context?

The author starts with the example of a value which could be a List or a String, where the code calls for its `len`. But one could also need to reason about `x: list[list[str]] | list[str]`, where we'll do `y = len(x[0]) if len(x) > 0 else -1`, which requires the system to be able to represent both the type of x and the type of x[0], right?
You typically don't represent that information in the type of the object itself but rather as separate information for the type of the values (ie fields). So maybe it's "a list, and also the values are lists" or maybe it's "a list, and also the first value is a string, and the other values are integers"... and maybe it's "a list, and the values are strings, and also it has a field named 'foo' that's an integer for some reason".
You can still have the same information (depending on the actual system), just represented differently - parametric polymorphism is a bit too rigid here because not everything fits neatly into generic schemas.
And another bit of info: it's much harder to reliably track that, because as soon as the list (for example) escapes, its elements could have any type and it could be any size. PyPy has storage strategies and speculative typing (with guards and deoptimization) that help with this kind of thing.
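A rough sketch of what "the type of the object plus separate facts about its values" could look like (names and structure made up for illustration; this is not PyPy's actual representation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TypeInfo:
    shape: str                          # the object's own type, e.g. "list"
    elem: Optional["TypeInfo"] = None   # current belief about the elements

# x believed to be a list whose elements are lists of strings
x_info = TypeInfo("list", TypeInfo("list", TypeInfo("str")))

# The type of x[0] is just the recorded element fact; no generics needed.
x0_info = x_info.elem

# Once the list escapes, the element fact is no longer reliable: either
# widen it away like this, or keep it speculatively behind a guard that
# deoptimizes when the assumption is violated.
def widen_on_escape(info: TypeInfo) -> TypeInfo:
    return TypeInfo(info.shape, elem=None)
```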
Julia does this for parametric types, too.
wow, I guess great minds really do think alike. I did almost the same exact thing a few years ago, but eventually gave up as my type hierarchy grew too complex.
You could probably represent a lot more complex relations with similar strategies by adding one or two cleanup instructions to union/intersection operations, but whenever I've tried to do it, my head gets dizzy from all the possibilities. And so far I've been unable to find software that can assist in generating such functions.
One possible trick for unions is to use Bloom filters, provided you have some way of hashing types. If you create a Bloom filter `Bloom(t1, t2)`, it's the same as computing `Bloom(t1) | Bloom(t2)`, where `|` is just bitwise OR.
Obviously not perfect, as it can produce false positives, but with a filter of sufficient size the false-positive rate stays low, and it's still more space-efficient than keeping a bit per type in a large type hierarchy.
Intersections can also be done, but with a potentially higher false-positive rate: any type present in both sets has all of its bits set in both filters, so `Bloom(t1) & Bloom(t2)` over-approximates the filter of the intersection and the membership test stays conservative.
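A toy Python sketch of the trick (filter width, hash count, and all names below are arbitrary choices, not from the post):

```python
import hashlib

M, K = 64, 3  # filter width in bits, hash functions per type

def bloom(type_name: str) -> int:
    # Set K pseudo-random bits derived from the type's name.
    mask = 0
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{type_name}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "little") % M)
    return mask

def union(a: int, b: int) -> int:
    return a | b                  # Bloom(t1, t2) == Bloom(t1) | Bloom(t2)

def intersect(a: int, b: int) -> int:
    return a & b                  # over-approximates the true intersection

def may_contain(filt: int, type_name: str) -> bool:
    bits = bloom(type_name)
    return filt & bits == bits    # all bits present -> "maybe in the set"

ints_or_strs = union(bloom("int"), bloom("str"))
assert may_contain(ints_or_strs, "int")   # no false negatives
# may_contain(ints_or_strs, "float") is usually False, but could be a
# false positive -- that's the accepted trade-off.
```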
I don't think space is going to be the first issue that causes this to fall over. It will probably be the human mind's ability to conceptualize the space.
OT, but I'm very much looking forward to some language supporting a pushout (path-independent) lattice of theories (types, operators, laws), so that abstraction has math-like locality, composability and robustness.
Looks almost exactly like what JavaScriptCore calls SpeculatedType.
Excellent, thanks for the pointer. Mentioned!
This is also how Erlang's Dialyzer works.
Have a link to the implementation?
Come on man just do duck typing. You are killing me with this stuff.
It all reads so technical and long and mathematically academic but it’s just a bit mask.
I absolutely hate writing python now because I have to reason about a “list of list of errors” type defined by a teenager and they get mad at me if I don’t then define a new “list of list of errors, or none” type when I manipulate it. You guys are now employed by VSCode to make those hints look beautiful. VSCode is your boss now and your job is to make VSCode happy.
I predict that in five years we will retvrn to duck typing in the same way that we are now retvrning to server side rendering and running on bare metal. Looking forward to the viral “you don’t need compound types” post on here. “Amazing - you can write code which does normal business tasks without learning or ever thinking about homotopy type theory”
Yes I get it if we are writing embedded code or navigation systems or graphics or whatever, please help yourself from the types bucket. Go ahead and define a dictionary of lists where one list can contain strings but all the other lists either contain 8-bit integers or None. But the academic cachet of insanely complex composable type systems bleeds through into a web server that renders a little more than “hello world” and it ruins my life
Duck typing requires the primitives of the properties to also be typed. This is a building block you must build in order to get to duck typing.
I'm also unsure how you would represent a "list of list of errors" succinctly without calling it just that. A list of a list of errors. An `Error[][]`. How would "duck typing" as you describe it make this type somehow less complex?
This is not about surface level typing. This is about compiler internals.
I added a couple of sentences in the intro to clarify.
I don't think there will be a movement away from type hints in python any time soon, they're too useful as guardrails for an LLM. But even without using an LLM I'm definitely faster with type hints enforced because I can just sort of feel around with autocomplete. Sure, the transition sucks, but once you're there the benefits start stacking up pretty fast.
>filtered by Python types
lmaooo sub-2σ golem detected