Tim Sweeney Lambda-The-Ultimate Archive
Overview
A collection of Tim Sweeney's postings to Lambda the Ultimate, a programming weblog which "deals with issues directly related to programming languages, and programming language research.".
Some posts may not be formatted properly or may contain incomplete code fragments. This seems to be a defect of the Lambda the Ultimate posting system, which has partially eaten a handful of posts and on the whole produced html and xhtml that is not what one would call "well formed." Adjustments and corrections have been made where possible.
Article List
- Paul Graham: Accumulator Generator
- Feb 13, 2003 - Re: Paul Graham: Accumulator Generator
- Call-by-name is dual to call-by-value
- Mar 03, 2003 - Re: Call-by-name is dual to call-by-value
- Mar 04, 2003 - Re: Call-by-name is dual to call-by-value
- Book and Course in Constraint Programming and Reasoning
- Mar 31, 2003 - Re: Book and Course in Constraint Programming and Reasoning
- Apr 01, 2003 - Re: Book and Course in Constraint Programming and Reasoning
- J. McCarthy: Towards a Mathematical Science of Computation
- Hundred Year Language
- Apr 11, 2003 - Re: Hundred Year Language
- Apr 12, 2003 - Re: Hundred Year Language
- Selectors Make Analysis of case-lambda Too Hard
- Apr 22, 2003 - Re: Selectors Make Analysis of case-lambda Too Hard
- David McAllester's "Ontic" Language
- Apr 28, 2003 - David McAllester's "Ontic" Language
- Python Metaclass Programming
- Apr 22, 2003 - Re: Python Metaclass Programming
- Apr 23, 2003 - Re: Python Metaclass Programming
- Apr 24, 2003 - Re: Python Metaclass Programming
- Apr 25, 2003 - Re: Python Metaclass Programming
- Apr 28, 2003 - Re: Python Metaclass Programming
- Apr 29, 2003 - Re: Python Metaclass Programming
- May 02, 2003 - Re: Python Metaclass Programming
- The New C++: Trip Report, October 2002
- May 09, 2003 - Re: The New C++: Trip Report, October 2002
- Using Memory Errors to Attack a Virtual Machine
- May 19, 2003 - Re: Using Memory Errors to Attack a Virtual Machine
- Don Box on C# generics vs. C++ generics
- May 13, 2003 - Re: Don Box on C# generics vs. C++ generics
- May 19, 2003 - Re: Don Box on C# generics vs. C++ generics
- Sun, Zend push scripting for Java
- Jun 13, 2003 - Re: Sun, Zend push scripting for Java
- Point Free Style
- Jun 19, 2003 - Re: Point Free Style
- Partial Types in C#
- Jul 11, 2003 - Re: Partial Types in C#
- Theorems for free!
- Aug 01, 2003 - Re: Theorems for free!
- Udell: Symbol grounding, XML, and RDF
- Aug 11, 2003 - Re: Udell: Symbol grounding, XML, and RDF
- Closures and mutability
- Aug 18, 2003 - Re: Closures and mutability
- Aug 19, 2003 - Re: Closures and mutability
- Aug 19, 2003 - Re: Closures and mutability
- Aug 23, 2003 - Re: Closures and mutability
- Whither Self
- Sep 14, 2003 - Re: Wither Self
- CLR Exception Model
- Oct 04, 2003 - Re: CLR Exception Model
- Oct 04, 2003 - Re: CLR Exception Model
- Historic Documents
- Oct 12, 2003 - Re: Historic Documents
- Longhorn Markup Language (code-named "XAML") Overview
- Oct 28, 2003 - Re: Longhorn Markup Language (code-named
- Finalization (CLR)
- Feb 21, 2004 - Re: Finalization (CLR)
- Python 'for' as Scheme 'let'
- Feb 23, 2004 - Re: Python 'for' as Scheme 'let'
- Feb 23, 2004 - Re: Python 'for' as Scheme 'let'
- Nullable Types
- May 31, 2004 - Re: Nullable Types
- The view from the left
- Jun 12, 2004 - Re: The view from the left
- Jun 17, 2004 - Re: The view from the left
- Error handling strategies
- Aug 20, 2004 - Failure as a first-class concept
- Aug 20, 2004 - Wheat
- Implementing Declarative Parallel Bottom-Avoiding Choice
- Aug 26, 2004 - Thoughts
- Aug 26, 2004 - List Monad vs Amb
- A Deeper Look At Metafunctions
- Sep 03, 2004 - Templates = Lame Dependent Types
- Combining lazy and eager evaluation of terms
- Sep 07, 2004 - Y, etc.
- Apple Flunks First Grade Math
- Sep 10, 2004 - rational bignums as the default
- Generics for the masses
- Sep 14, 2004 - There must be a better way
- Sep 16, 2004 - Re: There must be a better way
- Amazon Associates (+ other advertising)
- Sep 16, 2004 - Good idea
- History: Array languages
- Sep 16, 2004 - Why not more array languages?
- multidimensional arrays
- Sep 17, 2004 - Arrays are functions?
- Fresh O'Caml
- Sep 22, 2004 - Generalization?
- Monads in various languages
- Oct 02, 2004 - This problem appears to exist
- Links (Wadler)
- Jan 06, 2005 - A major event
- Jan 06, 2005 - Why it matters
- Generics: The Importance of Wildcards
- May 27, 2005 - Wow
- Scrap your boilerplate with class: extensible generic functions
- May 28, 2005 - URL
- large imperative code --> functional
- Aug 16, 2005 - Display Lists
- Distributive laws for the Coinductive Solution of Recursive Equations
- Sep 13, 2005 - Difficult to read
- Sep 14, 2005 - Aha.
- Garbage Collection as a Lazy Algorithm
- Sep 19, 2005 - Relationship
- literature on commutative lifted boolean operators
- Oct 07, 2005 - Parallel Or
- The Type-System-Feature-Creep Death Spiral
- Oct 31, 2005 - partial evaluation
- Nov 01, 2005 - "Get rid of type declarations"
- Generalized Algebraic Data Types and Object-Oriented Programming
- Nov 24, 2005 - It's always great to see rece
- Anyone remember Russell (the language, that is)?
- Nov 24, 2005 - Thankfully, the full paper is
- Return of the Global Variables?
- Dec 31, 2005 - Same hack, new name
- Duck typing vs Structural Subtyping
- Dec 29, 2005 - The essence of duck typing
- Jan 02, 2006 - Extensional Polymorphism
- Naked Objects
- Jan 31, 2006 - Interesting
- Feb 01, 2006 - Introspection
- Feb 02, 2006 - The Subtyping Property
- The Next Mainstream Programming Languages
- Feb 03, 2006 - Garbage collection
- Feb 03, 2006 - Why virtual classes?
- Feb 09, 2006 - Tim's idea of adding more
- Gilad Is Right
- Feb 18, 2006 - No
- Feb 18, 2006 - Names and slots
- Virtual Machines, Language Runtimes, and the Future of Objective C
- Mar 13, 2006 - A real problem?
- Rich resource site for the programming language "K"
- Mar 13, 2006 - Yikes
- Mar 14, 2006 - There are two things here
- Folding neither Left nor Right (or how to avoid overspecifying the solution to a problem)
- Apr 30, 2006 - Neutral folds
- A note on distributed computing
- May 04, 2006 - Is this a political dissertation?
- The Weird World of Bi-Directional Programming
- May 31, 2006 - Excellent presentation
- Variance and Generalized Constraints for C# Generics
- Jun 19, 2006 - Tension
- Petition for adding garbage collection to C++.
- Jun 27, 2006 - Unreal Engine 3 garbage collection
- Jun 27, 2006 - Tim Sweeney
- Securing the .NET Programming Model
- Jun 27, 2006 - Full abstraction in multi-language runtimes
- Jun 27, 2006 - Holes
- Jun 30, 2006 - Unexpected behavior
- Jul 06, 2006 - You're right
- Socially Responsive, Environmentally Friendly Logic
- Jul 11, 2006 - Weird
- Cost of provably-correct code
- Jul 25, 2006 - Big Theorems
- Generics as a Library
- Jul 28, 2006 - Question
- Jul 29, 2006 - Expression Problem
- Aug 01, 2006 - Parametricity in Haskell
- Overloading : Why do some languages leave it out?
- Aug 03, 2006 - Overloading is hard
- Constant Values as Types
- Sep 03, 2006 - Singleton types
- Sep 03, 2006 - Right
- Small Value Set Types
- Sep 03, 2006 - Finite set types
- expressivity of "idiomatic C++"
- Sep 08, 2006 - Relying on libraries
- Why Johnny can't code
- Sep 18, 2006 - Early programming languages
- Is "post OO" just over?
- Sep 24, 2006 - AOP is only successful in
- Relationship between access modifiers and type
- Oct 03, 2006 - What are access modifiers?
- A Very Modal Model of a Modern, Major, General Type System
- Oct 10, 2006 - Quantifiers
- Quotation and evaluation -- or, how pure is the pure lambda calculus?
- Dec 27, 2006 - Lambda terms as graphs
- AgentSheets: End-User Programing (and Secret Lisp Success Story!)
- Dec 31, 2006 - Anti-Objects
- Patrick Logan on Software Transaction Memory
- Feb 09, 2007 - STM nervousness
- Feb 09, 2007 - Can you help me understand
- Feb 09, 2007 - Bloat
- Behaviour Diffing
- Feb 27, 2007 - Behavior diffing
- Syntax Solicited for Imperative-flavored Concurrent Language with Keywords
- Mar 16, 2007 - Imperatrix Ornothologica
- Twitter and Rails brouhaha
- Apr 16, 2007 - Translation
- The expression problem, Scandinavian style
- May 07, 2007 - Excellent
- "Practical" advantages of lazy evaluation
- May 30, 2007 - Advantages
- Partial evaluation applied to high speed lighting preview
- Jun 05, 2007 - Partial Evaluation in rendering
- Merging Functions, Modules, Classes, the whole nine yards...
- Jun 05, 2007 - Merging stuff
- Simply Easy! (An Implementation of a Dependently Typed Lambda Calculus)
- Jul 09, 2007 - Dependent types
- Jul 09, 2007 - Can you imagine your typical
- Jul 09, 2007 - General recursion
- Jul 12, 2007 - Type erasure requires totality
- Jul 13, 2007 - Proof-Carrying Code
- Comprehensions with "Order by" and "Group by"
- Jul 27, 2007 - Is there a better way?
- Universal Type System
- Nov 07, 2007 - Universal Type System
- Nov 08, 2007 - Incompleteness vs The Halting Problem
- Nov 10, 2007 - Turning complete type systems
- Derivation trees for lambda-calculus
- Nov 27, 2007 - Abstract syntax tree
- foundations for J, APL etc
- Nov 29, 2007 - APL and vector-like data types
- Parametric datatype-genericity
- Dec 04, 2007 - Genericity
- A Growable Language Manifesto
- Dec 07, 2007 - Nice Manifesto
- The YNot Project
- Jan 29, 2008 - Steps to Practicality
- Arc is released
- Jan 30, 2008 - ASCII Only?
- Jan 31, 2008 - The mess that is Unicode...
- Uniform naming
- Feb 18, 2008 - Dereferencing operators
- Pure, Declarative, and Constructive Arithmetic Relations
- Mar 04, 2008 - Retreating from pure Horn clauses
- Variation of C's inline conditional
- Mar 13, 2008 - Conditional
- Species: making analytic functors practical for functional programming
- Automatic Patch-Based Exploit Generation
- Apr 29, 2008 - Ouch
- On the importance of Turing completeness
- Jun 11, 2008 - Why Turing Completeness?
Posts
Paul Graham: Accumulator Generator
Feb 13, 2003 - 16:17 - Re: Paul Graham: Accumulator Generator
From the site:
Note: (a) that's number, not integer, (b) that's incremented by, not plus. <<
This is an interesting study of solutions to an ill-defined problem.
1. The term "number" is left undefined. If the problem we posed in terms of "integers" or "natural numbers", it would be clear. Are complex numbers, numbers? Are p-adics and reals, numbers? If you define "number" to include all of these different mathematical objects, then the problem is ill-typed because, for example, you can't add a p-adic to a real. The problem doesn't define what should happen in this case; should it crash? Return 7*pi?
2. You can't "increment a number". Adding 2 to 3 doesn't change the value of 3! 3 still remains 3. Numbers, as defined by mathematics sense, are immutable. One might correct this by indicating that your function should take a reference to an integer, and add a certain integer to the value referred to. But is such a function actually a function? Not in the mathematical sense. What about the case where the reference is null, if the language supports such a concept?
What the article did succeed in doing, if you read through the code examples, is highlight the non-portability (across languages) of programs depending on ill-defined concepts like "references" and "numbers". The problem as posed can't be solved minimally (with neither error cases nor behaviour beyond what is specified) in any of the languages shown!
-Tim
Call-by-name is dual to call-by-value
Mar 03, 2003 - 17:39 - Re: Call-by-name is dual to call-by-value
Related to non-deterministic extensions of the lambda calculus, have you seen McAllester's Ontic?
http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps
Here the 'amb' operator is used both to express potential values of terms, and to express types, interesting because it enables one to express types and values in the same term language.
I've been implementing a language using this approach and it seems very fruitful. Amb-style nondeterminism needn't be incompatible with call-by-future, as long as the reduction system is aware of the sharing semantics.
-Tim
Mar 04, 2003 - 17:55 - Re: Call-by-name is dual to call-by-value
Paul,
When I get some time I'll try to submit something on UnrealScript. Here's the quick summary.
Plusses: - Using the Java principles as a starting point has worked well (no global variables and functions, a distinction between object types and simple types, etc). - The game-oriented extensions (state scoping, simulated multithreading with latent functions) have significantly improved the process of writing gameplay code. - Shipping the editor & compiler with the game have helped mod authoring significantly. Lots of people actually learned to program with UnrealScript, which is neat. - The script metadata that's exposed to C++ greatly simplified the implementation of the user interface, networking code, import/export code, and many other things.
Minuses: - Performance is a problem sometimes (10-20X slower than C++), but the language structure is not very amenable to writing a simple assembly code generator. - The widespread sharing of object data between UnrealScript and C++ has caused complications in keeping the definitions synchronized and makes porting harder. - "Pause between level changes and garbage collect everything" doesn't scale.
You can't really write a BNF grammar for UnrealScript, since the parser depends on the metadata generated from previously parsed code for disambiguiation - similar to C++, but in retrospect a bad idea. If you'd like to see the compiler source though I'd be glad to email it. Besides shedding light on the language grammar it also serves as an example of how to build an 8000-line single-pass recursive descent lexer and parser for a modern language, and why this is not the best way to do things. :-)
Book and Course in Constraint Programming and Reasoning
Mar 31, 2003 - 18:37 - Re: Book and Course in Constraint Programming and Reasoning
These languages seem very attractive because they make it look like you can specify a problem very concisely, and the language will automatically work out the solution for you. Several comments:
- This doesn't really work in practice. "Generate and test" is hopelessly exponential for problems of interesting size.
- However, constraints can be very useful as a feature in a type system in a more traditional language. Being able to express things like "the type of natural numbers less than 31" and have the compiler perform range arithmetic as part of type checking as very useful. This approach is the first step towards implementing real mathematics in type systems. By "real mathematics", I mean mathematics where natural numbers are a genuine subtype of integers, which are a subtype of rationals, and when you perform common operations on numbers of certain types, the results are closed as you'd expect (i.e. integer*integer=integer), without mathematically unsound casts and coercions.
- I dislike the lack of structure in the example languages described; most of the research is being done in Prolog-style "everything is global". Because of this, there is a huge gap between these R&D languages and any future implementation that might appeal to mainstream programmers. It's very interesting to explore these concepts in structured languages that include things like dependent-typed records.
- It even more interesting to apply CLP style constructs to a language that unifies types, sets, and nondeterminism into a single framework, such as McAllester's "Ontic" language. Here, constraints just work automatically as part of the type system. Also, you get first-class proof:proposition :: value:type, even when the constraints are intractable. For example, you can express the type of pythagorean triples as:
t typedef {x:nat,y:nat,z:nat=sqrt(x*x+y*y)};
And whether or not the language is able to tell you anything useful about the type t, such as whether it's inhabited, and one or more solutions, you can always say:
x:t={3,4,5};
And the language accepts it. Verifying that a deterministic element satisfies a (probably non-deterministic) set of constraints is a much easier problem than trying to solve the constraints in general, so it's reasonable to support logic constraints in a language even when the general case of typechecking/solving them is incomplete or undecidable.
Apr 01, 2003 - 14:51 - Re: Book and Course in Constraint Programming and Reasoning
Dominic,
In the style of code I was writing, there are three cases:
- Some constraints can be verified at compile time. These simplify down to determinate terms.
- Some constraints can only be verified at runtime, so the compiler just marks them nondeterministic (i.e. the type may be inhabited by 0, 1, or many terms), and lets the runtime sort them out.
- Some constraints can't be verified due to limitations of the language or to undecidability in general. These result in a compiler error. For example, you can express Fermat's Last Theorem as "flt typedef {x:nat>0, y:nat>0, z:nat>0, n:nat>2, x^n+y^n=z^n }". The compiler has no idea whether that statement is inhabited or not, but it's happy to accept the definition. And if you supplied a deterministic value x, the compiler or runtime can tell you whether x is a witness to flt.
I have a compiler that accepts these kinds of expressions and can verify constraints. For example with the above definitions, you can say:
x:t={3,4,r:nat}
And the term reduces to {3,4,5}. This is just a simple case of filling in the blanks and forward-propagating values around terms. This isn't an everything-is-global implementation like Prolog; everything can be arbitrarily nested in records, closures, arrays, etc.
The "=" operator performs unification. In the case of deterministic terms (the only thing you can express in languages like C++), it works just like you'd expect, but given non-deterministic terms a and b, "a=b" denotes the set of values that a and b have in common.
You can't express this sort of thing in C++, because there's a syntactic distinction between templates (which are purely compile-time), and functions (which are a purely runtime). So, for example, you can't write a simple function f(x) and template t<i> and refer to t<f(x)> for example. You can't write a single "assert" that functions both at compile time and runtime. You can't write a matrix(n,m) class which supports type-checked fixed-size matrices and unknown (at compile time) sized matrices.
So every C++ class library tends to implement stuff twice, once for the purely runtime case (i.e. for variable-size matrices) and again using templates for the purely compile-time case (i.e. for fixed-size matrices).
These limitations are understandable for a language that evolved from C, a "processor independent assembly language" from the 1970's. But it's very unfortunate that modern efforts like the Java generics extensions, and the C# generic types proposal, still go that route. Dependent types provide a natural and far more general solution than templates, allowing for checking at compile-time when possible, and at runtime otherwise.
-Tim
J. McCarthy: Towards a Mathematical Science of Computation
Apr 04, 2003 - 16:10 - Re: J. McCarthy: Towards a Mathematical Science of Computation
The problem with proving programs correct is that, in all systems produced to date, the proofs are far larger and more complicated than the programs they're proving correct. Even the proposition (the type of the proof) tends to be more complex than the actual program, for example the proposition that qsort returns its argument stably sorted, compared to the implementation of qsort.
The pattern I've seen informally looking at the research are factors like 1:2:40. For each line of program code, you need 2 lines of specification code, and 40 lines of proof code.
The end result is that it's easier to eyeball your program and guess that it's probably correct, than to eyeball the proposition being proved and guess that it's probably proving the right thing. So, as far as shipping quality code is concerned, you get 40X better cost/benefit from writing code than from trying to prove stuff about it.
Proof techniques would need to come a very, very long way before this situation changes.
Hundred Year Language
Apr 11, 2003 - 17:46 - Re: Hundred Year Language
This is a great article, the first such set of predictions I've seen that are grounded in reality and history, rather than silly speculation.
The history of mathematics provides a good roadmap, both for the areas where we can expect major progress, and those where we should only hope for incremental improvement.
When I was a kid, I had an old Calculus book from the early 1900's and learned calculus from it, because that's the only reference I had on the topic. When the subject was later taught in school, it was a surprisingly natural transition -- this despite the 100 years of scientific progress that had passed since the book was written.
Language syntax and notation can only be expected to change incrementally. 100 years ago, mathematicians used f(x) to denote "calling function f with parameter x", and they will likely still be doing so 100 years from now. This is almost certain to carry over to future programming, too. The current notations for function calling, infix operators, and definitions won't likely change much. I expect that if in 100 years I showed a programmer "f(x:int):int -> if x<=0 then 1 else x*f(x-1)", he would understand exactly what this code is doing.
The most important change in mathematics over the past 100 years has been in putting it on a solid logical foundation, such as basing calculus on formal limits rather than ill-defined infinitesimals.
I think the most important change in programming in the next 100 years -- and which will happen much sooner than 100 years from now -- is the reformulation of language semantics around well-defined type systems based on mathematical logic. The last 30 years have led to incredible progress in research here, but the results haven't been picked up by mainstream languages, though R&D languages like Haskell based on sound type systems are gaining ground among certain groups.
C# is a great illustration of the superficiality of popular language progress. It is the most polished and fine-tuned mainstream language ever created, yet fails to come to grips with basic mathematics type and theory. You still have absurdities like 2^64=0, 7/2=3, simple expressions like x/y and a[x] which the compiler accepts yet aren't actually typesafe in any reasonable definition of the term, arrays have serious typing flaws, and the basic confusion between things and references to things remains.
Right now, the open language lineages are:
- LISP: This is the apex of untyped programming with nested scoping and metadata manipulation. It's at the end of the line, not because of lack of potential, but it's so simple and powerful than any improvements to it can be implemented in LISP itself without affecting the core language. In 100 years, after the last C and C++ programmers are long gone, there will still be LISP enthusiasts. But don't expect large-scale software development to happen this way.
- C, C++, Java, C#: The problem nowadays isn't with implementation but with the underlying flaws in the paradigm. We can continue to go the route of adding tweaks as with C#, but they're asymptotically approaching a limit that's not far from the current state of affairs.
- Python, Perl, etc: Languages with significant improvements in usability for certain audiences, and are generally a mix of untyped LISP features with typed C/Java-style features. The primary innovations I see here are in the syntax and libraries, for example Python's pioneering of "pseudocode that runs" and Perl's first-class regular expressions and text parsing capabilities. Future languages will certainly inheret these features, even as the underlying paradigm is changed.
- ML, Haskell, simply-typed functional languages with type deduction and, in general, programs consisting of a whole bunch of top-level declarations. The elegance of these languages is based on Hindley-Milner and with most of the neded extensions of the language (subtyping, dependent types), that breaks, and the resulting declarations and error messages become quite a mess. These languages will certainly remain useful to their current fans, but I see no hope for this style of programming to scale up to mainstream development.
So which lineage will be the basis of languages 100 years from now? I think the syntax has long since been determined, and will fall very much in the middle ground between Haskell, Pascal, and C++ -- and not very much outside that triangle.
Regarding the type system and semantics, I think the basic principles in such mathematical-style code as "f(x:int):int -> if x<=0 then 1 else x*f(x-1)" will not change one iota in 100 years, but that the current notions of objects, references, member functions, exceptions, however-many-bit integers, integer being "the" type of 7, and so on will seem terribly archaic. In this area, I don't think any current lineages will stand.
However, I do think we'll converge on a type system that is not far outside of the past 20 years' research into dependent types, intersection and union types, unification, constraint logic, and proofs-as-programs. These type systems essentially have the expressive power of mathematical logic itself, which is fundamental and not a product of invention or whim.
Apr 12, 2003 - 08:41 - Re: Hundred Year Language
Actually, as Hilbert discovered thanks to Goedel, mathematics hasn't got and will never have a solid logical foundation
Please take everything I say here to be mod undecidability, the halting problem, and finite resource limitations on actual computers.
Why is 2^64=0 absurd? This is just exponentiation modulo 2^64.
While the C definitions of (+,-,*) are consistent with the ring of integers mod 2^64, (/,<,<=,>,>=) aren't. C math is inherently non-mathematical.
This was fine for the "platform-independent assembler" that C was in the 1970's, but in the future I think programming language math will map more closely to actual math.
Why is division not typesafe?
int divide(int num,int denom):
In C, this is a useful function, but it's not division. It crashes in the case of a zero denominator, and returns mathematically incorrect results when there the numerator is not an integer multiple of the denominator.
That Ackermann(64,64) is an observationally partial function is an artefact programmers much accept if we want the power of being able to write general recursive functions.
But a division function that crashes and sometimes returns invalid results, isn't going to be with us much longer.
rational divide(rational num,rational<>0 denom):
This is a typesafe and mathematically correct definition of division. Type systems in current languages don't support this, but there is a great deal of promising type system research to draw from for the future.
C# arrays
For example, you can pass an array of integers to a function expecting an array of objects, and try to write a character into it. This is the classic problem of not properly distinguishing a thing and a mutable reference to a thing. A thing is constant and referentially transparent and obeys very useful subtyping and variance rules, for example allowing you to pass an array of integers to a function expecting an array of objects. A reference to a thing isn't like that at all.
Dependent types
How do you feel these work against separation of concerns?
This answer might partially dependent on what one considers dependent types. C++ templates (yuck) and C# generics could be considered dependent types, with a syntax limiting them to situations that allow decidable typechecking. Haskell, if you get into the implicit universal quantifiers involved in type deduction, has a constrained form of dependent types.
One view on dependent typing is that C++/C# generics are just a limited form of expressing functions that take type parameters and return types or functions as results. There's a direct correspondence between something like:
template<type t> class c {int a; t b;}
And a dependent-typed defintion like:
c(t:type):class={a:int,b:t};
But with dependent types, you have much more power to mix types and values. And, typechecking becomes undecidable in the general case, though the subset corresponding to any existing type deduction / template / generic type frameworks is certainly decidable, so it's at least as expressive as past work.
I don't see undecidable typechecking as a big problem. Current compilers either succeed or fail with a definite error. Undecidability just leads to another category of compiler failure, where the compiler can't accept the program, not because it's certain to be incorrect, but because it's not certain to be correct.
unification of functional, logic, OO
I agree 100% about the upcoming convergence of these things and find the topic very exciting.
I think, actually, you'll get constraint logic for free as a result here. Yes, it will be undecidable in many situations, but a practical tool regardless; C++ template typechecking is undecidable, for example. Typecheckers eventually become theorem provers in this case, though probably not very powerful ones.
Selectors Make Analysis of case-lambda Too Hard
Apr 22, 2003 - 17:55 - Re: Selectors Make Analysis of case-lambda Too Hard
In a type system with intersection types and contravariant subtyping of function argument types, you get case-lambda style functionality automatically. In that setting, you have results like:
add0=lambda().0 add1=lambda(x).x add2=lambda(x,y).x+y
intersection(add0,add1,add2)=a function that takes 0, 1, or 2 arguments and adds them.
However, I've found implementing intersection types and full contravariance very tricky, so this isn't an economical solution if all you want is case-lambda.
David McAllester's "Ontic" Language
Apr 28, 2003 - 19:20 - David McAllester's "Ontic" Language
http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps
Ontic is a higher order formal specification language designed for expressing formal concepts as clearly and concisely as possible. The consiceness and clarity is achieved through the use of nondeterminism. In a nondeterministic language each term has a set of possible values. This improves conciseness in a variety of ways. First, since terms denotes sets, nondeterminism allows terms to be used as types. Second, since each term has a set of possible values it is natural for a term not to have any values.
Since reading this paper last year, I've come to believe that McAllester's work is one of the major breakthroughs of type theory, indicative of type systems in future programming languages.
There's something fundamentally new happening here that hasn't been seen in strongly-typed programming languages: a general means of combining types and values into a single Turing-complete framework that gains the expressive power of functional, logic, and constraint logic programming.
To appreciate Ontic, you need to accept three concepts that don't have analogies in current languages: An "either" operator for expressing terms with multiple potential values, where "potential" has a very specific meaning that takes sharing into account and requires looking at beta-reduction more carefully; a type-forming operator, "the-set-of-all"; and a potential value extraction operator "an-element-of".
The value of this paper may be obscured by its LISP-based presentation and by its "nondeterminism" terminology; perhaps this is why almost nobody has noticed it since its 1993 publication. In actuality, Ontic is fully deterministic, and none of its fundamental features are specific to LISP. But if you're into type systems, please bear with me and give it a read.
Python Metaclass Programming
Apr 22, 2003 - 24:53 - Re: Python Metaclass Programming
Type theorists put the question, "What is the type of a type?", on a solid theoretical footing years ago and provided many useful models for answering such questions. Metaclasses aren't one of these answers. The superstition that metaclasses have something to do with types of types does the world more harm than good.
Apr 23, 2003 - 17:07 - Re: Python Metaclass Programming
Well, I made similar design mistakes in past languages I shipped. In UnrealScript (http://unreal.epicgames.com/unrealscript.htm), as in Java, there is a "Class" class which stores metadata about classes; its class is also "Class", and its superclass is "Object".
I didn't know much about type theory then. Now that I do, I recognize this model as unnatural, and think we can get MUCH better end results by exposing a real type-theoretic notion of types, and that this would only make the language more powerful and practical for everyday programming.
Here's how I look at types now:
- A type describes a set of values. The set may be empty, finite, or infinite. In this interpretation, the "int" type is shorthand for "the type containing ..,-1,0,1,2,..".
- A type doesn't "contain" any more information than the set of values it describes. It's not like a record or struct type that contains member functions, member variables ("the type's name"), etc.
- Two types are equal if they describe the same set of values, unequal otherwise (full extensionality). However, comparing two types need not be a decidable operation.
- The operations allowed on types depend on your language:
* In the most basic example, simply typed lambda calculus, there are no operations on types. The only way that types are used is to limit the values which the compiler accepts in a certain place (for example, in a function application for a given function). This information is statically checked at compile-time, and erased by runtime, therefore no aspects of types are visible at runtime.
* Some languages allow dynamic casts, the ability to see whether an arbitrary value, known precisely only at runtime, belogs to a certain type. This needn't imply dynamic typing; there are constructs like "ifcast(x,value,type) then a{x} else b" which maintain static type safety in the presence of casts. Here types, can't be completely erased, so some aspects of the type systems are exposed to the runtime.
* More advanced languages support dependent types, intersections of types, non-disjoint unions of types, testing types for inhabitance/emptyness dynamically, etc.
- It's valid to talk about subtyping in full generality. For example, there is (at least conceptually) "the type of all integers except -3 and 7", and it's a subtype of the type of integers.
- For general purpose programming, you tend to want an empty type containing no values - you might call this "uninhabited" - note there can be only one by extensionality, and a universal type (conceptually) containing every value that is valid in the language - you might call this "object".
- For any type t, there is such a thing as "the type containing all the subtypes of t, and nothing else" (similar to the powerset from set theory). If you have a universal type "object", then the type of all types is precisely the type of all types that are subtypes of object. You can iterate this process to obtain "the type of all types of types" and the result is valid and non-circular. Each iteration you're narrowing the set of values you're talking about, and each time you're just dealing with a subtype of object and nothing fancier than that.
- For sufficiently advanced type theories, we need to accept undecidable typechecking (i.e. for dependent typing in the presence of recursion, the type system is Turing-complete). This doesn't mean that we have to accept unsafe typing, it means the compiler may have to reject some programs that are actually safe, and users may have to work around these limitations.
- It's valid to talk about functions that take types as parameters, and return types as results. These are considered functors in stratified type systems, but if you see types as just a certain kind of value, then you only have functions, and not any special "forall" quantifiers or type-level lambdas.
- This level of generality isn't unreasonable for future languages. For example, see McAllester's Ontic for a framework that combines most of these concepts.
- Practical programmers want metadata, reflective capabilities, the ability to mix and match open-world and closed-world programming models to fit their needs, a combination of nominal and structural typing capabilities, and a mechanism for reusing implementations (i.e. inheritance or encapsulation). These features can all be supported in a sound type framework, but we mustn't confuse metadata and types. Metadata is something that you want certain record-like data structures to contain, so that you can use reflective programming techniques.
Here's how I'd redesign a modern language like Java in this light:
- Java uses "object" to denote "the type of things that contain fields". Change this name to "record", a more specific term that's generally recognized as meaning "something that contains fields".
- Java uses "int" to denote "the type of integers", and "Int" to denote "the type of integers that have been boxed up into a record". In Java, "int" is not a subtype of "object", but "Int" is. Ok, we blew it here. Keep "int" as-is and eliminate "Int". We won't need "Int".
- Add a new type "object" which, this time around, really means "the type of all objects", not just "the type of all records". Now EVERY type is a subtype of object. "int" is a subtype of object, "char", etc.
- "object" doesn't have any member functions or other fields (remember, it's not a record type now). Given an object, you can't do anything with it, except try to cast it to other types.
- Expose functions as first-class language values, with proper variance rules for their parameters and return types. Add a new base type "function", the type of all functions. It's a subtype of objects. Now you can pass around functions freely.
- Eliminate any thought of this C# "automatic boxing" concept. Now we see this is a just a flawed workaround to a previous language flaw: what it's really doing is encapsulating ints (which are now objects) as records, which was only necessary then because we didn't have a real type theoretic "object" type previously, so if we wanted to pass stuff around polymorphically, we had to stick it in a record.
- Add a new "type" type, to mean "the type of all types". "int" is a value of type "type", as is "record", "string", and so on. Eliminate "class".
- Then, add a metadata framework to allow optionally attaching reflective metadata to records. This could look a whole lot like the current Java class / reflection approach, but we would stop pretending that classes had something to do with types, and recognize them as optional metadata tacked onto records.
Apr 24, 2003 - 11:03 - Re: Python Metaclass Programming
Frank:
Every `value' belongs to exactly one type, which never changes; talking about run-time types is meaningless.
Doesn't this view deny the existance of subtypes? For example, what one type does the value 7 belong to? The type of rational numbers, the type of integers, the type of natural numbers, the universal type?
In a type system supporting full powerset style subtyping, 7 belongs to, and only to, every type that is a supertype of the type containing just 7. Full subtyping can be useful in practical programming. For example, if you have arrays of fixed length, it's useful to have "the type of natural numbers less than n" for every n, so that you can index into arrays with the type system guaranteeing indices are in range (rather than runtime checks that throw exceptions.)
Since we never care about the underlying representation, the values
This implies an absolute separation between types and values, which is the norm in most popular type systems. But this separation limits the expressiveness of the type system, and I want to go beyond this and be able to intermix values and types freely, for example in dependent types, type-forming operations, and value extraction operations. This can be done soundly, but requires breaking down the "don't care about underlying values" barrier.
Concretely: I want to be able to write functions that take any mix of types and values as parameters, and return any mix of types and values as results. I want to cartesion-dependent-product up types and values in arbitrary ways. If you have a natural number i, be it statically known or only dynamically known at runtime, I want to be able to form the type of natural numbers less than i, and use it dynamically, for example in runtime casts, and I want the compiler to verify that everything I do is decidable and typesafe.
we never talk about type equality, except trivial equality, but rather type isomorphism: types are isomorphic iff they behave the same way.
This is certainly the mainstream view. I prefer to see extensional type equality exist and be exposed as a first-class operation, though it's undecidable in the general case. (If a compiler is going to accept undecidability in some aspects of typechecking, why not go all the way?)
Consider finite types, for example. Do you consider "the type containing just 3 and 7" to be isomorphic to "the type containing just 'x' and 'z'"? It's easy to come up with a bijection between them, as it is between many conceptually different types (integers and character strings, for example). I don't think type isomorphism gives you much useful information from a type system perspective.
On the other hand, knowing whether two types are extensionally equal (or in a subtype relationship) gives you useful information about which values are allowed in which contexts.
Andrew:
There are logical pitfalls with notions from naieve set theory, such as a universal type ("the type of all values allowed in the language"), a type-of-all-types (itself a type), and Russell's paradox ("T is the type containing all types that don't contain themselves. Does T contain itself?").
There are various ways of resolving these problems:
- The simplest is to disallow such constructs entirely, which is sound, but reduces the power of the type system. For practical programming, it's really useful to have a universal type; for dependent-typed programming a "type of all types, itself a type" is very useful. So foregoing these constructs has a cost.
- These is a little bit of work on non-wellfounded set theory and related type theories that eliminate the paradoxes by changing the underlying axiom systems. Applying such systems to programming languages would be quite speculative.
- Type systems like the one in McAllester's Ontic paper achieve a middle ground by exposing tiered types that are sound and almost as useful for practical purposes, but have some limitations.
- Finally, if you've accepted the presence of undecidability in your type checker, you can expose true universal types, while masking the paradoxes behind the vale of undecidability. For example, you can form a "type containing all types that don't contain themselves", and verify that simple types belong to it and that the universal type doesn't belong to it, but it's impossible to decide or prove (within the type system's axioms) whether it belongs to itself. Not much formal work has been done in this area, but I think it can be made sound, though using such constructs in practice feels quite subversive.
Apr 25, 2003 - 16:29 - Re: Python Metaclass Programming
Having these discussions reinforces how much one's choice of type theory is a matter of opinion and choice rather than "right or wrong".
Frank:
Ok, let's see if I understand what you're advocating. In your type system, given an integer x, I should be able to compare it to the integer constant 7, similarly if both are rational numbers. But if I have an integer x and a rational number y, your compiler would say something like "You can't compare integers and rational numbers, because they're not the same type"?
So, given an integer x, to convert it to a rational number, you'd need to call some function (perhaps in the same module number where the abstract "natural number" type where declared) to convert it.
I'm coming from the opposite perspective, where I want type relationships (like subtyping) to directly correspond to what people expect in the given domain. For example, in mathematics, the set inclusion hierarchy between natural numbers, integers, rationals, imaginaries, complex numbers, etc. I want the type system to directly reflect this, so you can pass integers to functions that expect rationals directly, without any conversions.
This approach needn't prevent one from using abstract types. It's actually sound to mix structural subtyping and nominal subtyping in very interesting ways (though, as far as I've found, this is quite difficult and not very well studied).
What I mean concretely is that you can hide the implementation of "rational numbers" in a module that exposes certain operations (comparison, arithmetic, etc), exposes the constrant that they are a supertype of the integers, yet still hiding their representation type. So, given such abstract defintions of integers and rationals, you can freely pass integers to functions expecting rationals, but not vice-versa. However, even if rational numbers are implemented as (for example) a pair of integers in some cannonical form (in lowest terms, etc), this fact is hidden outside the module, so that you can't (for example) compare a rational number to a naked pair of integers.
With this approach, I think you get the best of both worlds: You can hide the implementations of things, yet expose the subtyping relationships that users would expect in the problem domain.
The only drawback I've found to this approach so far is that naked (non-abstract) values in this type system can be compared in ways that programmers don't expect in other languages. For example, you can write code like "fac(x:int):int=>if x='0' then 1 else x*fac(x-1)" where you're comparing x to the character '0' instead of the number 0, and this is a valid comparison which always evaluates to false.
To be sure this isn't taken out of context, I'm only proposing this approach from a high-level language point of view. The programmer would see integers as a subtype of rationals. When the compiler and linker get down to actually translating such code into efficient runnable form, things are implemented quite traditionally, with inlining, boxing, and conversions/coercions to (for example) extend integers into equivalant rational numbers, and so on.
Regarding the array subtyping issue you mention:
I've found practical advantages to exposing both contravariant functions and invariant functions. Invariant functions behave like arrays from a typing perspective: they can only be equal if their domains are equal. Contravariance gives you record-style subtyping and the ability to extend a function "+" from integers to rational numbers while maintaining compatibility (so, for example, you can have "the ring of integers" be a subtype of "the field of rationals" -- great, but contravariance has two big disadvantages:
1. In a type system that admits some comparisons to be undecidable and some decidable, any values of record or array type built from contravariant functions much necessarily be incomparable, because the domain of any value not known precisely at compile time may be extended in a way that allows returning incomparable values. So if you build arrays out of contravariant functions, it's impossible to compare them.
2. From a user's perspective, having {1,2,"x",9.5} <: {1,2} is not something one expects of arrays.
Invariant functions allow you to write arrays which are guaranteed to be comparable given comparable range and comparable finite domain, and make {1,2,"x",9.6} distinct from {1,2}.
Taking this further, you can then have invariant functions be subtypes of identically declared contravariant functions, and think of contravariant functions (conceptually) as an infinite union of the identical invariant function with every possible extension of its domain.
Apr 28, 2003 - 18:37 - Re: Python Metaclass Programming
Kimberley:
One researchy attempt along these lines is McAllester's Ontic (http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps).
Unfortunately I've never seen any compilers that apply these concepts to real-world programming languages, but I'm trying to implement one. My experience so far is that it's achievable, but is difficult to nail down all of the proper typechecking rules and to model real-world programming constructs (like inheritance and open-world programming) from such low-level primitives.
Re counting sheep:
Real-world programming is more about sheep than isomorphisms of sheep. Are two sheep the same sheep? Given two sets of sheep, are the sets the same? Given a set of sheep and a sheep, does the sheep belong to the set of sheep? What elements belong to the set of all subsets of the set of all sheep? These are useful concepts for typechecking.
Hiding equality and forcing programmers to deal with explicit isomorphisms is a strictly less useful technique than directly exposing subtyping and type equality.
First, because compilers are capable of pretty impressive verification of subtyping properties, even for really powerful languages (where, though the general problem is undecidable, there is a very expressive decidable subset of the language). Verifying anything related to isomorphisms is far harder. Have you tried doing this for serious programming languages? In a practical implementation, I think you end up with progammers writing a ton of explicit coercions which are claimed to be isomorphisms but in practice are just a bunch of arbitrary functions the programmer hopes are correct isomorphisms.
Second, because most of practical programming practice today requires subtyping. Any serious app written in C++, Java, or C# would become way more verbose if subsumption were explicit, with explicit coercions all over the place.
I mention that my philosophy on this topic is very much a result of studying McAllester's papers in the past year. In (mostly minor) language work I did in the 15 years previous to that, my thinking was very similar to yours, that in the ideal type system, types are an utterly distinct from values, that concepts like type equality don't necessarily follow from value equality, etc.
Since then, I've looked at the basic constructs in McAllester's paper as the basic building blocks of future type systems: a type-forming "the-set-of-all" construct, an element extraction construct "an-element-of", types as values, types-of-types available in a structured way ("a-subset-of"), etc. Sadly the terminology and programming style there tends to obscure the fundamental breakthroughs there.
Re array subtyping:
You had pointed the unnaturalness of "{0..3} < {0..4}" leading to the conclusion that "chararray(4) < chararray(3)". You used it as an example of the arbitrarity of subtyping. An alternative view is that subtyping is fine, but there are really two different kinds of function-like things, one where a->b <: c->d iff b<:d and c<:a, and the other (call it ~>) where a~>b <: c~>d iff b<:d and c=a.
Apr 29, 2003 - 13:35 - Re: Python Metaclass Programming
Frank,
This has been a very useful discussion for my understanding of type system decisions. I'm not making these arguments because I'm confident you're wrong, but because I'm worried that I'm wrong.
I've definitely encountered the broad outlines of the problems you mention, of naked equality (where 3='x' is a valid comparison, a useful degree of typechecking is lost), and of dependent types often being an unnecessarily complex solution to certain problems.
Just a few quick notes:
So you are saying that you want an array of b's indexed by a to be a "function-like thing" of type a~>b?
Yes.
Then you would get no subtyping relationship between chararray(3) and chararray(4) at all.
Right; this one point is certainly sound and not speculative. It turns out this is what you want for arrays, because given an array a~>b where a is a bounded natural number type, one of the fundamental operations you want need is length-extraction. If the domain is invariant, then you can decidably extract the length of any array value.
With this invariant domain approach, you can precisely specify the type of a 7-character array as "(x:nat<7)->char", or an unknown-length array of characters precisely as (exists y:nat).(x:nat<y)->char.
If the domain where contravariant, domain extraction would be more subtle (because the domain could be expanded to be anything), and array comparison would be undecidable.
Continuing this exercise, we have -> for modelling contravariant-domain functions, and ~> for modelling invariant-domain arrays. What is the covariant-domain version of this? From the subtyping rule, it's the dependent-pair type constructor!
How is that an argument for subtyping? Because it eliminates consideration of the "unnaturalness" I mentioned?
This doesn't do much to sell subtyping, it just counters the (true) argument that "modelling arrays as contravariant-domain functions leads to unnatural results" by providing a natural, subtyping-friendly way of modelling arrays.
Of course, as you point out, the big argument against subtyping isn't this, but that subtyping can be regarded as an arbitrary imposition, especially when viewed from a category theory point of view (as opposed to a types-as-sets-of-values point of view).
One final note with regard to subtyping: in a framework like Ontic, it's not really possible to avoid subtyping, because types don't exist as a separate concept, but only as "quoted" (the-set-of-all) nondeterministic collections of potential values. From a categorical/iso point of view, I suppose this is heresy, but it seems the most powerful and natural way of modelling types if one takes a types-as-sets-of-values point of view.
I will definitely try to rekindle this argument in the future when I can present something much more concrete.
May 02, 2003 - 17:50 - Re: Python Metaclass Programming
Still, your representation of arrays seems rather complex. If it were me, working in your system, I would start with the obvious fact that an array of b's of length n should be representable as an n-fold product of b', where b' is isomorphic to b.
In my experimental compiler and type system, I have arrays and n-fold products not only being isomorphic but identical: they are extensional, invariant-domain functions from natural numbers less than some bound to the product's type.
In the case of uniformly typed products, these are simple functions, and in the case of arbitrary products (say int*string*char), they are dependent-typed functions.
The theory of the all this is debatable, but it does work nicely from a practical programming point of view. Here's a quick view of this in my actual syntax (a blend of C constructs with Pascal-style declarations, in other words very non-Haskell/ML style).
The type of functions from integers to integers.
(:int):int
A function from integers to integers.
(x:int)x+3
Same as above, but with an optional type declaration.
(x:int):int=>x+3
The type of functions from natural numbers in the range 0..3 to integers.
(x:nat<4):int
An actual function like the above.
(x:nat<4):int=>x+7
Recursive function.
f(x:int):int=>if x=0 then 1 else x*f(x-1)
The type of arrays of 4 integers. Arrays are like functions, but the domains subtype invariantly, and they evaluated and typed extensionally.
[:nat<4]:int
Alternate syntax for the same thing.
{:int,:int,:int,:int}
You can give the elements of the array names.
{x:int,y:int,z:int,w:int}
A concrete array of 4 integers.
{3,6,9,12}
Equivalant way of expressing the above. Like lambdas, but for arrays.
[x:nat<4]x*3
A non-uniform array type.
{x:int,y:string}
Equivalant way of expressing that. Note type-dependence.
[i:nat<2]if i=0 then :int else :string
Combining some of the above: A function from 2-element arrays of integers to integers.
(x:int,y:int):int=>(x-1)*(y-1)
Arrays can have recursive value dependence. See Hickey's "Very Dependent Types" paper for elaboration. For example, {4,2} is an element of this type and {5,6} is not.
{x:nat,y:nat<x}
Equivalant to the above.
f{:nat,:nat<f[0]}
Equivalant to the above.
f[i:nat<2]if i=0 then :nat else :nat<f[0]
The type of all pythagorean triples.
{x:nat,y:nat,z:nat=sqrt(x*x+y*y)}
My thinking here is that, without diverging too much from mainstream applications programming language style, you can gain remarkably more statically typecheckable expressiveness by unifying some of the constructs. Java/C# notions of functions, arrays, structs, and open-world objects are all represented as different sorts of functions differing in domain variance, extensionality, and dependence.
The New C++: Trip Report, October 2002
May 09, 2003 - 16:15 - Re: The New C++: Trip Report, October 2002
int f(int a,char b,bool c[10]);
struct {int a; char b; bool c[10];}
template<int a,char b,bool c[10]> class c;
tuple<int,char,bool[10]>
void g(void) {int a; char b; bool c[10]; ..};
Spot the five completely different sets of grammar and semantics to represent the notion of "an integer, a character, and 10 booleans" in different contexts.
Using Memory Errors to Attack a Virtual Machine
May 19, 2003 - 14:30 - Re: Using Memory Errors to Attack a Virtual Machine
Using Baseball Bats to Attack a Virtual Machine.
Don Box on C# generics vs. C++ generics
May 13, 2003 - 12:53 - Re: Don Box on C# generics vs. C++ generics
Type-safe genericity doesn't have to be terribly hard. If you look at it from the C# or Java point of view, it's just a matter of disallowing any operations which aren't typesafe. The problem with this simplistic approach is that it disallows many common things that programmers want to do, such as passing an array of integers to a function expecting an array of objects -- which at first glance seems reasonable because all integers are objects. But the type of mutable integer cells is not a subtype of the type of mutable object cells, so such an "upcast" isn't safe.
So you have the following options:
- Keep the C#/Java style type system and disallow anything that's not statically sound. Safe, but inconvenient.
- Allow such unsafe casts, and check for failure at runtime. Convenient, but sometimes goes horribly wrong.
- Implement all of the language facilities you need to combine the best of both worlds, static safety and convenience. This is more difficult than it first seems, especially when you try to implement such things with performance similar to C#/Java. You need a type system and syntax that lets you express the (typewise) different concepts of "arrays of integers", "mutable arrays of integers", "arrays of mutable integers", "mutable arrays of integers", and all of their subtyping rules and fine structure. This brings up the issues of structural vs nominal identity, etc. You have to go way off the beaten C#/Java path to solve these problems.
May 19, 2003 - 19:40 - Re: Don Box on C# generics vs. C++ generics
Some might argue that this is just one more argument against mutable values...
Mutable values can be made to work safely, but a safe implementation doesn't look quite like C++, C#, or Java. The pure functional view of mutables is that the "heap" is passed in and out of every function; all code is executed eagerly in a well-defined order (i.e. normal order evaluation); there is a "ptr(t)" type constuctor describing the type of pointers to items of type t; the only heap operations are "new pointer from value", "read from pointer", "write to pointer" and in the presence of subtyping, for any pair of unequal types t and u, the types ptr(t) and ptr(u) are disjoint, even when t and u are in a subtyping relationship.
In this framework, the mysterious notions of lvalues, rvalues, mutable variables, null pointers, value identity vs referential identity, and mutable array coercions all go away.
Obviously one wouldn't implement a real runtime by passing a heap in and out of every function, but if you take this conceptual view of mutability, then it's easy to understand exactly how C# and Java differ from this model and how doing so breaks static type safety.
It's an interesting project to start with a safe system like the above and, purely with syntactic sugar and typesafe program transformations, how close you can get to the "look and feel" and performance of Java and C# mutability. It turns out you can get pretty close, and that the things you can't quite mimmic turn out to be unsafe or not well-defined.
Sun, Zend push scripting for Java
Jun 13, 2003 - 22:28 - Re: Sun, Zend push scripting for Java
Are we witnessing convergence in the field of programming languages?
You can compile any language down to 8086 assembler code. When Intel released the 8086, did we witness convergence in the field of programming languages? :-)
What we're seeing is just Turing-completeness and a reasonable enough design so that translated code didn't suffer horrible performance penalties. This is all that's really necessary for all existing languages to target your platform.
This is actually a good thing for language diversity. If you're a language creator, you'll likely find it far easier to target these runtime environments than to generate assembly code, and you'll likely see far better performance from these runtimes (with their JIT compilation) than if you write your own interpretter.
But let's not read more into the CLR and JVM than is there. They're just targets for which your compiler can generate object code, as are X86 and PowerPC.
Point Free Style
Jun 19, 2003 - 16:08 - Re: Point Free Style
I've always found it very hard to read and write point-free code. I wonder if this is just a matter of practice. Anyone here feel comfortable with the point-free style?
Partial Types in C#
Jul 11, 2003 - 10:47 - Re: Partial Types in C#
Gee, everyone hates the way our UI builder works by auto-generating a pile of messy code whenever you move controls around a form (unlike Visual Basic, and Delphi, and ...). I know, instead of fixing the problem, let's further obfuscate it by adding a bizarre new language feature that breaks well-known principles of encapsulation!
Theorems for free!
Aug 01, 2003 - 14:23 - Re: Theorems for free!
Keep in mind that this "free theorem" business isn't universal, but depends on the particular type theory used. In particular, that the type theory doesn't admit types as terms and doesn't provide any operations on types besides sum, product, and exponential. Haskell is one of the few practical language where it holds.
For example, if your type theory exposed a function inhabited :: type -> bool which returned true if the type were inhabited and false if empty, then not every universal quantification would have such a free theorem. For example, forall a.[a]->[bool] would exist but wouldn't be invertible.
In my view, the reorganization of type theory into category theory has been very harmful to the subject. It has reinforced the separation of types and terms, leaving few researchers to ponder systems combining the two or to even consider the possibility that such systems might be valid and useful. Why? Category theory's focus on isomorphism (instead of equality) doesn't scale. John Baez's writings on things such as "the category of all categories" and "the category of all categories of all ..." illustrate the near-hopeless complexity of isomorphisms-of-categories perfectly. As a result, simple notions like "the type of the type of all inhabited subtypes of the type of natural numbers" seem incomprehensibly difficult in category theory and hence in type theory.
Category theory brought useful advances to topology. No surprise here, because topology is precisely the study of isomorphisms between geometric objects. Type theory, as it applies to any sort of actual programming system, just needs to answer the question, "which values are allowed in this context?" and there is no benefit to dragging isomorphism into this picture.
The state of type theory today seems to me to like a set theory where most refuse to acknowledge the possibility that sets might themselves contain sets, or the existence of functions from sets to sets, from values to sets, or from sets to values. Sure, the current approach brings some simplifications to reasoning about types, but only at the expense of a wealth of new tools for expressing, refining, and reasoning about types.
This conclude's today's rant. :-)
Udell: Symbol grounding, XML, and RDF
Aug 11, 2003 - 17:07 - Re: Udell: Symbol grounding, XML, and RDF
Does anyone else see XML as an overcomplicated solution the meager problem of serializing data in and out of text files?
Closures and mutability
Aug 18, 2003 - 18:43 - Re: Closures and mutability
I absolutely agree with Erik, though I wouldn't have seen it this way without having implemented closures every possible wrong way and finally come around to this way of thinking.
It's quite a change from the C mindset and its simple "data structures <-> blocks of memory" view of the world, but there is great value to forcing all variables in a language to be constant, and expose mutability purely through typed references, solely through the new_ref, read_ref, and write_ref constructs.
Aug 19, 2003 - 13:01 - Re: Closures and mutability
The advantage is conceptual. As you mention, you can always losslessly translate from one representation to the other using boxing. The value I've found in encapsulating all mutability entirely in ref<t> / new_ref / read_ref / write_ref is that it's much easier to combine imperative programming and functional programming into a single framework, and to analyze it using modern type theories.
In the "everything is implicitly mutable unless you declare it constant" view, you have the following subtype relationships:
- int is not a subtype of object. - int is not a subtype of const int. - const int is not a subtype of int. - const int is a subtype of const object.
- is {int,const string} a subtype of const {int,object}? - the answer is not obvious.
In the explicit-references view, you have:
- int is a subtype of object. - int is not a subtype of ref<int>. - int is not a subtype of ref<object>. - ref<int> is not a subtype of ref<object>.
- is ref<{ref<int>,string}> a subtype of {ref<int>,ref<object>}? - obviously not.
The type relationships are clearer in the explicit-references view: all basic data types have the obvious subtyping relationships to each other; but all reference types are disjoint from each other and all other types.
In the everything-is-mutable-by-default view, the relationships are obscured. It becomes quite hard, when looking at two complex data structures containing a mix of constant and mutable fields, to see what sort of structural subtype relationship exists between them.
Aug 19, 2003 - 19:26 - Re: Closures and mutability
Good points. The ideal const/mutable distinction really depends on the typical programming language you're working in.
C, C++, Java, Python, and C# exist in a nice local optima where you can write and modify imperative code very productively without putting a lot of thought into mutable vs constant values. For this style of language, I wouldn't advocate Erik Meijer's approach, because it adds verbosity in order to make distinctions that typical code seldom requires.
Once you generalize your language's feature set a bit, constant vs mutable issues become much more central. The key areas are:
- If a language supports structural subtyping, then the lack of a subtype relationship between int/object and int / const int become very painful. This isn't noticed in the above languages because their type systems (i.e. C structs, C++/Java classes) are purely nominal.
- If a language supports dependent types, then ad-hoc mutability interacts very poorly. Consider something like {t:type:=int; x:t:=7; t:=string;}. In general, this is why features like C++ templates and the upcoming C# generics are purely static. In a fully general dependent-typed language like Cayenne with a clear separation between functional and imperative types, you don't need templates/generics as a feature, because you can simply write a function that takes some type parameters, that returns a function or data type as a result.
- If you want a language with a clean functional subset, in order to take advantage of compile-time term evaluation (important once you have dependent types), implicit parallelism, or lazy evaluation, then it's vital for your language to be able to distinguish (both for types and values) imperative code from functional code, and to default to functional for common constructs.
- In the presence of type deduction, type systems that think in terms of variables being const or mutable behave strangely, because for each term you need to decide whether you mean the actual term, or the implicitly reference-read-coerced version of it, and these end up in a tangled set of interdependencies. When every reference-read is explicit, and every "mutable term" is instead a constant term of reference type, there is no ambiguity.
Aug 23, 2003 - 16:47 - Re: Closures and mutability
Vesa:
In the new language I've been building, there are two main areas where annotations are needed, compared to traditional C-family languages: that a variable is mutable (really that it's a reference to a mutable heap-allocated cell of a particular type), and that a function may have side effects (including heap writing and IO operations).
This is all set up such that, given a piece of simple source code mixing imperative and functional features, it would be relatively straightforward to translate it to either C (where everything is imperative) or Haskell + IO + ST (where everything is functional except the behind-the-scenes work performed by IO).
This works well in my context, but the typical programming style ends up being quite different than in C-family languages. I definitely wouldn't advocate this just as an extension to C. But it can be attractive as you try to merge functional, imperative, and higher-level type system concepts.
Paul:
Given something like C#, Java, or UnrealScript as a starting point, this protocol/auditors approach is a natural and attractive direction.
I actually spent a year of occasional late nights and weekends implementing something along these lines. It seemed really elegant at the time. There were all kinds of attributes you could apply to variables and functions to control their scope and interactions. For one example, there were "visitor references" to objects (as opposed to ordinary references) where you could access variables and functions through the reference, but couldn't capture it or assign it to a non-local variable. There were all kinds of access overrides allow things like inner classes whose variables could be accessed by outer subclasses but not the external environment in general.
As its complexity grew, though, it started to feel very arbitrary. I was also learning about type theory: dependent types, monads, the Curry-Howard isomorphism, and so on, and was starting to feel that these problems could be solved in a more general way. So I started over and have been proceeding on the basis of a dependent-typed functional language with an imperative superset, with various combinations of nominal and structural typing supporting open- and closed-world programming. Highfalutin features aside, though, it's pretty similar in syntax to the C/Pascal family, as opposed to the LISP or ML/Haskell families.
But that's been such a big metaproject that I haven't gotten around to implementing any advanced scoping and security constructs. :-)
General opinions:
- I would like to see issues like "is this variable mutable" addressed at the type system level, independent of security.
- I want any language-level security features to be statically verifyable, not relying on runtime checks. The one exception to this is computing resources (stack size, heap allocations, infinite loops) which unfortunately require a bunch of twiddle factors, because type-based verification of resource bounds is currently intractable for real-world programs.
- I would only hope for language-level security in languages based on intermediate code, run with either JIT or a VM, with something like a bytecode verifier or proof carrying code checker standing between the code and the machine.
- I like the idea of code signing as an optional, security-enabling feature. For example, I would like to be able to release several (Java-style) packages which work independently, but are granted special access privelages with respect to each other.
- I am becoming a big fan of region typing (the ability to have multiple heaps, and segregate access to heaps), as a way of enabling patterns like purely functional functions which can use a local heap internally, but are still guaranteed to have no observable side-effects; and big complex imperative functions which can be executed in parallel because they're guaranteed to be operating on different heaps. (Example: two game levels running on a game server.)
- I think type system based security guarantees can solve all problems with distributed untrusted code.
The question of distributed mutable objects is a different matter. Two observations: Implementing this, even naively and arbitrarily (as Unreal does for game objects), can lead to a really expressive and powerful way of programming. But doing this in a theoretically sound and fully general way seems much tricker, bringing up problems such as transactioning and distributed garbage collection. I can't envision what a general solution to this problem would look like, but it's certainly a great topic for active R&D.
Whither Self
Sep 14, 2003 - 12:01 - Re: Wither Self
"Self" or "this" only work nicely in languages that don't support nesting in a fully general way. For example, in C++, "this" is unambiguous because there is at most one class lexically enclosing the current block of code. But in Java, you have to be very careful to modify occurances of "this" as you move code between inner classes and outer classes.
When defining functions that may include recursive lexically enclosed lambda expressions, the name you associate with a function ("Fac" in the author's example) provides an unambiguous way to refer to that function, while "self" would beg the question: which self? The nearest one lexically, or the outermost one? Obviously, a language could pick one convention or another, but then the meaning of "self" would change if you, i.e., move an expression in or out of a given lambda.
The same problem exists for classes and records when you support nesting them (for example, with inner class or virtual class semantics): It's no longer obvious which "this" you're referring to. In languages that support a "thistype" construct, the same problem exists there.
You could solve this by making "self" constructs indexed: instead of "self", you only have "self[0]", "self[1]", etc. But then you have reinvented de Bruijn indices and, though they work well for procedural manipulation of terms, they aren't very human-friendly.
If one's language supports nesting in a fully general way, I think it is worth considering eliminating any "this" or "self" as standalone constructs, and instead require that any such things be referred to by name.
In the case of functions, this is unambiguous (mod shadowing of variable names): "f" refers to the enclosing function f.
In the case of classes, it's trickier because, depending on your language, there may be up to three things that a name refers to: the current instance (like C++ "this"), the current instance's type (like some languages' "thistype"), and the literal type (like referring to class name directly in C++). Possible syntaxes for these are "c.this", "c.thistype", and "c".
CLR Exception Model
Oct 04, 2003 - 14:33 - Re: CLR Exception Model
Ehud,
Would you consider it uncontroversial for a language in the general spirit of C++/C#/Java to expose exceptions as follows?
- There are two kinds of functions, clearly distinguishable by syntax: imperative functions (which may have side effects and other referentially non-transparent behaviour) and pure functions (referentially transparent).
- Pure functions default to being able to throw no exceptions. To widen the set of exceptions throwable by a pure function, you need to add an extra keyword, like "throws(int,string)", or (in the most general case) "throws(any)".
- Imperative functions default to being able to throw exceptions of all types. To narrow the set of exceptions throwable by an imperative function, you need to add an extra keyword, such as "throws(int,string)" or (in the most narrow case) "throws(none)".
- A type of functions f may only be a subtype of a type of functions g if f throws the a subset of the exceptions of g.
My thinking is that this approach satisfies the largest possible audience, giving you the capability of expressing any possible exception widening / narrowing behaviour, but defaulting to what is typically wanted: pure functions where exception support is narrow and may only be widened explicitly, and imperative functions where exception support is universal and only be narrowed explicitly.
Thoughts?
Oct 04, 2003 - 22:07 - Re: CLR Exception Model
Daniel,
Totally agreed. Type unions especially are an extremely valuable language feature, and aren't too hard to implement. It's amazing how frequently C++/Java/C# programmers create a subclass hierarchy when what they really want to define is a disjoint union, or a type union.
Regarding the distinction between imperative functions and pure functions: programming with local exceptions and local heaps can be done inside a pure function, and the typechecker can assure that no imperative effects can escape out of that pure function. The Haskell State monad provides a good example of how this can be implemented (though there are other ways, and I use an approach that looks more like Pascal/Java/C# syntactically).
The exception handling details are straightforward (i.e. a pure function must be statically provable to catch all possible exceptions its body is capable of throwing before it exits).
The heap details are more complex, and sometimes end up limiting what the compiler can recognize as being locally imperative within a pure function. The most general solution is region typing, but its syntactic overhead is big, so right now I feel OK about accepting the pure/imperative dichotomy and the limitations that go with it.
In general, the set of exceptions throwable by a function, and its heap accessibility (pure or imperative) are things one wants to parameterize over.
For example, you want to be able to write a 'map' function that is pure when its input function is pure, but imperative when its input function is imperative.
This has a nice solution if you embrace the most general subtyping relationship possible (f:a->b <: g:c->d iff b<:d and c<:a and f's set of effects (heap accessibility, set of throwable exceptions) is a subset of g's set of effects. In addition to being able to declare exceptions as "throws(int|string)", you can say "throws(int|f.exceptions)" to say that you can throw integers or any exceptions throwable by f.
Do you think that capability would solve the problems you've pointed out with the pure/impure dichotomy? I haven't run into those myself (yet), but in my work so far I've had somewhat simplistic boundaries between functional and imperative code.
Historic Documents
Oct 12, 2003 - 15:13 - Re: Historic Documents
There are a lot of great paperes in the "collected manuscripts of Edsger W. Dijkstra" section. It's amazing how much of the stuff he said relative to computing in the days of punchcard readers is still true, yet insufficiently recognized, now.
Longhorn Markup Language (code-named "XAML") Overview
Oct 28, 2003 - 14:13 - Re: Longhorn Markup Language (code-named
C# is a wonderful programming language with support for meta-data attributes and reflection, great features for associating properties with UI controls. Yet instead of exploiting this, they define a new dialect of XML, a very programmer-unfriendly language that maps poorly onto the C#/.NET type system.
Worse, to do any actual event handling, they actually embed C# into XML, resulting in code like their example:
<Canvas ID="root"
xmlns="http://schemas.microsoft.com/2003/xaml"
xmlns:def="Definition">
<Button ID="button1" Click="Clicked">Click Me!</Button>
<def:Code>
<![CDATA void Clicked(object sender, ClickEventArgs args)
{
button1.Content = "Hello World";
}
]]>
</def:Code>
</Canvas>
How horrid.
Finalization (CLR)
Feb 21, 2004 - 13:02 - Re: Finalization (CLR)
Supporting finalization in a garbage-collected environment leads to a significant number of issues with performance, non-determinism, and collector inefficiency.
Rather than going down that path, I would much prefer to see significant R&D effort put into the development of imperative-style data structures and algorithms in a way that doesn't require finalization. I have experimented with some of these issues myself.
Here are some thoughts.
File handles: If you are going to just open a file, do a bunch of operations, and close it, you could use a monad encapsulation similar to the way Haskell's State works, enabling opening the file, performing operations, and closing it, in a way that's guaranteed to result in the file being closed when exiting the lexical scope in which it was created.
But much more interesting is the possibility of using memory mapped files, and treating file-opening calls as returning a (possibly mutable) array of bytes which you can operate on. Then the memory mapped file would be closed when the array is eventually garbage collected. The big problem here is that in current file-mapping implementations, holding a file handle isn't referentially transparent: another process can observe whether the file is locked. This could be remedied by adding support in the OS for "tear-off" file mappings which memory map the file and then make it appear to the outside world that the file is closed (for example, so another process could modify it or delete it without a sharing violation). In this case, the OS paging mechanism would need to implement a copy-on-write-or-delete scheme for torn-off file handles, so the referential transparency requirements of both the file-using process and the outside world are satisfied.
Network sockets: Sockets for protocols like TCP are not amenable to finalization-free garbage collectors, because the process of closing a socket has non-referentially-transparent IO effects that ought not occur nondeterministically. But a language could expose higher level protocols that are more suitable, for example creating the illusion of monadic persistent connections between processes.
Note that all of these considerations aren't terribly relevant in a runtime like .NET which doesn't have a first-class concept of referential transparency. There, any variable in any object might change at any time because other functions to which you have passed the object could at any time modify any variable in another thread or through reflection. But in an environment with a functional subset or just a means of specifying and verifying limits on the scope of effects of computations, it becomes very important to avoid observable side-effects of garbage collection.
Python 'for' as Scheme 'let'
Feb 23, 2004 - 13:46 - Re: Python 'for' as Scheme 'let'
I've been trying to sort out these similarities too. There is also a notion of a Monad with a "cut" operator similar to the Prolog cut operation discarding prior choice points; something along these lines may be valuable in modelling a array monad analogy to the "break" command in C for loops. (The "continue" command is already modelled by the array monad's zero.)
The presence of operations with side effects in things like list comprehensions complicates their interpretation as monads. One approach I find promising is to model all imperative operations in a program literally as IO monads, and to have a syntactic translation of things like for loops or list comprehensions whose contents have side effects into a simple composition of the Array or Maybe monad with the IO monad. Obviously this translation will only work for a predetermined set of monads (because not every monad can be composed with IO so as to yield a true monad as a result), but it would be nice to be able to model all imperative constructs and looping constructs with a mainstream-style syntax (as opposed to Haskell syntax) with the monad translation being automatic.
Note that in a suitable type system, you can construct a "TypeMonad" whose map functor takes the (exact) image of a supplied function over a supplied type, unit generates a singleton type containing just the supplied element, join is type union, and zero is the empty type.
The really neat thing is that comprehensions in the TypeMonad correspond to constructing new types elementwise from existing types, i.e. for(a:at, b:bt)[a,b] is the type of pairs containing elements from the types at and bt. And in the IdentityMonad, comprehensions correspond to lets: for(a:x, b:y)a+b is just x+y. So, many of the leading edge syntactic constructs in languages can be represented as comprehensions over a suitable monad.
Feb 23, 2004 - 14:00 - Re: Python 'for' as Scheme 'let'
Pardon the spam, but here are some more notes:
The TypeMonad described above also can be extended with a new "meet" operating (taking typewise intersection) and "infinity" object ("the type of all types" if it exists and is nonparadoxical in a given type system) as its idempotnent. This structure can then be seen as a lattice or a topos or a model of set theory depending on what you want from it. However, TypeMonad can't be sensibly composed with IO, because elements in a set are inherently ordered, in contrast to elements of a list of array.
Finally, there is a concept of a monad with a fixed-point operation (see http://www.cse.ogi.edu/PacSoft/projects/rmb/mdoTalk.pdf), which leads to the interesting possibility of accessing the previous results of a list comprehension while constructing subsequent elements.
Nullable Types
May 31, 2004 - 18:15 - Re: Nullable Types
The existance of the "null pointer" is a language design mistake dating back three decades. Instead of keeping the notions of "pointers to elements of type t" and "type t extended with an additional 'null' value" separate, C combined the two orthogonal notions into a single type constructor: pointers that might be 'null'. C++ confused things further by defining references in a way that makes it look like they can't be null, though they actually can be!
There is a much cleaner way of implementing pointers:
- For every type t, the type ^t represents "the type of pointers to elements of type t". These include only the actual pointers which can be safely dereferenced.
- For every type t, the type ?t represents "either an element of t or the special 'null' value".
- Then ?^t is the equivalant of "pointers that may be null". Define a shortcut syntax for this: *t = ?^t, and there! You have backwards compatibility and a more sensible type theory.
Unfortunately, C# confuses pointers and objects and mutability issues in so many other ways that such implementing this orthogonality is probably not worthwhile.
The view from the left
Jun 12, 2004 - 13:58 - Re: The view from the left
Frank, is there a known way of expressing non-trivial type isomorphisms in programming languages, where the compiler can either auto-derive the isomorphism itself, or let the user express it in such a way that the compiler can verify its validity? This seems like a necessity for what you're looking to do with Arrow.
Do you have any concrete examples of writing programs "up to isomorphism"? For example, a monad's unit is only determinable from the monad's functor up to unique isomorphism. For example, with the Haskell list monad, three potential units are (x::t)->[] and (x::t)->(x::[]) and (x::t)->(x::x::[]). Can you program with lists in a way that is generic with respect to list unit? Does the trivial case of [] tell us anything new about list programming? Does the fact that "double-consed" lists can't be exactly typed in most languages prevent this approach from working?
Some concrete examples would be very valuable.
Jun 17, 2004 - 18:52 - Re: The view from the left
Frank,
Thanks for the info! And pardon my butchering of Haskell syntax in those examples. What I really meant is that from the array functor (in Haskell and elsewhere), one can derive several different monads, each with a different unit:
unit :: t->[t] unit1 x = [x] -- The ordinary unit unit2 x = [] -- A trivial unit unit3 x = x:(x:[]) -- The "double consed" unit
And these different monads, except the trivial one, are isomorphic. So in order to deal with types "up to isomorphism", it seems like one has to be able to deal with functors "up to natural isomorphism", and so on. Sounds tricky, but valuable.
Error handling strategies
Aug 20, 2004 - 03:18 - Failure as a first-class concept
It is interesting for languages to treat failure as a first-class language concept, requiring explicit handling of any failure conditions that may occur. Java's (controversial) exception declarations provide one approach to this. Icon provides another which is very expressive but quite different.
I'm not a fan of Icon's multi-valued semantics (which wrap a nested list comprehension around the entire program!) But I like the basic style it encourages:
if(f:OpenFile("Log.txt"))
Write(f,"Some Text")
else
Print("Failed to open file")
Whereas, on the other hand, writing:
f:=OpenFile("Log.txt")
Write(f,"Some Text")
Is disallowed by the compiler because the "OpenFile" expression may fail, but isn't explicitly handled.
This technique can be combined with dependency and sequencing to allow some very expressive programming interesting idioms, such as:
if(c:ParseChar(),c>='A',c
Which not only tests some conditions, but also creates variable bindings. For example, you can use this for type-aware casting to work in an environment with compile-time array bounds checking:
PrintElement(a:[]int,b:int):= if(i:nat <a len="b)" else
In this example, we're casting the integer "b" to "the type of natural numbers less than the length of the array a", and binding a variable to the result which can then be safely used to index into the array.
A final example uses "?t" syntax (similar to Haskell "Maybe t") for the type of an optional element of type t, and "^t" for "Pointers to non-null elements of type t":
List:=type{x:int,next:^List}
Display(a:List):void
Print("Value is ",a.x)
if(n:a.next)
Display(n^)
This provides a clean type-theoretic solution to earlier discussion on "nullable pointers", by orthogonalizing the concepts of pointers and optional values, and assuring that in any context where an optional value is used, it must be explicitly tested with a conditional to guarantee that no runtime failure can occur.
I have implemented this on a fairly large scale in an experimental language and feel the results are promising. I'd love to hear feedback on this approach, and also whether readers find this C-Pascal-Python hybrid syntax attractive or revulsive.
Aug 20, 2004 - 05:06 - Wheat
Wheat's approach sounds very useful and implementable. I evaluate expressions that may contain failure by (conceptually) evaluating them as if they were comprehensions in something like Haskell's "Maybe" monad. It would be straightforward to combine this with error codes by evaluating them in a order-dependent monad that looks something like "Perhaps t = Really t | Error string".
I have a similar operator syntax "a||b" which evaluates to a if a doesn't fail, and evaluates to b otherwise. This ends up working almost exactly like C's || operator (including evaluation order for imperative terms), but it propagates the original terms (rather than just a boolean true/false value). Because of the way the failure propagates, a||b is literally translated to "if(x:=a)x else b".
Using these failure semantics, it so happens that no explicit boolean operators are needed: the array-former (a,b) acts as a boolean conjunction, this (a||b) operates as a disjunction, and "!a" negation translates to "if(x:=a) fail else {}".
The operators support variable bindings, so an expression of the form "if(x:=a||b,y:=c).." executes the ".." part if ((a doesn't fail or b doesn't fail) and c doesn't fail), sequences any imperative effects there, and lets you access "x" and "y" with knowledge of their type. For example, if a and b evaluate to integers (or fail), you can use x as if it's an integer.
Implementing Declarative Parallel Bottom-Avoiding Choice
Aug 26, 2004 - 00:06 - Thoughts
In this area, I think the research community is missing the boat in some important ways.
First of all, with "amb", there are really two kinds of bottom that really ought to be distinct: the notion of "no values", and the notion of divergence due to a never-ending computation. Call the former "nothing" and the later "bot" (Haskell folks write _|_ for bot.)
Computing with "nothing" is very simple and well-founded. "nothing amb x" and "x amb nothing" are equivalant to "x". McAllester's Ontic language takes the amb operator a lot further than McCarthy had original envisioned, and enables one to bundle up collections of amb values into set-like types. Ontic's syntax for this is "the-set-of-all x".
This "the-set-of-all x" construct is the microscope under which "nothing" and "bot" are revealed to be distinct concepts. In a type containing a bunch of amb values, some of which may be divergent, it's impossible to perform any useful operations except for positively verifying that an element is a member. Membership can't be refuted, because that requires executing a divergent computation. The only way bottom-avoiding choice can be made usable is with a Prolog-style Cut operator, whose results are nondeterministic and likely to be unpredictable and weird in practice.
Thus I'm a big believer in amb as a fundamental programming construct, and "nothing"-avoiding choice is trivial in this system, but "bot"-avoiding choice seems an impractical and unhelpful tool in language design.
Note on the poster's comment on amb breaking referential transparency: When amb is implemented correctly (with proper sharing of choice points), it's referentially transparent.
For example, (\x.x+x)(3 amb 5) beta-converts to let x=(3 amb 5) in x+x, which reduces to (let x=3 in x+x) amb (let x=5 in x+x), which reduces to 6 amb 10.
This can be summarized as "only single-valued terms evaluate to values; zero-valued or multi-valued terms must be treated as references to a single, shared choice point".
Another way of visualizing terms like (\x.x+x)(3 amb 5) is to translate them to Haskell comprehensions: [x+x | x <- [3,5]].
Another example is "let x=nothing in 7" must evaluate to nothing, not 7. This is obvious from the comprehension [7 | x<-[]]. At first, this seems counterintuitive because we tend to think of any expression you can write as evaluating to a value, whereas with "amb", only single-valued terms evaluate to values; the ordinary substitution rules don't apply where multiple values are present.
Aug 26, 2004 - 16:29 - List Monad vs Amb
The only differences between a list comprehension and a similar computation involving amb are that list comprehensions are explicitly ordered, and duplicate items remain distinct. Amb values are set-like and don't have any inherent order or multiplicity to them; (2 amb 3) is equivalant to (3 anb 2) and (2 amb 2) is equivalant to 2.
If you ignore the underlying order and multiplicity of lists, you can use the list monad to precisely model the behaviour of amb. You can actually define a "set monad" this way. For example, the set monad's join is the union (of a set of sets); its zero is the empty set; its unit forms singleton sets; its plus takes the union of a pair of sets; it's map takes the image of a given function under a given set...each of the monad laws in the set monad happens to be a ZFC axiom.
A Deeper Look At Metafunctions
Sep 03, 2004 - 17:20 - Templates = Lame Dependent Types
Isn't it somewhat silly to embed a limited language with bizarre syntax into existing languages for the purpose of executing code at compile time? Dependent types and compile-time partial evaluation provide a much more general and flexible means for doing this sort of thing.
Combining lazy and eager evaluation of terms
Sep 07, 2004 - 18:20 - Y, etc.
Dominic: That approach to Y works under lenient evaluation, and is actually the literal translation I use to evaluate recursive terms.
The only terms that behave differently between lenient and lazy schemes are terms with subterms which would not terminate if evaluated, such as "let x=x in 7". In a normal-order lazy language, such such subterms only cause divergence if their evaluation is actually demanded by the program's final result. In a lenient language, all subterms are always evaluated to ground values "eventually", even when their values are not demanded. So, this example evaluates to 7 in a lazy language, and diverges in a lenient one.
The "eventually" part is what enables lenient evaluation to succeed for more programs than strict evaluation. For example in "x=Pair(3,First(x))", the First(x) term may begin evaluation before x has been fully reduced to a ground value.
David: I'm not sure if pH (Parallel Haskell) uses a true lenient evaluation scheme like Id90. The reference I have on it ("Implicit Parallel Programming in pH", a book by Nikhil & Arvind) is somewhat unclear on this.
Apple Flunks First Grade Math
Sep 10, 2004 - 23:57 - rational bignums as the default
Couldn't agree more! Floating-point math is a useful but very quirky engineering tool and ought not be in such plain view of ordinary users.
I'd definitely like to see, in future languages, data types for integers, natural numbers, bounded natural numbers, and rational numbers that are in the obvious subtype relationship: 0..0 <: 0..1 <: 0..n <: nat <: int <: rat -- and that obey all of the laws of arithmetic. You know, those laws the ancient Greeks invented a few millenia ago, not the fixed-width twos-complement approximation that CPU's use internally.
My experience is that this can be done without significant performance compromises. For example, in Unreal, about 95% of the occurances of integer math are used for indexing into arrays; the bounded natural number types can always be safely used there and fit into traditional operations. The key realization here is that any number that can safely index into an array that exists in memory can be stuffed losslessly into the machine word size without sacrificing performance in all reasonable cases where real bignums aren't required. This is a decision a compiler ought to make rather than forcing programmers to decide, as C# does, "Which one of the language's 12(!) built-in numeric data types do I want to use in THIS situation?"
Generics for the masses
Sep 14, 2004 - 02:05 - There must be a better way
It feels like there must be a better way to do this. I've been thinking about this for a while but a comprehensive solution isn't clear. But a few things do seem obvious:
First of all, function-overloading provides a more straightforward means of expressing this than typeclasses. This is even more so when one may abstract over the domain of an overloaded function. For example, given a reference "f" to a possibly-overloaded function, you want to be able to extract the type of f's domain using a syntax like "f.dom". This way, for a contrived example, you can write a generic function to take the sum of certain kinds of elements over certain data structures:
// The sum of a single integer is that integer. sum(x:int):=x // The sum of a character is zero. sum(x:char):=0 // To generate the sum of a list of // elements compati
