Tim Sweeney Lambda-The-Ultimate Archive
Overview
A collection of Tim Sweeney's postings to Lambda the Ultimate, a programming weblog which "deals with issues directly related to programming languages, and programming language research.".
Some posts may not be formatted properly or may contain incomplete code fragments. This seems to be a defect of the Lambda the Ultimate posting system, which has partially eaten a handful of posts and on the whole produced html and xhtml that is not what one would call "well formed." Adjustments and corrections have been made where possible.
Article List
- Paul Graham: Accumulator Generator
- Feb 13, 2003 - Re: Paul Graham: Accumulator Generator
- Call-by-name is dual to call-by-value
- Mar 03, 2003 - Re: Call-by-name is dual to call-by-value
- Mar 04, 2003 - Re: Call-by-name is dual to call-by-value
- Book and Course in Constraint Programming and Reasoning
- Mar 31, 2003 - Re: Book and Course in Constraint Programming and Reasoning
- Apr 01, 2003 - Re: Book and Course in Constraint Programming and Reasoning
- J. McCarthy: Towards a Mathematical Science of Computation
- Hundred Year Language
- Apr 11, 2003 - Re: Hundred Year Language
- Apr 12, 2003 - Re: Hundred Year Language
- Selectors Make Analysis of case-lambda Too Hard
- Apr 22, 2003 - Re: Selectors Make Analysis of case-lambda Too Hard
- David McAllester's "Ontic" Language
- Apr 28, 2003 - David McAllester's "Ontic" Language
- Python Metaclass Programming
- Apr 22, 2003 - Re: Python Metaclass Programming
- Apr 23, 2003 - Re: Python Metaclass Programming
- Apr 24, 2003 - Re: Python Metaclass Programming
- Apr 25, 2003 - Re: Python Metaclass Programming
- Apr 28, 2003 - Re: Python Metaclass Programming
- Apr 29, 2003 - Re: Python Metaclass Programming
- May 02, 2003 - Re: Python Metaclass Programming
- The New C++: Trip Report, October 2002
- May 09, 2003 - Re: The New C++: Trip Report, October 2002
- Using Memory Errors to Attack a Virtual Machine
- May 19, 2003 - Re: Using Memory Errors to Attack a Virtual Machine
- Don Box on C# generics vs. C++ generics
- May 13, 2003 - Re: Don Box on C# generics vs. C++ generics
- May 19, 2003 - Re: Don Box on C# generics vs. C++ generics
- Sun, Zend push scripting for Java
- Jun 13, 2003 - Re: Sun, Zend push scripting for Java
- Point Free Style
- Jun 19, 2003 - Re: Point Free Style
- Partial Types in C#
- Jul 11, 2003 - Re: Partial Types in C#
- Theorems for free!
- Aug 01, 2003 - Re: Theorems for free!
- Udell: Symbol grounding, XML, and RDF
- Aug 11, 2003 - Re: Udell: Symbol grounding, XML, and RDF
- Closures and mutability
- Aug 18, 2003 - Re: Closures and mutability
- Aug 19, 2003 - Re: Closures and mutability
- Aug 19, 2003 - Re: Closures and mutability
- Aug 23, 2003 - Re: Closures and mutability
- Whither Self
- Sep 14, 2003 - Re: Wither Self
- CLR Exception Model
- Oct 04, 2003 - Re: CLR Exception Model
- Oct 04, 2003 - Re: CLR Exception Model
- Historic Documents
- Oct 12, 2003 - Re: Historic Documents
- Longhorn Markup Language (code-named "XAML") Overview
- Oct 28, 2003 - Re: Longhorn Markup Language (code-named
- Finalization (CLR)
- Feb 21, 2004 - Re: Finalization (CLR)
- Python 'for' as Scheme 'let'
- Feb 23, 2004 - Re: Python 'for' as Scheme 'let'
- Feb 23, 2004 - Re: Python 'for' as Scheme 'let'
- Nullable Types
- May 31, 2004 - Re: Nullable Types
- The view from the left
- Jun 12, 2004 - Re: The view from the left
- Jun 17, 2004 - Re: The view from the left
- Error handling strategies
- Aug 20, 2004 - Failure as a first-class concept
- Aug 20, 2004 - Wheat
- Implementing Declarative Parallel Bottom-Avoiding Choice
- Aug 26, 2004 - Thoughts
- Aug 26, 2004 - List Monad vs Amb
- A Deeper Look At Metafunctions
- Sep 03, 2004 - Templates = Lame Dependent Types
- Combining lazy and eager evaluation of terms
- Sep 07, 2004 - Y, etc.
- Apple Flunks First Grade Math
- Sep 10, 2004 - rational bignums as the default
- Generics for the masses
- Sep 14, 2004 - There must be a better way
- Sep 16, 2004 - Re: There must be a better way
- Amazon Associates (+ other advertising)
- Sep 16, 2004 - Good idea
- History: Array languages
- Sep 16, 2004 - Why not more array languages?
- multidimensional arrays
- Sep 17, 2004 - Arrays are functions?
- Fresh O'Caml
- Sep 22, 2004 - Generalization?
- Monads in various languages
- Oct 02, 2004 - This problem appears to exist
- Links (Wadler)
- Jan 06, 2005 - A major event
- Jan 06, 2005 - Why it matters
- Generics: The Importance of Wildcards
- May 27, 2005 - Wow
- Scrap your boilerplate with class: extensible generic functions
- May 28, 2005 - URL
- large imperative code --> functional
- Aug 16, 2005 - Display Lists
- Distributive laws for the Coinductive Solution of Recursive Equations
- Sep 13, 2005 - Difficult to read
- Sep 14, 2005 - Aha.
- Garbage Collection as a Lazy Algorithm
- Sep 19, 2005 - Relationship
- literature on commutative lifted boolean operators
- Oct 07, 2005 - Parallel Or
- The Type-System-Feature-Creep Death Spiral
- Oct 31, 2005 - partial evaluation
- Nov 01, 2005 - "Get rid of type declarations"
- Generalized Algebraic Data Types and Object-Oriented Programming
- Nov 24, 2005 - It's always great to see rece
- Anyone remember Russell (the language, that is)?
- Nov 24, 2005 - Thankfully, the full paper is
- Return of the Global Variables?
- Dec 31, 2005 - Same hack, new name
- Duck typing vs Structural Subtyping
- Dec 29, 2005 - The essence of duck typing
- Jan 02, 2006 - Extensional Polymorphism
- Naked Objects
- Jan 31, 2006 - Interesting
- Feb 01, 2006 - Introspection
- Feb 02, 2006 - The Subtyping Property
- The Next Mainstream Programming Languages
- Feb 03, 2006 - Garbage collection
- Feb 03, 2006 - Why virtual classes?
- Feb 09, 2006 - Tim's idea of adding more
- Gilad Is Right
- Feb 18, 2006 - No
- Feb 18, 2006 - Names and slots
- Virtual Machines, Language Runtimes, and the Future of Objective C
- Mar 13, 2006 - A real problem?
- Rich resource site for the programming language "K"
- Mar 13, 2006 - Yikes
- Mar 14, 2006 - There are two things here
- Folding neither Left nor Right (or how to avoid overspecifying the solution to a problem)
- Apr 30, 2006 - Neutral folds
- A note on distributed computing
- May 04, 2006 - Is this a political dissertation?
- The Weird World of Bi-Directional Programming
- May 31, 2006 - Excellent presentation
- Variance and Generalized Constraints for C# Generics
- Jun 19, 2006 - Tension
- Petition for adding garbage collection to C++.
- Jun 27, 2006 - Unreal Engine 3 garbage collection
- Jun 27, 2006 - Tim Sweeney
- Securing the .NET Programming Model
- Jun 27, 2006 - Full abstraction in multi-language runtimes
- Jun 27, 2006 - Holes
- Jun 30, 2006 - Unexpected behavior
- Jul 06, 2006 - You're right
- Socially Responsive, Environmentally Friendly Logic
- Jul 11, 2006 - Weird
- Cost of provably-correct code
- Jul 25, 2006 - Big Theorems
- Generics as a Library
- Jul 28, 2006 - Question
- Jul 29, 2006 - Expression Problem
- Aug 01, 2006 - Parametricity in Haskell
- Overloading : Why do some languages leave it out?
- Aug 03, 2006 - Overloading is hard
- Constant Values as Types
- Sep 03, 2006 - Singleton types
- Sep 03, 2006 - Right
- Small Value Set Types
- Sep 03, 2006 - Finite set types
- expressivity of "idiomatic C++"
- Sep 08, 2006 - Relying on libraries
- Why Johnny can't code
- Sep 18, 2006 - Early programming languages
- Is "post OO" just over?
- Sep 24, 2006 - AOP is only successful in
- Relationship between access modifiers and type
- Oct 03, 2006 - What are access modifiers?
- A Very Modal Model of a Modern, Major, General Type System
- Oct 10, 2006 - Quantifiers
- Quotation and evaluation -- or, how pure is the pure lambda calculus?
- Dec 27, 2006 - Lambda terms as graphs
- AgentSheets: End-User Programing (and Secret Lisp Success Story!)
- Dec 31, 2006 - Anti-Objects
- Patrick Logan on Software Transaction Memory
- Feb 09, 2007 - STM nervousness
- Feb 09, 2007 - Can you help me understand
- Feb 09, 2007 - Bloat
- Behaviour Diffing
- Feb 27, 2007 - Behavior diffing
- Syntax Solicited for Imperative-flavored Concurrent Language with Keywords
- Mar 16, 2007 - Imperatrix Ornothologica
- Twitter and Rails brouhaha
- Apr 16, 2007 - Translation
- The expression problem, Scandinavian style
- May 07, 2007 - Excellent
- "Practical" advantages of lazy evaluation
- May 30, 2007 - Advantages
- Partial evaluation applied to high speed lighting preview
- Jun 05, 2007 - Partial Evaluation in rendering
- Merging Functions, Modules, Classes, the whole nine yards...
- Jun 05, 2007 - Merging stuff
- Simply Easy! (An Implementation of a Dependently Typed Lambda Calculus)
- Jul 09, 2007 - Dependent types
- Jul 09, 2007 - Can you imagine your typical
- Jul 09, 2007 - General recursion
- Jul 12, 2007 - Type erasure requires totality
- Jul 13, 2007 - Proof-Carrying Code
- Comprehensions with "Order by" and "Group by"
- Jul 27, 2007 - Is there a better way?
- Universal Type System
- Nov 07, 2007 - Universal Type System
- Nov 08, 2007 - Incompleteness vs The Halting Problem
- Nov 10, 2007 - Turning complete type systems
- Derivation trees for lambda-calculus
- Nov 27, 2007 - Abstract syntax tree
- foundations for J, APL etc
- Nov 29, 2007 - APL and vector-like data types
- Parametric datatype-genericity
- Dec 04, 2007 - Genericity
- A Growable Language Manifesto
- Dec 07, 2007 - Nice Manifesto
- The YNot Project
- Jan 29, 2008 - Steps to Practicality
- Arc is released
- Jan 30, 2008 - ASCII Only?
- Jan 31, 2008 - The mess that is Unicode...
- Uniform naming
- Feb 18, 2008 - Dereferencing operators
- Pure, Declarative, and Constructive Arithmetic Relations
- Mar 04, 2008 - Retreating from pure Horn clauses
- Variation of C's inline conditional
- Mar 13, 2008 - Conditional
- Species: making analytic functors practical for functional programming
- Automatic Patch-Based Exploit Generation
- Apr 29, 2008 - Ouch
- On the importance of Turing completeness
- Jun 11, 2008 - Why Turing Completeness?
Posts
Paul Graham: Accumulator Generator
Feb 13, 2003 - 16:17 - Re: Paul Graham: Accumulator Generator
From the site:
Note: (a) that's number, not integer, (b) that's incremented by, not plus. <<
This is an interesting study of solutions to an ill-defined problem.
1. The term "number" is left undefined. If the problem we posed in terms of "integers" or "natural numbers", it would be clear. Are complex numbers, numbers? Are p-adics and reals, numbers? If you define "number" to include all of these different mathematical objects, then the problem is ill-typed because, for example, you can't add a p-adic to a real. The problem doesn't define what should happen in this case; should it crash? Return 7*pi?
2. You can't "increment a number". Adding 2 to 3 doesn't change the value of 3! 3 still remains 3. Numbers, as defined by mathematics sense, are immutable. One might correct this by indicating that your function should take a reference to an integer, and add a certain integer to the value referred to. But is such a function actually a function? Not in the mathematical sense. What about the case where the reference is null, if the language supports such a concept?
What the article did succeed in doing, if you read through the code examples, is highlight the non-portability (across languages) of programs depending on ill-defined concepts like "references" and "numbers". The problem as posed can't be solved minimally (with neither error cases nor behaviour beyond what is specified) in any of the languages shown!
-Tim
Call-by-name is dual to call-by-value
Mar 03, 2003 - 17:39 - Re: Call-by-name is dual to call-by-value
Related to non-deterministic extensions of the lambda calculus, have you seen McAllester's Ontic?
http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps
Here the 'amb' operator is used both to express potential values of terms, and to express types, interesting because it enables one to express types and values in the same term language.
I've been implementing a language using this approach and it seems very fruitful. Amb-style nondeterminism needn't be incompatible with call-by-future, as long as the reduction system is aware of the sharing semantics.
-Tim
Mar 04, 2003 - 17:55 - Re: Call-by-name is dual to call-by-value
Paul,
When I get some time I'll try to submit something on UnrealScript. Here's the quick summary.
Plusses: - Using the Java principles as a starting point has worked well (no global variables and functions, a distinction between object types and simple types, etc). - The game-oriented extensions (state scoping, simulated multithreading with latent functions) have significantly improved the process of writing gameplay code. - Shipping the editor & compiler with the game have helped mod authoring significantly. Lots of people actually learned to program with UnrealScript, which is neat. - The script metadata that's exposed to C++ greatly simplified the implementation of the user interface, networking code, import/export code, and many other things.
Minuses: - Performance is a problem sometimes (10-20X slower than C++), but the language structure is not very amenable to writing a simple assembly code generator. - The widespread sharing of object data between UnrealScript and C++ has caused complications in keeping the definitions synchronized and makes porting harder. - "Pause between level changes and garbage collect everything" doesn't scale.
You can't really write a BNF grammar for UnrealScript, since the parser depends on the metadata generated from previously parsed code for disambiguiation - similar to C++, but in retrospect a bad idea. If you'd like to see the compiler source though I'd be glad to email it. Besides shedding light on the language grammar it also serves as an example of how to build an 8000-line single-pass recursive descent lexer and parser for a modern language, and why this is not the best way to do things. :-)
Book and Course in Constraint Programming and Reasoning
Mar 31, 2003 - 18:37 - Re: Book and Course in Constraint Programming and Reasoning
These languages seem very attractive because they make it look like you can specify a problem very concisely, and the language will automatically work out the solution for you. Several comments:
- This doesn't really work in practice. "Generate and test" is hopelessly exponential for problems of interesting size.
- However, constraints can be very useful as a feature in a type system in a more traditional language. Being able to express things like "the type of natural numbers less than 31" and have the compiler perform range arithmetic as part of type checking as very useful. This approach is the first step towards implementing real mathematics in type systems. By "real mathematics", I mean mathematics where natural numbers are a genuine subtype of integers, which are a subtype of rationals, and when you perform common operations on numbers of certain types, the results are closed as you'd expect (i.e. integer*integer=integer), without mathematically unsound casts and coercions.
- I dislike the lack of structure in the example languages described; most of the research is being done in Prolog-style "everything is global". Because of this, there is a huge gap between these R&D languages and any future implementation that might appeal to mainstream programmers. It's very interesting to explore these concepts in structured languages that include things like dependent-typed records.
- It even more interesting to apply CLP style constructs to a language that unifies types, sets, and nondeterminism into a single framework, such as McAllester's "Ontic" language. Here, constraints just work automatically as part of the type system. Also, you get first-class proof:proposition :: value:type, even when the constraints are intractable. For example, you can express the type of pythagorean triples as:
t typedef {x:nat,y:nat,z:nat=sqrt(x*x+y*y)};
And whether or not the language is able to tell you anything useful about the type t, such as whether it's inhabited, and one or more solutions, you can always say:
x:t={3,4,5};
And the language accepts it. Verifying that a deterministic element satisfies a (probably non-deterministic) set of constraints is a much easier problem than trying to solve the constraints in general, so it's reasonable to support logic constraints in a language even when the general case of typechecking/solving them is incomplete or undecidable.
Apr 01, 2003 - 14:51 - Re: Book and Course in Constraint Programming and Reasoning
Dominic,
In the style of code I was writing, there are three cases:
- Some constraints can be verified at compile time. These simplify down to determinate terms.
- Some constraints can only be verified at runtime, so the compiler just marks them nondeterministic (i.e. the type may be inhabited by 0, 1, or many terms), and lets the runtime sort them out.
- Some constraints can't be verified due to limitations of the language or to undecidability in general. These result in a compiler error. For example, you can express Fermat's Last Theorem as "flt typedef {x:nat>0, y:nat>0, z:nat>0, n:nat>2, x^n+y^n=z^n }". The compiler has no idea whether that statement is inhabited or not, but it's happy to accept the definition. And if you supplied a deterministic value x, the compiler or runtime can tell you whether x is a witness to flt.
I have a compiler that accepts these kinds of expressions and can verify constraints. For example with the above definitions, you can say:
x:t={3,4,r:nat}
And the term reduces to {3,4,5}. This is just a simple case of filling in the blanks and forward-propagating values around terms. This isn't an everything-is-global implementation like Prolog; everything can be arbitrarily nested in records, closures, arrays, etc.
The "=" operator performs unification. In the case of deterministic terms (the only thing you can express in languages like C++), it works just like you'd expect, but given non-deterministic terms a and b, "a=b" denotes the set of values that a and b have in common.
You can't express this sort of thing in C++, because there's a syntactic distinction between templates (which are purely compile-time), and functions (which are a purely runtime). So, for example, you can't write a simple function f(x) and template t<i> and refer to t<f(x)> for example. You can't write a single "assert" that functions both at compile time and runtime. You can't write a matrix(n,m) class which supports type-checked fixed-size matrices and unknown (at compile time) sized matrices.
So every C++ class library tends to implement stuff twice, once for the purely runtime case (i.e. for variable-size matrices) and again using templates for the purely compile-time case (i.e. for fixed-size matrices).
These limitations are understandable for a language that evolved from C, a "processor independent assembly language" from the 1970's. But it's very unfortunate that modern efforts like the Java generics extensions, and the C# generic types proposal, still go that route. Dependent types provide a natural and far more general solution than templates, allowing for checking at compile-time when possible, and at runtime otherwise.
-Tim
J. McCarthy: Towards a Mathematical Science of Computation
Apr 04, 2003 - 16:10 - Re: J. McCarthy: Towards a Mathematical Science of Computation
The problem with proving programs correct is that, in all systems produced to date, the proofs are far larger and more complicated than the programs they're proving correct. Even the proposition (the type of the proof) tends to be more complex than the actual program, for example the proposition that qsort returns its argument stably sorted, compared to the implementation of qsort.
The pattern I've seen informally looking at the research are factors like 1:2:40. For each line of program code, you need 2 lines of specification code, and 40 lines of proof code.
The end result is that it's easier to eyeball your program and guess that it's probably correct, than to eyeball the proposition being proved and guess that it's probably proving the right thing. So, as far as shipping quality code is concerned, you get 40X better cost/benefit from writing code than from trying to prove stuff about it.
Proof techniques would need to come a very, very long way before this situation changes.
Hundred Year Language
Apr 11, 2003 - 17:46 - Re: Hundred Year Language
This is a great article, the first such set of predictions I've seen that are grounded in reality and history, rather than silly speculation.
The history of mathematics provides a good roadmap, both for the areas where we can expect major progress, and those where we should only hope for incremental improvement.
When I was a kid, I had an old Calculus book from the early 1900's and learned calculus from it, because that's the only reference I had on the topic. When the subject was later taught in school, it was a surprisingly natural transition -- this despite the 100 years of scientific progress that had passed since the book was written.
Language syntax and notation can only be expected to change incrementally. 100 years ago, mathematicians used f(x) to denote "calling function f with parameter x", and they will likely still be doing so 100 years from now. This is almost certain to carry over to future programming, too. The current notations for function calling, infix operators, and definitions won't likely change much. I expect that if in 100 years I showed a programmer "f(x:int):int -> if x<=0 then 1 else x*f(x-1)", he would understand exactly what this code is doing.
The most important change in mathematics over the past 100 years has been in putting it on a solid logical foundation, such as basing calculus on formal limits rather than ill-defined infinitesimals.
I think the most important change in programming in the next 100 years -- and which will happen much sooner than 100 years from now -- is the reformulation of language semantics around well-defined type systems based on mathematical logic. The last 30 years have led to incredible progress in research here, but the results haven't been picked up by mainstream languages, though R&D languages like Haskell based on sound type systems are gaining ground among certain groups.
C# is a great illustration of the superficiality of popular language progress. It is the most polished and fine-tuned mainstream language ever created, yet fails to come to grips with basic mathematics type and theory. You still have absurdities like 2^64=0, 7/2=3, simple expressions like x/y and a[x] which the compiler accepts yet aren't actually typesafe in any reasonable definition of the term, arrays have serious typing flaws, and the basic confusion between things and references to things remains.
Right now, the open language lineages are:
- LISP: This is the apex of untyped programming with nested scoping and metadata manipulation. It's at the end of the line, not because of lack of potential, but it's so simple and powerful than any improvements to it can be implemented in LISP itself without affecting the core language. In 100 years, after the last C and C++ programmers are long gone, there will still be LISP enthusiasts. But don't expect large-scale software development to happen this way.
- C, C++, Java, C#: The problem nowadays isn't with implementation but with the underlying flaws in the paradigm. We can continue to go the route of adding tweaks as with C#, but they're asymptotically approaching a limit that's not far from the current state of affairs.
- Python, Perl, etc: Languages with significant improvements in usability for certain audiences, and are generally a mix of untyped LISP features with typed C/Java-style features. The primary innovations I see here are in the syntax and libraries, for example Python's pioneering of "pseudocode that runs" and Perl's first-class regular expressions and text parsing capabilities. Future languages will certainly inheret these features, even as the underlying paradigm is changed.
- ML, Haskell, simply-typed functional languages with type deduction and, in general, programs consisting of a whole bunch of top-level declarations. The elegance of these languages is based on Hindley-Milner and with most of the neded extensions of the language (subtyping, dependent types), that breaks, and the resulting declarations and error messages become quite a mess. These languages will certainly remain useful to their current fans, but I see no hope for this style of programming to scale up to mainstream development.
So which lineage will be the basis of languages 100 years from now? I think the syntax has long since been determined, and will fall very much in the middle ground between Haskell, Pascal, and C++ -- and not very much outside that triangle.
Regarding the type system and semantics, I think the basic principles in such mathematical-style code as "f(x:int):int -> if x<=0 then 1 else x*f(x-1)" will not change one iota in 100 years, but that the current notions of objects, references, member functions, exceptions, however-many-bit integers, integer being "the" type of 7, and so on will seem terribly archaic. In this area, I don't think any current lineages will stand.
However, I do think we'll converge on a type system that is not far outside of the past 20 years' research into dependent types, intersection and union types, unification, constraint logic, and proofs-as-programs. These type systems essentially have the expressive power of mathematical logic itself, which is fundamental and not a product of invention or whim.
Apr 12, 2003 - 08:41 - Re: Hundred Year Language
Actually, as Hilbert discovered thanks to Goedel, mathematics hasn't got and will never have a solid logical foundation
Please take everything I say here to be mod undecidability, the halting problem, and finite resource limitations on actual computers.
Why is 2^64=0 absurd? This is just exponentiation modulo 2^64.
While the C definitions of (+,-,*) are consistent with the ring of integers mod 2^64, (/,<,<=,>,>=) aren't. C math is inherently non-mathematical.
This was fine for the "platform-independent assembler" that C was in the 1970's, but in the future I think programming language math will map more closely to actual math.
Why is division not typesafe?
int divide(int num,int denom):
In C, this is a useful function, but it's not division. It crashes in the case of a zero denominator, and returns mathematically incorrect results when there the numerator is not an integer multiple of the denominator.
That Ackermann(64,64) is an observationally partial function is an artefact programmers much accept if we want the power of being able to write general recursive functions.
But a division function that crashes and sometimes returns invalid results, isn't going to be with us much longer.
rational divide(rational num,rational<>0 denom):
This is a typesafe and mathematically correct definition of division. Type systems in current languages don't support this, but there is a great deal of promising type system research to draw from for the future.
C# arrays
For example, you can pass an array of integers to a function expecting an array of objects, and try to write a character into it. This is the classic problem of not properly distinguishing a thing and a mutable reference to a thing. A thing is constant and referentially transparent and obeys very useful subtyping and variance rules, for example allowing you to pass an array of integers to a function expecting an array of objects. A reference to a thing isn't like that at all.
Dependent types
How do you feel these work against separation of concerns?
This answer might partially dependent on what one considers dependent types. C++ templates (yuck) and C# generics could be considered dependent types, with a syntax limiting them to situations that allow decidable typechecking. Haskell, if you get into the implicit universal quantifiers involved in type deduction, has a constrained form of dependent types.
One view on dependent typing is that C++/C# generics are just a limited form of expressing functions that take type parameters and return types or functions as results. There's a direct correspondence between something like:
template<type t> class c {int a; t b;}
And a dependent-typed defintion like:
c(t:type):class={a:int,b:t};
But with dependent types, you have much more power to mix types and values. And, typechecking becomes undecidable in the general case, though the subset corresponding to any existing type deduction / template / generic type frameworks is certainly decidable, so it's at least as expressive as past work.
I don't see undecidable typechecking as a big problem. Current compilers either succeed or fail with a definite error. Undecidability just leads to another category of compiler failure, where the compiler can't accept the program, not because it's certain to be incorrect, but because it's not certain to be correct.
unification of functional, logic, OO
I agree 100% about the upcoming convergence of these things and find the topic very exciting.
I think, actually, you'll get constraint logic for free as a result here. Yes, it will be undecidable in many situations, but a practical tool regardless; C++ template typechecking is undecidable, for example. Typecheckers eventually become theorem provers in this case, though probably not very powerful ones.
Selectors Make Analysis of case-lambda Too Hard
Apr 22, 2003 - 17:55 - Re: Selectors Make Analysis of case-lambda Too Hard
In a type system with intersection types and contravariant subtyping of function argument types, you get case-lambda style functionality automatically. In that setting, you have results like:
add0=lambda().0 add1=lambda(x).x add2=lambda(x,y).x+y
intersection(add0,add1,add2)=a function that takes 0, 1, or 2 arguments and adds them.
However, I've found implementing intersection types and full contravariance very tricky, so this isn't an economical solution if all you want is case-lambda.
David McAllester's "Ontic" Language
Apr 28, 2003 - 19:20 - David McAllester's "Ontic" Language
http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps
Ontic is a higher order formal specification language designed for expressing formal concepts as clearly and concisely as possible. The consiceness and clarity is achieved through the use of nondeterminism. In a nondeterministic language each term has a set of possible values. This improves conciseness in a variety of ways. First, since terms denotes sets, nondeterminism allows terms to be used as types. Second, since each term has a set of possible values it is natural for a term not to have any values.
Since reading this paper last year, I've come to believe that McAllester's work is one of the major breakthroughs of type theory, indicative of type systems in future programming languages.
There's something fundamentally new happening here that hasn't been seen in strongly-typed programming languages: a general means of combining types and values into a single Turing-complete framework that gains the expressive power of functional, logic, and constraint logic programming.
To appreciate Ontic, you need to accept three concepts that don't have analogies in current languages: An "either" operator for expressing terms with multiple potential values, where "potential" has a very specific meaning that takes sharing into account and requires looking at beta-reduction more carefully; a type-forming operator, "the-set-of-all"; and a potential value extraction operator "an-element-of".
The value of this paper may be obscured by its LISP-based presentation and by its "nondeterminism" terminology; perhaps this is why almost nobody has noticed it since its 1993 publication. In actuality, Ontic is fully deterministic, and none of its fundamental features are specific to LISP. But if you're into type systems, please bear with me and give it a read.
Python Metaclass Programming
Apr 22, 2003 - 24:53 - Re: Python Metaclass Programming
Type theorists put the question, "What is the type of a type?", on a solid theoretical footing years ago and provided many useful models for answering such questions. Metaclasses aren't one of these answers. The superstition that metaclasses have something to do with types of types does the world more harm than good.
Apr 23, 2003 - 17:07 - Re: Python Metaclass Programming
Well, I made similar design mistakes in past languages I shipped. In UnrealScript (http://unreal.epicgames.com/unrealscript.htm), as in Java, there is a "Class" class which stores metadata about classes; its class is also "Class", and its superclass is "Object".
I didn't know much about type theory then. Now that I do, I recognize this model as unnatural, and think we can get MUCH better end results by exposing a real type-theoretic notion of types, and that this would only make the language more powerful and practical for everyday programming.
Here's how I look at types now:
- A type describes a set of values. The set may be empty, finite, or infinite. In this interpretation, the "int" type is shorthand for "the type containing ..,-1,0,1,2,..".
- A type doesn't "contain" any more information than the set of values it describes. It's not like a record or struct type that contains member functions, member variables ("the type's name"), etc.
- Two types are equal if they describe the same set of values, unequal otherwise (full extensionality). However, comparing two types need not be a decidable operation.
- The operations allowed on types depend on your language:
* In the most basic example, simply typed lambda calculus, there are no operations on types. The only way that types are used is to limit the values which the compiler accepts in a certain place (for example, in a function application for a given function). This information is statically checked at compile-time, and erased by runtime, therefore no aspects of types are visible at runtime.
* Some languages allow dynamic casts, the ability to see whether an arbitrary value, known precisely only at runtime, belogs to a certain type. This needn't imply dynamic typing; there are constructs like "ifcast(x,value,type) then a{x} else b" which maintain static type safety in the presence of casts. Here types, can't be completely erased, so some aspects of the type systems are exposed to the runtime.
* More advanced languages support dependent types, intersections of types, non-disjoint unions of types, testing types for inhabitance/emptyness dynamically, etc.
- It's valid to talk about subtyping in full generality. For example, there is (at least conceptually) "the type of all integers except -3 and 7", and it's a subtype of the type of integers.
- For general purpose programming, you tend to want an empty type containing no values - you might call this "uninhabited" - note there can be only one by extensionality, and a universal type (conceptually) containing every value that is valid in the language - you might call this "object".
- For any type t, there is such a thing as "the type containing all the subtypes of t, and nothing else" (similar to the powerset from set theory). If you have a universal type "object", then the type of all types is precisely the type of all types that are subtypes of object. You can iterate this process to obtain "the type of all types of types" and the result is valid and non-circular. Each iteration you're narrowing the set of values you're talking about, and each time you're just dealing with a subtype of object and nothing fancier than that.
- For sufficiently advanced type theories, we need to accept undecidable typechecking (i.e. for dependent typing in the presence of recursion, the type system is Turing-complete). This doesn't mean that we have to accept unsafe typing, it means the compiler may have to reject some programs that are actually safe, and users may have to work around these limitations.
- It's valid to talk about functions that take types as parameters, and return types as results. These are considered functors in stratified type systems, but if you see types as just a certain kind of value, then you only have functions, and not any special "forall" quantifiers or type-level lambdas.
- This level of generality isn't unreasonable for future languages. For example, see McAllester's Ontic for a framework that combines most of these concepts.
- Practical programmers want metadata, reflective capabilities, the ability to mix and match open-world and closed-world programming models to fit their needs, a combination of nominal and structural typing capabilities, and a mechanism for reusing implementations (i.e. inheritance or encapsulation). These features can all be supported in a sound type framework, but we mustn't confuse metadata and types. Metadata is something that you want certain record-like data structures to contain, so that you can use reflective programming techniques.
Here's how I'd redesign a modern language like Java in this light:
- Java uses "object" to denote "the type of things that contain fields". Change this name to "record", a more specific term that's generally recognized as meaning "something that contains fields".
- Java uses "int" to denote "the type of integers", and "Int" to denote "the type of integers that have been boxed up into a record". In Java, "int" is not a subtype of "object", but "Int" is. Ok, we blew it here. Keep "int" as-is and eliminate "Int". We won't need "Int".
- Add a new type "object" which, this time around, really means "the type of all objects", not just "the type of all records". Now EVERY type is a subtype of object. "int" is a subtype of object, "char", etc.
- "object" doesn't have any member functions or other fields (remember, it's not a record type now). Given an object, you can't do anything with it, except try to cast it to other types.
- Expose functions as first-class language values, with proper variance rules for their parameters and return types. Add a new base type "function", the type of all functions. It's a subtype of objects. Now you can pass around functions freely.
- Eliminate any thought of this C# "automatic boxing" concept. Now we see this is a just a flawed workaround to a previous language flaw: what it's really doing is encapsulating ints (which are now objects) as records, which was only necessary then because we didn't have a real type theoretic "object" type previously, so if we wanted to pass stuff around polymorphically, we had to stick it in a record.
- Add a new "type" type, to mean "the type of all types". "int" is a value of type "type", as is "record", "string", and so on. Eliminate "class".
- Then, add a metadata framework to allow optionally attaching reflective metadata to records. This could look a whole lot like the current Java class / reflection approach, but we would stop pretending that classes had something to do with types, and recognize them as optional metadata tacked onto records.
Apr 24, 2003 - 11:03 - Re: Python Metaclass Programming
Frank:
Every `value' belongs to exactly one type, which never changes; talking about run-time types is meaningless.
Doesn't this view deny the existance of subtypes? For example, what one type does the value 7 belong to? The type of rational numbers, the type of integers, the type of natural numbers, the universal type?
In a type system supporting full powerset style subtyping, 7 belongs to, and only to, every type that is a supertype of the type containing just 7. Full subtyping can be useful in practical programming. For example, if you have arrays of fixed length, it's useful to have "the type of natural numbers less than n" for every n, so that you can index into arrays with the type system guaranteeing indices are in range (rather than runtime checks that throw exceptions.)
Since we never care about the underlying representation, the values
This implies an absolute separation between types and values, which is the norm in most popular type systems. But this separation limits the expressiveness of the type system, and I want to go beyond this and be able to intermix values and types freely, for example in dependent types, type-forming operations, and value extraction operations. This can be done soundly, but requires breaking down the "don't care about underlying values" barrier.
Concretely: I want to be able to write functions that take any mix of types and values as parameters, and return any mix of types and values as results. I want to cartesion-dependent-product up types and values in arbitrary ways. If you have a natural number i, be it statically known or only dynamically known at runtime, I want to be able to form the type of natural numbers less than i, and use it dynamically, for example in runtime casts, and I want the compiler to verify that everything I do is decidable and typesafe.
we never talk about type equality, except trivial equality, but rather type isomorphism: types are isomorphic iff they behave the same way.
This is certainly the mainstream view. I prefer to see extensional type equality exist and be exposed as a first-class operation, though it's undecidable in the general case. (If a compiler is going to accept undecidability in some aspects of typechecking, why not go all the way?)
Consider finite types, for example. Do you consider "the type containing just 3 and 7" to be isomorphic to "the type containing just 'x' and 'z'"? It's easy to come up with a bijection between them, as it is between many conceptually different types (integers and character strings, for example). I don't think type isomorphism gives you much useful information from a type system perspective.
On the other hand, knowing whether two types are extensionally equal (or in a subtype relationship) gives you useful information about which values are allowed in which contexts.
Andrew:
There are logical pitfalls with notions from naieve set theory, such as a universal type ("the type of all values allowed in the language"), a type-of-all-types (itself a type), and Russell's paradox ("T is the type containing all types that don't contain themselves. Does T contain itself?").
There are various ways of resolving these problems:
- The simplest is to disallow such constructs entirely, which is sound, but reduces the power of the type system. For practical programming, it's really useful to have a universal type; for dependent-typed programming a "type of all types, itself a type" is very useful. So foregoing these constructs has a cost.
- These is a little bit of work on non-wellfounded set theory and related type theories that eliminate the paradoxes by changing the underlying axiom systems. Applying such systems to programming languages would be quite speculative.
- Type systems like the one in McAllester's Ontic paper achieve a middle ground by exposing tiered types that are sound and almost as useful for practical purposes, but have some limitations.
- Finally, if you've accepted the presence of undecidability in your type checker, you can expose true universal types, while masking the paradoxes behind the vale of undecidability. For example, you can form a "type containing all types that don't contain themselves", and verify that simple types belong to it and that the universal type doesn't belong to it, but it's impossible to decide or prove (within the type system's axioms) whether it belongs to itself. Not much formal work has been done in this area, but I think it can be made sound, though using such constructs in practice feels quite subversive.
Apr 25, 2003 - 16:29 - Re: Python Metaclass Programming
Having these discussions reinforces how much one's choice of type theory is a matter of opinion and choice rather than "right or wrong".
Frank:
Ok, let's see if I understand what you're advocating. In your type system, given an integer x, I should be able to compare it to the integer constant 7, similarly if both are rational numbers. But if I have an integer x and a rational number y, your compiler would say something like "You can't compare integers and rational numbers, because they're not the same type"?
So, given an integer x, to convert it to a rational number, you'd need to call some function (perhaps in the same module number where the abstract "natural number" type where declared) to convert it.
I'm coming from the opposite perspective, where I want type relationships (like subtyping) to directly correspond to what people expect in the given domain. For example, in mathematics, the set inclusion hierarchy between natural numbers, integers, rationals, imaginaries, complex numbers, etc. I want the type system to directly reflect this, so you can pass integers to functions that expect rationals directly, without any conversions.
This approach needn't prevent one from using abstract types. It's actually sound to mix structural subtyping and nominal subtyping in very interesting ways (though, as far as I've found, this is quite difficult and not very well studied).
What I mean concretely is that you can hide the implementation of "rational numbers" in a module that exposes certain operations (comparison, arithmetic, etc), exposes the constrant that they are a supertype of the integers, yet still hiding their representation type. So, given such abstract defintions of integers and rationals, you can freely pass integers to functions expecting rationals, but not vice-versa. However, even if rational numbers are implemented as (for example) a pair of integers in some cannonical form (in lowest terms, etc), this fact is hidden outside the module, so that you can't (for example) compare a rational number to a naked pair of integers.
With this approach, I think you get the best of both worlds: You can hide the implementations of things, yet expose the subtyping relationships that users would expect in the problem domain.
The only drawback I've found to this approach so far is that naked (non-abstract) values in this type system can be compared in ways that programmers don't expect in other languages. For example, you can write code like "fac(x:int):int=>if x='0' then 1 else x*fac(x-1)" where you're comparing x to the character '0' instead of the number 0, and this is a valid comparison which always evaluates to false.
To be sure this isn't taken out of context, I'm only proposing this approach from a high-level language point of view. The programmer would see integers as a subtype of rationals. When the compiler and linker get down to actually translating such code into efficient runnable form, things are implemented quite traditionally, with inlining, boxing, and conversions/coercions to (for example) extend integers into equivalant rational numbers, and so on.
Regarding the array subtyping issue you mention:
I've found practical advantages to exposing both contravariant functions and invariant functions. Invariant functions behave like arrays from a typing perspective: they can only be equal if their domains are equal. Contravariance gives you record-style subtyping and the ability to extend a function "+" from integers to rational numbers while maintaining compatibility (so, for example, you can have "the ring of integers" be a subtype of "the field of rationals" -- great, but contravariance has two big disadvantages:
1. In a type system that admits some comparisons to be undecidable and some decidable, any values of record or array type built from contravariant functions much necessarily be incomparable, because the domain of any value not known precisely at compile time may be extended in a way that allows returning incomparable values. So if you build arrays out of contravariant functions, it's impossible to compare them.
2. From a user's perspective, having {1,2,"x",9.5} <: {1,2} is not something one expects of arrays.
Invariant functions allow you to write arrays which are guaranteed to be comparable given comparable range and comparable finite domain, and make {1,2,"x",9.6} distinct from {1,2}.
Taking this further, you can then have invariant functions be subtypes of identically declared contravariant functions, and think of contravariant functions (conceptually) as an infinite union of the identical invariant function with every possible extension of its domain.
Apr 28, 2003 - 18:37 - Re: Python Metaclass Programming
Kimberley:
One researchy attempt along these lines is McAllester's Ontic (http://ttic.uchicago.edu/~dmcallester/ontic-spec.ps).
Unfortunately I've never seen any compilers that apply these concepts to real-world programming languages, but I'm trying to implement one. My experience so far is that it's achievable, but is difficult to nail down all of the proper typechecking rules and to model real-world programming constructs (like inheritance and open-world programming) from such low-level primitives.
Re counting sheep:
Real-world programming is more about sheep than isomorphisms of sheep. Are two sheep the same sheep? Given two sets of sheep, are the sets the same? Given a set of sheep and a sheep, does the sheep belong to the set of sheep? What elements belong to the set of all subsets of the set of all sheep? These are useful concepts for typechecking.
Hiding equality and forcing programmers to deal with explicit isomorphisms is a strictly less useful technique than directly exposing subtyping and type equality.
First, because compilers are capable of pretty impressive verification of subtyping properties, even for really powerful languages (where, though the general problem is undecidable, there is a very expressive decidable subset of the language). Verifying anything related to isomorphisms is far harder. Have you tried doing this for serious programming languages? In a practical implementation, I think you end up with progammers writing a ton of explicit coercions which are claimed to be isomorphisms but in practice are just a bunch of arbitrary functions the programmer hopes are correct isomorphisms.
Second, because most of practical programming practice today requires subtyping. Any serious app written in C++, Java, or C# would become way more verbose if subsumption were explicit, with explicit coercions all over the place.
I mention that my philosophy on this topic is very much a result of studying McAllester's papers in the past year. In (mostly minor) language work I did in the 15 years previous to that, my thinking was very similar to yours, that in the ideal type system, types are an utterly distinct from values, that concepts like type equality don't necessarily follow from value equality, etc.
Since then, I've looked at the basic constructs in McAllester's paper as the basic building blocks of future type systems: a type-forming "the-set-of-all" construct, an element extraction construct "an-element-of", types as values, types-of-types available in a structured way ("a-subset-of"), etc. Sadly the terminology and programming style there tends to obscure the fundamental breakthroughs there.
Re array subtyping:
You had pointed the unnaturalness of "{0..3} < {0..4}" leading to the conclusion that "chararray(4) < chararray(3)". You used it as an example of the arbitrarity of subtyping. An alternative view is that subtyping is fine, but there are really two different kinds of function-like things, one where a->b <: c->d iff b<:d and c<:a, and the other (call it ~>) where a~>b <: c~>d iff b<:d and c=a.
Apr 29, 2003 - 13:35 - Re: Python Metaclass Programming
Frank,
This has been a very useful discussion for my understanding of type system decisions. I'm not making these arguments because I'm confident you're wrong, but because I'm worried that I'm wrong.
I've definitely encountered the broad outlines of the problems you mention, of naked equality (where 3='x' is a valid comparison, a useful degree of typechecking is lost), and of dependent types often being an unnecessarily complex solution to certain problems.
Just a few quick notes:
So you are saying that you want an array of b's indexed by a to be a "function-like thing" of type a~>b?
Yes.
Then you would get no subtyping relationship between chararray(3) and chararray(4) at all.
Right; this one point is certainly sound and not speculative. It turns out this is what you want for arrays, because given an array a~>b where a is a bounded natural number type, one of the fundamental operations you want need is length-extraction. If the domain is invariant, then you can decidably extract the length of any array value.
With this invariant domain approach, you can precisely specify the type of a 7-character array as "(x:nat<7)->char", or an unknown-length array of characters precisely as (exists y:nat).(x:nat<y)->char.
If the domain where contravariant, domain extraction would be more subtle (because the domain could be expanded to be anything), and array comparison would be undecidable.
Continuing this exercise, we have -> for modelling contravariant-domain functions, and ~> for modelling invariant-domain arrays. What is the covariant-domain version of this? From the subtyping rule, it's the dependent-pair type constructor!
How is that an argument for subtyping? Because it eliminates consideration of the "unnaturalness" I mentioned?
This doesn't do much to sell subtyping, it just counters the (true) argument that "modelling arrays as contravariant-domain functions leads to unnatural results" by providing a natural, subtyping-friendly way of modelling arrays.
Of course, as you point out, the big argument against subtyping isn't this, but that subtyping can be regarded as an arbitrary imposition, especially when viewed from a category theory point of view (as opposed to a types-as-sets-of-values point of view).
One final note with regard to subtyping: in a framework like Ontic, it's not really possible to avoid subtyping, because types don't exist as a separate concept, but only as "quoted" (the-set-of-all) nondeterministic collections of potential values. From a categorical/iso point of view, I suppose this is heresy, but it seems the most powerful and natural way of modelling types if one takes a types-as-sets-of-values point of view.
I will definitely try to rekindle this argument in the future when I can present something much more concrete.
May 02, 2003 - 17:50 - Re: Python Metaclass Programming
Still, your representation of arrays seems rather complex. If it were me, working in your system, I would start with the obvious fact that an array of b's of length n should be representable as an n-fold product of b', where b' is isomorphic to b.
In my experimental compiler and type system, I have arrays and n-fold products not only being isomorphic but identical: they are extensional, invariant-domain functions from natural numbers less than some bound to the product's type.
In the case of uniformly typed products, these are simple functions, and in the case of arbitrary products (say int*string*char), they are dependent-typed functions.
The theory of the all this is debatable, but it does work nicely from a practical programming point of view. Here's a quick view of this in my actual syntax (a blend of C constructs with Pascal-style declarations, in other words very non-Haskell/ML style).
The type of functions from integers to integers.
(:int):int
A function from integers to integers.
(x:int)x+3
Same as above, but with an optional type declaration.
(x:int):int=>x+3
The type of functions from natural numbers in the range 0..3 to integers.
(x:nat<4):int
An actual function like the above.
(x:nat<4):int=>x+7
Recursive function.
f(x:int):int=>if x=0 then 1 else x*f(x-1)
The type of arrays of 4 integers. Arrays are like functions, but the domains subtype invariantly, and they evaluated and typed extensionally.
[:nat<4]:int
Alternate syntax for the same thing.
{:int,:int,:int,:int}
You can give the elements of the array names.
{x:int,y:int,z:int,w:int}
A concrete array of 4 integers.
{3,6,9,12}
Equivalant way of expressing the above. Like lambdas, but for arrays.
[x:nat<4]x*3
A non-uniform array type.
{x:int,y:string}
Equivalant way of expressing that. Note type-dependence.
[i:nat<2]if i=0 then :int else :string
Combining some of the above: A function from 2-element arrays of integers to integers.
(x:int,y:int):int=>(x-1)*(y-1)
Arrays can have recursive value dependence. See Hickey's "Very Dependent Types" paper for elaboration. For example, {4,2} is an element of this type and {5,6} is not.
{x:nat,y:nat<x}
Equivalant to the above.
f{:nat,:nat<f[0]}
Equivalant to the above.
f[i:nat<2]if i=0 then :nat else :nat<f[0]
The type of all pythagorean triples.
{x:nat,y:nat,z:nat=sqrt(x*x+y*y)}
My thinking here is that, without diverging too much from mainstream applications programming language style, you can gain remarkably more statically typecheckable expressiveness by unifying some of the constructs. Java/C# notions of functions, arrays, structs, and open-world objects are all represented as different sorts of functions differing in domain variance, extensionality, and dependence.
The New C++: Trip Report, October 2002
May 09, 2003 - 16:15 - Re: The New C++: Trip Report, October 2002
int f(int a,char b,bool c[10]);
struct {int a; char b; bool c[10];}
template<int a,char b,bool c[10]> class c;
tuple<int,char,bool[10]>
void g(void) {int a; char b; bool c[10]; ..};
Spot the five completely different sets of grammar and semantics to represent the notion of "an integer, a character, and 10 booleans" in different contexts.
Using Memory Errors to Attack a Virtual Machine
May 19, 2003 - 14:30 - Re: Using Memory Errors to Attack a Virtual Machine
Using Baseball Bats to Attack a Virtual Machine.
Don Box on C# generics vs. C++ generics
May 13, 2003 - 12:53 - Re: Don Box on C# generics vs. C++ generics
Type-safe genericity doesn't have to be terribly hard. If you look at it from the C# or Java point of view, it's just a matter of disallowing any operations which aren't typesafe. The problem with this simplistic approach is that it disallows many common things that programmers want to do, such as passing an array of integers to a function expecting an array of objects -- which at first glance seems reasonable because all integers are objects. But the type of mutable integer cells is not a subtype of the type of mutable object cells, so such an "upcast" isn't safe.
So you have the following options:
- Keep the C#/Java style type system and disallow anything that's not statically sound. Safe, but inconvenient.
- Allow such unsafe casts, and check for failure at runtime. Convenient, but sometimes goes horribly wrong.
- Implement all of the language facilities you need to combine the best of both worlds, static safety and convenience. This is more difficult than it first seems, especially when you try to implement such things with performance similar to C#/Java. You need a type system and syntax that lets you express the (typewise) different concepts of "arrays of integers", "mutable arrays of integers", "arrays of mutable integers", "mutable arrays of integers", and all of their subtyping rules and fine structure. This brings up the issues of structural vs nominal identity, etc. You have to go way off the beaten C#/Java path to solve these problems.
May 19, 2003 - 19:40 - Re: Don Box on C# generics vs. C++ generics
Some might argue that this is just one more argument against mutable values...
Mutable values can be made to work safely, but a safe implementation doesn't look quite like C++, C#, or Java. The pure functional view of mutables is that the "heap" is passed in and out of every function; all code is executed eagerly in a well-defined order (i.e. normal order evaluation); there is a "ptr(t)" type constuctor describing the type of pointers to items of type t; the only heap operations are "new pointer from value", "read from pointer", "write to pointer" and in the presence of subtyping, for any pair of unequal types t and u, the types ptr(t) and ptr(u) are disjoint, even when t and u are in a subtyping relationship.
In this framework, the mysterious notions of lvalues, rvalues, mutable variables, null pointers, value identity vs referential identity, and mutable array coercions all go away.
Obviously one wouldn't implement a real runtime by passing a heap in and out of every function, but if you take this conceptual view of mutability, then it's easy to understand exactly how C# and Java differ from this model and how doing so breaks static type safety.
It's an interesting project to start with a safe system like the above and, purely with syntactic sugar and typesafe program transformations, how close you can get to the "look and feel" and performance of Java and C# mutability. It turns out you can get pretty close, and that the things you can't quite mimmic turn out to be unsafe or not well-defined.
Sun, Zend push scripting for Java
Jun 13, 2003 - 22:28 - Re: Sun, Zend push scripting for Java
Are we witnessing convergence in the field of programming languages?
You can compile any language down to 8086 assembler code. When Intel released the 8086, did we witness convergence in the field of programming languages? :-)
What we're seeing is just Turing-completeness and a reasonable enough design so that translated code didn't suffer horrible performance penalties. This is all that's really necessary for all existing languages to target your platform.
This is actually a good thing for language diversity. If you're a language creator, you'll likely find it far easier to target these runtime environments than to generate assembly code, and you'll likely see far better performance from these runtimes (with their JIT compilation) than if you write your own interpretter.
But let's not read more into the CLR and JVM than is there. They're just targets for which your compiler can generate object code, as are X86 and PowerPC.
Point Free Style
Jun 19, 2003 - 16:08 - Re: Point Free Style
I've always found it very hard to read and write point-free code. I wonder if this is just a matter of practice. Anyone here feel comfortable with the point-free style?
Partial Types in C#
Jul 11, 2003 - 10:47 - Re: Partial Types in C#
Gee, everyone hates the way our UI builder works by auto-generating a pile of messy code whenever you move controls around a form (unlike Visual Basic, and Delphi, and ...). I know, instead of fixing the problem, let's further obfuscate it by adding a bizarre new language feature that breaks well-known principles of encapsulation!
Theorems for free!
Aug 01, 2003 - 14:23 - Re: Theorems for free!
Keep in mind that this "free theorem" business isn't universal, but depends on the particular type theory used. In particular, that the type theory doesn't admit types as terms and doesn't provide any operations on types besides sum, product, and exponential. Haskell is one of the few practical language where it holds.
For example, if your type theory exposed a function inhabited :: type -> bool which returned true if the type were inhabited and false if empty, then not every universal quantification would have such a free theorem. For example, forall a.[a]->[bool] would exist but wouldn't be invertible.
In my view, the reorganization of type theory into category theory has been very harmful to the subject. It has reinforced the separation of types and terms, leaving few researchers to ponder systems combining the two or to even consider the possibility that such systems might be valid and useful. Why? Category theory's focus on isomorphism (instead of equality) doesn't scale. John Baez's writings on things such as "the category of all categories" and "the category of all categories of all ..." illustrate the near-hopeless complexity of isomorphisms-of-categories perfectly. As a result, simple notions like "the type of the type of all inhabited subtypes of the type of natural numbers" seem incomprehensibly difficult in category theory and hence in type theory.
Category theory brought useful advances to topology. No surprise here, because topology is precisely the study of isomorphisms between geometric objects. Type theory, as it applies to any sort of actual programming system, just needs to answer the question, "which values are allowed in this context?" and there is no benefit to dragging isomorphism into this picture.
The state of type theory today seems to me to like a set theory where most refuse to acknowledge the possibility that sets might themselves contain sets, or the existence of functions from sets to sets, from values to sets, or from sets to values. Sure, the current approach brings some simplifications to reasoning about types, but only at the expense of a wealth of new tools for expressing, refining, and reasoning about types.
This conclude's today's rant. :-)
Udell: Symbol grounding, XML, and RDF
Aug 11, 2003 - 17:07 - Re: Udell: Symbol grounding, XML, and RDF
Does anyone else see XML as an overcomplicated solution the meager problem of serializing data in and out of text files?
Closures and mutability
Aug 18, 2003 - 18:43 - Re: Closures and mutability
I absolutely agree with Erik, though I wouldn't have seen it this way without having implemented closures every possible wrong way and finally come around to this way of thinking.
It's quite a change from the C mindset and its simple "data structures <-> blocks of memory" view of the world, but there is great value to forcing all variables in a language to be constant, and expose mutability purely through typed references, solely through the new_ref, read_ref, and write_ref constructs.
Aug 19, 2003 - 13:01 - Re: Closures and mutability
The advantage is conceptual. As you mention, you can always losslessly translate from one representation to the other using boxing. The value I've found in encapsulating all mutability entirely in ref<t> / new_ref / read_ref / write_ref is that it's much easier to combine imperative programming and functional programming into a single framework, and to analyze it using modern type theories.
In the "everything is implicitly mutable unless you declare it constant" view, you have the following subtype relationships:
- int is not a subtype of object. - int is not a subtype of const int. - const int is not a subtype of int. - const int is a subtype of const object.
- is {int,const string} a subtype of const {int,object}? - the answer is not obvious.
In the explicit-references view, you have:
- int is a subtype of object. - int is not a subtype of ref<int>. - int is not a subtype of ref<object>. - ref<int> is not a subtype of ref<object>.
- is ref<{ref<int>,string}> a subtype of {ref<int>,ref<object>}? - obviously not.
The type relationships are clearer in the explicit-references view: all basic data types have the obvious subtyping relationships to each other; but all reference types are disjoint from each other and all other types.
In the everything-is-mutable-by-default view, the relationships are obscured. It becomes quite hard, when looking at two complex data structures containing a mix of constant and mutable fields, to see what sort of structural subtype relationship exists between them.
Aug 19, 2003 - 19:26 - Re: Closures and mutability
Good points. The ideal const/mutable distinction really depends on the typical programming language you're working in.
C, C++, Java, Python, and C# exist in a nice local optima where you can write and modify imperative code very productively without putting a lot of thought into mutable vs constant values. For this style of language, I wouldn't advocate Erik Meijer's approach, because it adds verbosity in order to make distinctions that typical code seldom requires.
Once you generalize your language's feature set a bit, constant vs mutable issues become much more central. The key areas are:
- If a language supports structural subtyping, then the lack of a subtype relationship between int/object and int / const int become very painful. This isn't noticed in the above languages because their type systems (i.e. C structs, C++/Java classes) are purely nominal.
- If a language supports dependent types, then ad-hoc mutability interacts very poorly. Consider something like {t:type:=int; x:t:=7; t:=string;}. In general, this is why features like C++ templates and the upcoming C# generics are purely static. In a fully general dependent-typed language like Cayenne with a clear separation between functional and imperative types, you don't need templates/generics as a feature, because you can simply write a function that takes some type parameters, that returns a function or data type as a result.
- If you want a language with a clean functional subset, in order to take advantage of compile-time term evaluation (important once you have dependent types), implicit parallelism, or lazy evaluation, then it's vital for your language to be able to distinguish (both for types and values) imperative code from functional code, and to default to functional for common constructs.
- In the presence of type deduction, type systems that think in terms of variables being const or mutable behave strangely, because for each term you need to decide whether you mean the actual term, or the implicitly reference-read-coerced version of it, and these end up in a tangled set of interdependencies. When every reference-read is explicit, and every "mutable term" is instead a constant term of reference type, there is no ambiguity.
Aug 23, 2003 - 16:47 - Re: Closures and mutability
Vesa:
In the new language I've been building, there are two main areas where annotations are needed, compared to traditional C-family languages: that a variable is mutable (really that it's a reference to a mutable heap-allocated cell of a particular type), and that a function may have side effects (including heap writing and IO operations).
This is all set up such that, given a piece of simple source code mixing imperative and functional features, it would be relatively straightforward to translate it to either C (where everything is imperative) or Haskell + IO + ST (where everything is functional except the behind-the-scenes work performed by IO).
This works well in my context, but the typical programming style ends up being quite different than in C-family languages. I definitely wouldn't advocate this just as an extension to C. But it can be attractive as you try to merge functional, imperative, and higher-level type system concepts.
Paul:
Given something like C#, Java, or UnrealScript as a starting point, this protocol/auditors approach is a natural and attractive direction.
I actually spent a year of occasional late nights and weekends implementing something along these lines. It seemed really elegant at the time. There were all kinds of attributes you could apply to variables and functions to control their scope and interactions. For one example, there were "visitor references" to objects (as opposed to ordinary references) where you could access variables and functions through the reference, but couldn't capture it or assign it to a non-local variable. There were all kinds of access overrides allow things like inner classes whose variables could be accessed by outer subclasses but not the external environment in general.
As its complexity grew, though, it started to feel very arbitrary. I was also learning about type theory: dependent types, monads, the Curry-Howard isomorphism, and so on, and was starting to feel that these problems could be solved in a more general way. So I started over and have been proceeding on the basis of a dependent-typed functional language with an imperative superset, with various combinations of nominal and structural typing supporting open- and closed-world programming. Highfalutin features aside, though, it's pretty similar in syntax to the C/Pascal family, as opposed to the LISP or ML/Haskell families.
But that's been such a big metaproject that I haven't gotten around to implementing any advanced scoping and security constructs. :-)
General opinions:
- I would like to see issues like "is this variable mutable" addressed at the type system level, independent of security.
- I want any language-level security features to be statically verifyable, not relying on runtime checks. The one exception to this is computing resources (stack size, heap allocations, infinite loops) which unfortunately require a bunch of twiddle factors, because type-based verification of resource bounds is currently intractable for real-world programs.
- I would only hope for language-level security in languages based on intermediate code, run with either JIT or a VM, with something like a bytecode verifier or proof carrying code checker standing between the code and the machine.
- I like the idea of code signing as an optional, security-enabling feature. For example, I would like to be able to release several (Java-style) packages which work independently, but are granted special access privelages with respect to each other.
- I am becoming a big fan of region typing (the ability to have multiple heaps, and segregate access to heaps), as a way of enabling patterns like purely functional functions which can use a local heap internally, but are still guaranteed to have no observable side-effects; and big complex imperative functions which can be executed in parallel because they're guaranteed to be operating on different heaps. (Example: two game levels running on a game server.)
- I think type system based security guarantees can solve all problems with distributed untrusted code.
The question of distributed mutable objects is a different matter. Two observations: Implementing this, even naively and arbitrarily (as Unreal does for game objects), can lead to a really expressive and powerful way of programming. But doing this in a theoretically sound and fully general way seems much tricker, bringing up problems such as transactioning and distributed garbage collection. I can't envision what a general solution to this problem would look like, but it's certainly a great topic for active R&D.
Whither Self
Sep 14, 2003 - 12:01 - Re: Wither Self
"Self" or "this" only work nicely in languages that don't support nesting in a fully general way. For example, in C++, "this" is unambiguous because there is at most one class lexically enclosing the current block of code. But in Java, you have to be very careful to modify occurances of "this" as you move code between inner classes and outer classes.
When defining functions that may include recursive lexically enclosed lambda expressions, the name you associate with a function ("Fac" in the author's example) provides an unambiguous way to refer to that function, while "self" would beg the question: which self? The nearest one lexically, or the outermost one? Obviously, a language could pick one convention or another, but then the meaning of "self" would change if you, i.e., move an expression in or out of a given lambda.
The same problem exists for classes and records when you support nesting them (for example, with inner class or virtual class semantics): It's no longer obvious which "this" you're referring to. In languages that support a "thistype" construct, the same problem exists there.
You could solve this by making "self" constructs indexed: instead of "self", you only have "self[0]", "self[1]", etc. But then you have reinvented de Bruijn indices and, though they work well for procedural manipulation of terms, they aren't very human-friendly.
If one's language supports nesting in a fully general way, I think it is worth considering eliminating any "this" or "self" as standalone constructs, and instead require that any such things be referred to by name.
In the case of functions, this is unambiguous (mod shadowing of variable names): "f" refers to the enclosing function f.
In the case of classes, it's trickier because, depending on your language, there may be up to three things that a name refers to: the current instance (like C++ "this"), the current instance's type (like some languages' "thistype"), and the literal type (like referring to class name directly in C++). Possible syntaxes for these are "c.this", "c.thistype", and "c".
CLR Exception Model
Oct 04, 2003 - 14:33 - Re: CLR Exception Model
Ehud,
Would you consider it uncontroversial for a language in the general spirit of C++/C#/Java to expose exceptions as follows?
- There are two kinds of functions, clearly distinguishable by syntax: imperative functions (which may have side effects and other referentially non-transparent behaviour) and pure functions (referentially transparent).
- Pure functions default to being able to throw no exceptions. To widen the set of exceptions throwable by a pure function, you need to add an extra keyword, like "throws(int,string)", or (in the most general case) "throws(any)".
- Imperative functions default to being able to throw exceptions of all types. To narrow the set of exceptions throwable by an imperative function, you need to add an extra keyword, such as "throws(int,string)" or (in the most narrow case) "throws(none)".
- A type of functions f may only be a subtype of a type of functions g if f throws the a subset of the exceptions of g.
My thinking is that this approach satisfies the largest possible audience, giving you the capability of expressing any possible exception widening / narrowing behaviour, but defaulting to what is typically wanted: pure functions where exception support is narrow and may only be widened explicitly, and imperative functions where exception support is universal and only be narrowed explicitly.
Thoughts?
Oct 04, 2003 - 22:07 - Re: CLR Exception Model
Daniel,
Totally agreed. Type unions especially are an extremely valuable language feature, and aren't too hard to implement. It's amazing how frequently C++/Java/C# programmers create a subclass hierarchy when what they really want to define is a disjoint union, or a type union.
Regarding the distinction between imperative functions and pure functions: programming with local exceptions and local heaps can be done inside a pure function, and the typechecker can assure that no imperative effects can escape out of that pure function. The Haskell State monad provides a good example of how this can be implemented (though there are other ways, and I use an approach that looks more like Pascal/Java/C# syntactically).
The exception handling details are straightforward (i.e. a pure function must be statically provable to catch all possible exceptions its body is capable of throwing before it exits).
The heap details are more complex, and sometimes end up limiting what the compiler can recognize as being locally imperative within a pure function. The most general solution is region typing, but its syntactic overhead is big, so right now I feel OK about accepting the pure/imperative dichotomy and the limitations that go with it.
In general, the set of exceptions throwable by a function, and its heap accessibility (pure or imperative) are things one wants to parameterize over.
For example, you want to be able to write a 'map' function that is pure when its input function is pure, but imperative when its input function is imperative.
This has a nice solution if you embrace the most general subtyping relationship possible (f:a->b <: g:c->d iff b<:d and c<:a and f's set of effects (heap accessibility, set of throwable exceptions) is a subset of g's set of effects. In addition to being able to declare exceptions as "throws(int|string)", you can say "throws(int|f.exceptions)" to say that you can throw integers or any exceptions throwable by f.
Do you think that capability would solve the problems you've pointed out with the pure/impure dichotomy? I haven't run into those myself (yet), but in my work so far I've had somewhat simplistic boundaries between functional and imperative code.
Historic Documents
Oct 12, 2003 - 15:13 - Re: Historic Documents
There are a lot of great paperes in the "collected manuscripts of Edsger W. Dijkstra" section. It's amazing how much of the stuff he said relative to computing in the days of punchcard readers is still true, yet insufficiently recognized, now.
Longhorn Markup Language (code-named "XAML") Overview
Oct 28, 2003 - 14:13 - Re: Longhorn Markup Language (code-named
C# is a wonderful programming language with support for meta-data attributes and reflection, great features for associating properties with UI controls. Yet instead of exploiting this, they define a new dialect of XML, a very programmer-unfriendly language that maps poorly onto the C#/.NET type system.
Worse, to do any actual event handling, they actually embed C# into XML, resulting in code like their example:
<Canvas ID="root"
xmlns="http://schemas.microsoft.com/2003/xaml"
xmlns:def="Definition">
<Button ID="button1" Click="Clicked">Click Me!</Button>
<def:Code>
<![CDATA void Clicked(object sender, ClickEventArgs args)
{
button1.Content = "Hello World";
}
]]>
</def:Code>
</Canvas>
How horrid.
Finalization (CLR)
Feb 21, 2004 - 13:02 - Re: Finalization (CLR)
Supporting finalization in a garbage-collected environment leads to a significant number of issues with performance, non-determinism, and collector inefficiency.
Rather than going down that path, I would much prefer to see significant R&D effort put into the development of imperative-style data structures and algorithms in a way that doesn't require finalization. I have experimented with some of these issues myself.
Here are some thoughts.
File handles: If you are going to just open a file, do a bunch of operations, and close it, you could use a monad encapsulation similar to the way Haskell's State works, enabling opening the file, performing operations, and closing it, in a way that's guaranteed to result in the file being closed when exiting the lexical scope in which it was created.
But much more interesting is the possibility of using memory mapped files, and treating file-opening calls as returning a (possibly mutable) array of bytes which you can operate on. Then the memory mapped file would be closed when the array is eventually garbage collected. The big problem here is that in current file-mapping implementations, holding a file handle isn't referentially transparent: another process can observe whether the file is locked. This could be remedied by adding support in the OS for "tear-off" file mappings which memory map the file and then make it appear to the outside world that the file is closed (for example, so another process could modify it or delete it without a sharing violation). In this case, the OS paging mechanism would need to implement a copy-on-write-or-delete scheme for torn-off file handles, so the referential transparency requirements of both the file-using process and the outside world are satisfied.
Network sockets: Sockets for protocols like TCP are not amenable to finalization-free garbage collectors, because the process of closing a socket has non-referentially-transparent IO effects that ought not occur nondeterministically. But a language could expose higher level protocols that are more suitable, for example creating the illusion of monadic persistent connections between processes.
Note that all of these considerations aren't terribly relevant in a runtime like .NET which doesn't have a first-class concept of referential transparency. There, any variable in any object might change at any time because other functions to which you have passed the object could at any time modify any variable in another thread or through reflection. But in an environment with a functional subset or just a means of specifying and verifying limits on the scope of effects of computations, it becomes very important to avoid observable side-effects of garbage collection.
Python 'for' as Scheme 'let'
Feb 23, 2004 - 13:46 - Re: Python 'for' as Scheme 'let'
I've been trying to sort out these similarities too. There is also a notion of a Monad with a "cut" operator similar to the Prolog cut operation discarding prior choice points; something along these lines may be valuable in modelling a array monad analogy to the "break" command in C for loops. (The "continue" command is already modelled by the array monad's zero.)
The presence of operations with side effects in things like list comprehensions complicates their interpretation as monads. One approach I find promising is to model all imperative operations in a program literally as IO monads, and to have a syntactic translation of things like for loops or list comprehensions whose contents have side effects into a simple composition of the Array or Maybe monad with the IO monad. Obviously this translation will only work for a predetermined set of monads (because not every monad can be composed with IO so as to yield a true monad as a result), but it would be nice to be able to model all imperative constructs and looping constructs with a mainstream-style syntax (as opposed to Haskell syntax) with the monad translation being automatic.
Note that in a suitable type system, you can construct a "TypeMonad" whose map functor takes the (exact) image of a supplied function over a supplied type, unit generates a singleton type containing just the supplied element, join is type union, and zero is the empty type.
The really neat thing is that comprehensions in the TypeMonad correspond to constructing new types elementwise from existing types, i.e. for(a:at, b:bt)[a,b] is the type of pairs containing elements from the types at and bt. And in the IdentityMonad, comprehensions correspond to lets: for(a:x, b:y)a+b is just x+y. So, many of the leading edge syntactic constructs in languages can be represented as comprehensions over a suitable monad.
Feb 23, 2004 - 14:00 - Re: Python 'for' as Scheme 'let'
Pardon the spam, but here are some more notes:
The TypeMonad described above also can be extended with a new "meet" operating (taking typewise intersection) and "infinity" object ("the type of all types" if it exists and is nonparadoxical in a given type system) as its idempotnent. This structure can then be seen as a lattice or a topos or a model of set theory depending on what you want from it. However, TypeMonad can't be sensibly composed with IO, because elements in a set are inherently ordered, in contrast to elements of a list of array.
Finally, there is a concept of a monad with a fixed-point operation (see http://www.cse.ogi.edu/PacSoft/projects/rmb/mdoTalk.pdf), which leads to the interesting possibility of accessing the previous results of a list comprehension while constructing subsequent elements.
Nullable Types
May 31, 2004 - 18:15 - Re: Nullable Types
The existance of the "null pointer" is a language design mistake dating back three decades. Instead of keeping the notions of "pointers to elements of type t" and "type t extended with an additional 'null' value" separate, C combined the two orthogonal notions into a single type constructor: pointers that might be 'null'. C++ confused things further by defining references in a way that makes it look like they can't be null, though they actually can be!
There is a much cleaner way of implementing pointers:
- For every type t, the type ^t represents "the type of pointers to elements of type t". These include only the actual pointers which can be safely dereferenced.
- For every type t, the type ?t represents "either an element of t or the special 'null' value".
- Then ?^t is the equivalant of "pointers that may be null". Define a shortcut syntax for this: *t = ?^t, and there! You have backwards compatibility and a more sensible type theory.
Unfortunately, C# confuses pointers and objects and mutability issues in so many other ways that such implementing this orthogonality is probably not worthwhile.
The view from the left
Jun 12, 2004 - 13:58 - Re: The view from the left
Frank, is there a known way of expressing non-trivial type isomorphisms in programming languages, where the compiler can either auto-derive the isomorphism itself, or let the user express it in such a way that the compiler can verify its validity? This seems like a necessity for what you're looking to do with Arrow.
Do you have any concrete examples of writing programs "up to isomorphism"? For example, a monad's unit is only determinable from the monad's functor up to unique isomorphism. For example, with the Haskell list monad, three potential units are (x::t)->[] and (x::t)->(x::[]) and (x::t)->(x::x::[]). Can you program with lists in a way that is generic with respect to list unit? Does the trivial case of [] tell us anything new about list programming? Does the fact that "double-consed" lists can't be exactly typed in most languages prevent this approach from working?
Some concrete examples would be very valuable.
Jun 17, 2004 - 18:52 - Re: The view from the left
Frank,
Thanks for the info! And pardon my butchering of Haskell syntax in those examples. What I really meant is that from the array functor (in Haskell and elsewhere), one can derive several different monads, each with a different unit:
unit :: t->[t] unit1 x = [x] -- The ordinary unit unit2 x = [] -- A trivial unit unit3 x = x:(x:[]) -- The "double consed" unit
And these different monads, except the trivial one, are isomorphic. So in order to deal with types "up to isomorphism", it seems like one has to be able to deal with functors "up to natural isomorphism", and so on. Sounds tricky, but valuable.
Error handling strategies
Aug 20, 2004 - 03:18 - Failure as a first-class concept
It is interesting for languages to treat failure as a first-class language concept, requiring explicit handling of any failure conditions that may occur. Java's (controversial) exception declarations provide one approach to this. Icon provides another which is very expressive but quite different.
I'm not a fan of Icon's multi-valued semantics (which wrap a nested list comprehension around the entire program!) But I like the basic style it encourages:
if(f:OpenFile("Log.txt"))
Write(f,"Some Text")
else
Print("Failed to open file")
Whereas, on the other hand, writing:
f:=OpenFile("Log.txt")
Write(f,"Some Text")
Is disallowed by the compiler because the "OpenFile" expression may fail, but isn't explicitly handled.
This technique can be combined with dependency and sequencing to allow some very expressive programming interesting idioms, such as:
if(c:ParseChar(),c>='A',c
Which not only tests some conditions, but also creates variable bindings. For example, you can use this for type-aware casting to work in an environment with compile-time array bounds checking:
PrintElement(a:[]int,b:int):= if(i:nat <a len="b)" else
In this example, we're casting the integer "b" to "the type of natural numbers less than the length of the array a", and binding a variable to the result which can then be safely used to index into the array.
A final example uses "?t" syntax (similar to Haskell "Maybe t") for the type of an optional element of type t, and "^t" for "Pointers to non-null elements of type t":
List:=type{x:int,next:^List}
Display(a:List):void
Print("Value is ",a.x)
if(n:a.next)
Display(n^)
This provides a clean type-theoretic solution to earlier discussion on "nullable pointers", by orthogonalizing the concepts of pointers and optional values, and assuring that in any context where an optional value is used, it must be explicitly tested with a conditional to guarantee that no runtime failure can occur.
I have implemented this on a fairly large scale in an experimental language and feel the results are promising. I'd love to hear feedback on this approach, and also whether readers find this C-Pascal-Python hybrid syntax attractive or revulsive.
Aug 20, 2004 - 05:06 - Wheat
Wheat's approach sounds very useful and implementable. I evaluate expressions that may contain failure by (conceptually) evaluating them as if they were comprehensions in something like Haskell's "Maybe" monad. It would be straightforward to combine this with error codes by evaluating them in a order-dependent monad that looks something like "Perhaps t = Really t | Error string".
I have a similar operator syntax "a||b" which evaluates to a if a doesn't fail, and evaluates to b otherwise. This ends up working almost exactly like C's || operator (including evaluation order for imperative terms), but it propagates the original terms (rather than just a boolean true/false value). Because of the way the failure propagates, a||b is literally translated to "if(x:=a)x else b".
Using these failure semantics, it so happens that no explicit boolean operators are needed: the array-former (a,b) acts as a boolean conjunction, this (a||b) operates as a disjunction, and "!a" negation translates to "if(x:=a) fail else {}".
The operators support variable bindings, so an expression of the form "if(x:=a||b,y:=c).." executes the ".." part if ((a doesn't fail or b doesn't fail) and c doesn't fail), sequences any imperative effects there, and lets you access "x" and "y" with knowledge of their type. For example, if a and b evaluate to integers (or fail), you can use x as if it's an integer.
Implementing Declarative Parallel Bottom-Avoiding Choice
Aug 26, 2004 - 00:06 - Thoughts
In this area, I think the research community is missing the boat in some important ways.
First of all, with "amb", there are really two kinds of bottom that really ought to be distinct: the notion of "no values", and the notion of divergence due to a never-ending computation. Call the former "nothing" and the later "bot" (Haskell folks write _|_ for bot.)
Computing with "nothing" is very simple and well-founded. "nothing amb x" and "x amb nothing" are equivalant to "x". McAllester's Ontic language takes the amb operator a lot further than McCarthy had original envisioned, and enables one to bundle up collections of amb values into set-like types. Ontic's syntax for this is "the-set-of-all x".
This "the-set-of-all x" construct is the microscope under which "nothing" and "bot" are revealed to be distinct concepts. In a type containing a bunch of amb values, some of which may be divergent, it's impossible to perform any useful operations except for positively verifying that an element is a member. Membership can't be refuted, because that requires executing a divergent computation. The only way bottom-avoiding choice can be made usable is with a Prolog-style Cut operator, whose results are nondeterministic and likely to be unpredictable and weird in practice.
Thus I'm a big believer in amb as a fundamental programming construct, and "nothing"-avoiding choice is trivial in this system, but "bot"-avoiding choice seems an impractical and unhelpful tool in language design.
Note on the poster's comment on amb breaking referential transparency: When amb is implemented correctly (with proper sharing of choice points), it's referentially transparent.
For example, (\x.x+x)(3 amb 5) beta-converts to let x=(3 amb 5) in x+x, which reduces to (let x=3 in x+x) amb (let x=5 in x+x), which reduces to 6 amb 10.
This can be summarized as "only single-valued terms evaluate to values; zero-valued or multi-valued terms must be treated as references to a single, shared choice point".
Another way of visualizing terms like (\x.x+x)(3 amb 5) is to translate them to Haskell comprehensions: [x+x | x <- [3,5]].
Another example is "let x=nothing in 7" must evaluate to nothing, not 7. This is obvious from the comprehension [7 | x<-[]]. At first, this seems counterintuitive because we tend to think of any expression you can write as evaluating to a value, whereas with "amb", only single-valued terms evaluate to values; the ordinary substitution rules don't apply where multiple values are present.
Aug 26, 2004 - 16:29 - List Monad vs Amb
The only differences between a list comprehension and a similar computation involving amb are that list comprehensions are explicitly ordered, and duplicate items remain distinct. Amb values are set-like and don't have any inherent order or multiplicity to them; (2 amb 3) is equivalant to (3 anb 2) and (2 amb 2) is equivalant to 2.
If you ignore the underlying order and multiplicity of lists, you can use the list monad to precisely model the behaviour of amb. You can actually define a "set monad" this way. For example, the set monad's join is the union (of a set of sets); its zero is the empty set; its unit forms singleton sets; its plus takes the union of a pair of sets; it's map takes the image of a given function under a given set...each of the monad laws in the set monad happens to be a ZFC axiom.
A Deeper Look At Metafunctions
Sep 03, 2004 - 17:20 - Templates = Lame Dependent Types
Isn't it somewhat silly to embed a limited language with bizarre syntax into existing languages for the purpose of executing code at compile time? Dependent types and compile-time partial evaluation provide a much more general and flexible means for doing this sort of thing.
Combining lazy and eager evaluation of terms
Sep 07, 2004 - 18:20 - Y, etc.
Dominic: That approach to Y works under lenient evaluation, and is actually the literal translation I use to evaluate recursive terms.
The only terms that behave differently between lenient and lazy schemes are terms with subterms which would not terminate if evaluated, such as "let x=x in 7". In a normal-order lazy language, such such subterms only cause divergence if their evaluation is actually demanded by the program's final result. In a lenient language, all subterms are always evaluated to ground values "eventually", even when their values are not demanded. So, this example evaluates to 7 in a lazy language, and diverges in a lenient one.
The "eventually" part is what enables lenient evaluation to succeed for more programs than strict evaluation. For example in "x=Pair(3,First(x))", the First(x) term may begin evaluation before x has been fully reduced to a ground value.
David: I'm not sure if pH (Parallel Haskell) uses a true lenient evaluation scheme like Id90. The reference I have on it ("Implicit Parallel Programming in pH", a book by Nikhil & Arvind) is somewhat unclear on this.
Apple Flunks First Grade Math
Sep 10, 2004 - 23:57 - rational bignums as the default
Couldn't agree more! Floating-point math is a useful but very quirky engineering tool and ought not be in such plain view of ordinary users.
I'd definitely like to see, in future languages, data types for integers, natural numbers, bounded natural numbers, and rational numbers that are in the obvious subtype relationship: 0..0 <: 0..1 <: 0..n <: nat <: int <: rat -- and that obey all of the laws of arithmetic. You know, those laws the ancient Greeks invented a few millenia ago, not the fixed-width twos-complement approximation that CPU's use internally.
My experience is that this can be done without significant performance compromises. For example, in Unreal, about 95% of the occurances of integer math are used for indexing into arrays; the bounded natural number types can always be safely used there and fit into traditional operations. The key realization here is that any number that can safely index into an array that exists in memory can be stuffed losslessly into the machine word size without sacrificing performance in all reasonable cases where real bignums aren't required. This is a decision a compiler ought to make rather than forcing programmers to decide, as C# does, "Which one of the language's 12(!) built-in numeric data types do I want to use in THIS situation?"
Generics for the masses
Sep 14, 2004 - 02:05 - There must be a better way
It feels like there must be a better way to do this. I've been thinking about this for a while but a comprehensive solution isn't clear. But a few things do seem obvious:
First of all, function-overloading provides a more straightforward means of expressing this than typeclasses. This is even more so when one may abstract over the domain of an overloaded function. For example, given a reference "f" to a possibly-overloaded function, you want to be able to extract the type of f's domain using a syntax like "f.dom". This way, for a contrived example, you can write a generic function to take the sum of certain kinds of elements over certain data structures:
// The sum of a single integer is that integer. sum(x:int):=x // The sum of a character is zero. sum(x:char):=0 // To generate the sum of a list of // elements compatible with // summation, we add up the sum of // each element. sum(x:list(sum.dom)):= if(x=nil) 0 else sum(x.head)+sum(x.tail)
This way, we can say sum({})=0, sum(1,2,3)=6, sum( {3,4}, {5,{}, 7}, "Hello" ) = 19, etc. (Here, strings are arrays of characters.) A more useful example would be a generic framework for printing elements of generic types, including both values and containers like arrays and finite sets.
Second, this approach only works well in a closed-world environment. For example, to use a module containing generic printing functions for certain data types, and to use it in combination with your own generic printing functions, you would need to pass your own expanded printing functions in to the container's printing functions, so that it may recursively call both your sum functions and its own.
It appears that Haskell's type classes solve this problem by automatically generating implicit parameters for typeclass instances at their use site. Thus, identical typeclass invocations on identical data types may behave differently depending on their local context. And this is often desirable! This phenomenon is described in http://www.mail-archive.com/haskell@haskell.org/msg15080.html
I think an optimal solution (given the requirements of a strongly-typed language, etc) is a combination of two orthogonal features: support for general overloading of functions (though perhaps with restrictions, like: the domain of the overloads must be disjoint), and support for lexically-scoped implicit parameters. Overloading is a convenience that avoids the need for all the manual plumbing in this paper, while implicit parameters would make generic types work in an open-world environment. Given these two features, plus dependent types, I don't think typeclasses would be needed in a language at all.
Sep 16, 2004 - 01:44 - Re: There must be a better way
This sounds promising. I'd love to see any references you have on this approach. For example, is "self" context-independent or does it need to be qualified when nested several levels deep in definitions where its meaning is ambiguous? (Or am I misunderstanding?)
That monad example will fail to typecheck because, though ">>=" is overloaded, "print 1 >>= _" is of type "(t->IO t)->IO t", and is incompatible with the argument of type "t->Maybe String".
The other issue with replacing typeclasses with overloading and implicit parameters is illustrated by the monad "return" operation. "return" is of type "t->M t", so there is no way to overload it for different monads. If one needs to operate on monads generically, one stills has to bundle all of the operations for each monad into a data structure, in order to pass it around (either explicitly or implicitly). A record or typle would do just fine. This improves expressiveness: whereas there is no way in Haskell to operate on a typeclass instance as a first-class value or to have multiple instances of the same typeclass, explicit typeclasses make this possible. Cayenne (http://www.cs.chalmers.se/~augustss/cayenne/) is a Haskell-style language that supports this.
This highlights the strange multi-faceted role of typeclasses as a means of overloading by type, supporting implicit parameters, and bundling up of related overloaded things into data structures.
Amazon Associates (+ other advertising)
Sep 16, 2004 - 01:47 - Good idea
I probably buy a book mentioned on LTU every odd-numbered month, so you'd make a few bucks a year from me...
History: Array languages
Sep 16, 2004 - 02:05 - Why not more array languages?
Some thoughts on why array languages aren't more prevalent.
1. Most major programs that make significant use of arrays (and would benefit from array constructs there) also make use of significant other features where one wants a more polished, complete, or mainstream language than array languages.
2. Most programs that use arrays don't do anything that's difficult to do with existing languages. For example, if you deal with fixed-size vectors and matrices, then you just write a few functions for array-matrix multiplication and then you build your program on top of that.
3. There are reasonable theoretical arguments against some of the common features of array languages. For example, given a function f:t->u, some array languages automatically let that function operate on arrays, for example as if it's overloaded to f:[t]->[u]=map(f), etc. This is incompatible with Haskell-style polymorphism. For example, given a function reverse:[t]->[t] which reverses an array, and a value a::[[t]], there are two different ways one could interpret reverse(a), and this is confusing. Similar combinatorical ambiguities occur with automatic currying (treating any f:(t,u)->v as f:t->u->v , etc.
multidimensional arrays
Sep 17, 2004 - 22:56 - Arrays are functions?
Andrei, my experimental language treats arrays as functions with a domain of bounded natural numbers. For example, {3,4,5} is a function with a domain of "the type of natural numbers less than 3".
In my experience, array comprehensions solve the problems of list slicing well. I use a BASIC-inspired function mid(xs,i,n) for slicing which takes elements from the array as, and returns a new array containing just the n elements starting at element i. Thus if a horizontal slice of a 2D array is mid(xs,i,n), a vertical slice is for(xs:xss)mid(xs,i,n). This isn't quite as simple as special-case "multidimensional slicing" syntax could be, but it's reasonable and has the advantage of being very general-purpose.
I treat all arrays as 1D but nestable. Thus there is no syntax like a[i,j] but instead a[i][j]. I think this is the "right way", because it makes higher-order operations like mapping and comprehensions simple -- they don't need to be specialized for arrays of different dimensions and choices of what indices are being iterated over.
Fresh O'Caml
Sep 22, 2004 - 02:39 - Generalization?
This seems like an extremely special-case feature to add to a programming language. When presented with something like this, it's useful to consider what would be needed to be able to write this sort of code in a more general language without special-case support.
I see two things here. First, there are the scope properties of the name-binders. These could be expressed with dependent types, such that only scope-consistent expressions typecheck. The resulting code would likely be significantly more verbose, though.
Second, there are the equivalance properties of values involving name binders. There are a number of possibilities for implementing this in a more general programming language. You could write a "mod-alpha-conversion" equivalance test that operates on such terms, treating this as a wider relation than equality (where some alpha-equivalant terms would be considered unequal).
Or, if your language supported finite circular terms (with recursive equality implemented internally via bisimulation), you could recognize all alpha-equivalant terms as equal. I implemented this recently and verified that it works, though I'm not sure it's a good idea. Some pointers on this topic are summarized in http://lists.seas.upenn.edu/pipermail/types-list/2004/000352.html.
Specifically, you could implement binder data types (with alpha-equivalance but not dependent typing) like this:
term:=union
lambda:term
application:type{func:term,parm:term}
reference:term // A name reference, possibly circular.
const:int
myterm:=lambda(application(reference(myterm),const(7)))
Zena Ariola, in http://www.cs.uoregon.edu/~ariola/cycles.ps, describes this representation of name references as the "Bourbaki" style, refering to mathematical texts where graphs of lambda terms were drawn with lines between lambda terms instead of using traditional names for variables.
Alternatively, there is the possibility of using first-class equivalance types, though I can't see how one would implement this in a way that doesn't require programmers to constructively prove horrific things in their code.
Monads in various languages
Oct 02, 2004 - 23:36 - This problem appears to exist
This problem appears to exist generally when translating comprehensions into any strongly-typed language without type deduction. Python generator expressions avoid this by being weakly typed.
Requiring explicit type annotations would be unfortunate. For example, using comprehension syntax for monadic parser combinators is very convenient, but would be about 50% more verbose if type annotations were required.
Are there any languages in which monad operations are convenient and readable without comprehensions? For example, complex Haskell monad operations seem considerably less readable without the helper syntax.
Links (Wadler)
Jan 06, 2005 - 03:49 - A major event
This announcement perhaps marks one of the major events in programming language history, which go on to shape the field for decades to come. It's wonderful to see Philip Wadler at the center of this effort; he is one of the few who have the broad experience and credibility to make it a success.
I certainly hope that the Links syntax will be inspired by the C/Java/Pascal/Python school rather than Haskell, ML, Scheme, or (worse) XML. The later languages are far from mainstream programming practice and would needlessly inhibit the language's adoption by the practical folks who are its target audience.
It is certainly unrealistic to expect that a pure functional language would succeed in the mainstream. A better tradeoff is a language with monadic effects types (but retaining mainstream syntax -- not Haskell "do" notation) which enable general imperative programs to be written naturally, and the scope of their effects clearly delimited so that the language retains a powerful and usable functional core.
Jan 06, 2005 - 23:28 - Why it matters
Perhaps because of the presentation's vagueness, I'm reading into it what I would like it to be.
But I think it's more than that: What Wadler is looking to do is to unify queries, interaction, message passing, and HTML/XML generation into a clean, sound programming environment. These sorts of things are currently handled with a large set of ad hoc imperative languages communicating in poorly-defined ways, leading to a very error-prone system.
Generics: The Importance of Wildcards
May 27, 2005 - 02:03 - Wow
Wow, talk about the blind flaming the blind. The problem with variance in both languages isn't type erasure versus runtime support, or wildcards versus pure generics. Neither writer seems to even understand the problem.
The problem is that both languages only expose mutable arrays. If they exposed read-only arrays (similarly to lists in Haskell), those would be covariant in C# and Java; and if they exposed write-only arrays, those would be contravariant.
But there is a solution that is more fundamental still: to separate arrays from mutable references entirely, so that arrays are always covariant. Then the type of arrays of integers is a subtype of the type of arrays of numbers, and the true source of the invariance is exposed: it is the reference type constructor (like IoRef t in Haskell) that is invariant.
Microsoft employs the greatest set of programming language researchers ever assembled; they should be able to do better than this.
Scrap your boilerplate with class: extensible generic functions
May 28, 2005 - 01:03 - URL
The paper is http://research.microsoft.com/Users/simonpj/papers/hmap/
large imperative code --> functional
Aug 16, 2005 - 16:42 - Display Lists
Immediate mode OpenGL is gratuitously imperative. Keep in mind that the approach described above isn't functional; it's an imperative program abstracted over Haskell's IO Monad.
A pure functional approach would be to regard graphics rendering as a pure operation Render :: DisplayList -> FrameBuffer, where a DisplayList is a list of elements of an algebraic datatype describing all OpenGL operations, and FrameBuffer is perhaps a 2D array of RGBA pixel values. Rendering, at that level, is purely functional, in that the output of Render should (at least for a sane device) be a referentially transparent function of its parameter.
Distributive laws for the Coinductive Solution of Recursive Equations
Sep 13, 2005 - 04:33 - Difficult to read
It's a shame to see important papers like this written so abstractly as to be unreadable by most language designers and implementors.
For example, one example of distributive laws of monads is the natural transformation from "arrays of dictionaries of t" to "dictionaries of arrays of t", and another is the transformation zip :: [t]*[t] -> [t*t]. The structure here is very interesting and one can see the outline of applications, such as extensions to Haskell's monad comprehension constructs that deal with distributions.
But the author immediately jumps to "Lambda-bialgebras" with maps a:T(X)->X and b:X->F(x) and I'm lost. I'd love to see a concrete example of this because I don't see how to arrive at anything useful from an instance of e.g. a::[t]->t.
The general conclusion of the paper seems to be that certain guarded recursive equations over functions are guaranteed to have a unique solution. This is a topic I have been investigating on my own in implementing an experimental compiler and I wish I could follow the author's reasoning to that point.
Sep 14, 2005 - 02:56 - Aha.
> A List a-algebra over a carrier b is a function List a b -> b
That makes much more sense now. Thanks.
Garbage Collection as a Lazy Algorithm
Sep 19, 2005 - 23:23 - Relationship
In a lazy language, there are 2-3 events in every values lifetime: thunk creation (via evaluation), thunk reduction (via forcing) - which may or may not occur, and thunk abandonment (via garbage collection).
literature on commutative lifted boolean operators
Oct 07, 2005 - 19:02 - Parallel Or
Most of the research on this topic talks about "Parallel Disjunction", which is the or-counterpart of the 'land' operator you describe. I'm not familiar enough to reference specific papers, but a search for "Parallel Disjunction" turns up lots of stuff.
Such parallel operators are definitely the route to "maximum lazyness", and it's easy to envision building higher-level operations from them, especially if nondeterminism is allowed. For example, a parallel "find" operation could find some occurance of a value in a list provided that at least one non-divergent such value is present.
One thing to keep in mind is that the combinatorical performance impact of these parallel operators can be quite dramatic, having to execute an unbounded amount of divergent code while searching for one convergent result. It's quite easy to construct programs this way that are convergent in principle but not on any available hardware.
The Type-System-Feature-Creep Death Spiral
Oct 31, 2005 - 21:01 - partial evaluation
I think Alan has the ultimate answer on the question of typing:
So I think you could put the static typing issue into the larger context of partial evaluation. Would the attempt to produce a sufficiently capable partial evaluator lead to fame and fortune or obscurity and ruin?
Partial evaluation (or abstract interpretation) is indeed the most general intersection between evaluation and typechecking. Building a language around this realization is quite a challenge, though. All of the issues of termination and partiality that are fairly simple in languages like Haskell become compile-time problems and require careful treatment of head normalization and partial approximation, and some degree of implicit or explicit staging.
I do believe this is the next big step in computing. From machine code to Algol/C was the previous big step - the moves from C to C++, Java, C#, and Python are small steps in comparison. But actual attempts at implementing this are few and far between.
Nov 01, 2005 - 17:24 - "Get rid of type declarations"
One other thing I wanted to point out. In the example above:
let point(xval, yval) = tuple (x: xval, y: yval)
A formal way to make this work is to represent types as identity functions. Then, every identity function represents both a type and a function (you can call it and use it to specify types), while non-identity functions have no interpretation as types. And then, the "point" function above happens to just be an identity function. Then you can think of "int" as both the integer type (and write f(x:int)=x+1) and the identity function on integers and call it with int(3).
So the answer to "can you unify types and functions" is affirmative, but the "should" variant of the question remains open.
Keep in mind that the unification above works because there is a unique isomorphism between the types and identity functions in a programming language. This is the essential criteria that justifies unifying two constructs in a programming language.
When one syntactically unifies constructs that are conceptually distinct, the result is less justifyable.
One example is the unification of functions and lists in LISP - which creates some very interesting possibilities for introspection, but it means that functions carry around a lot of observable metadata that breaks foundational properties like parametricity and extensional equality.
Another example is Java and C#'s unification of value types (like int) and object types (like Int). Though C#'s approach is more automatic, both create strange observable properties, such as exposing pointer equality on boxed Int's that differs from the underlying int equality.
In the long-run, such unification of disparate concepts will be recognized as "clever hacks" rather than valid programming language design practices.
Generalized Algebraic Data Types and Object-Oriented Programming
Nov 24, 2005 - 21:47 - It's always great to see rece
It's always great to see recent research advances from Haskell and elsewhere make their way into mainstream languages like this.
C# generic constraints have always seemed somewhat strange in their operation. How does the typechecker work, exactly? When you're typechecking an expression in a context, do you go off and search all of the constraints visible to that context in order to find a match?
In implementing these things elsewhere, it has always seemed more convenient to introduce intersection types and union types and require explicit binding of variables to them, to avoid the go-search-the-environment-for-constraints voodoo. You really want typechecking to be context-free, in the sense that grammars are context-free: a term should have an interpretation independent of the context in which it exists.
Anyone remember Russell (the language, that is)?
Nov 24, 2005 - 22:05 - Thankfully, the full paper is
Thankfully, the full paper is online: http://portal.acm.org/citation.cfm?id=3987. This requires an ACM subscription, which is well worth the cost.
Return of the Global Variables?
Dec 31, 2005 - 00:20 - Same hack, new name
Right, public static variables in a class have all of the same semantics as global variables. It was an interesting tradeoff that the Java designers decided to allow static public variables in classes, but not global variables in packages - because they're essentially the same thing.
It's a good practice to avoid the unnecessary use of global variables and static public class member variables, though they are often hard to avoid. The Unreal codebase has a number of global variables, including a particularly onerous one named UglyHackFlags.
Duck typing vs Structural Subtyping
Dec 29, 2005 - 18:54 - The essence of duck typing
I take duck typing to mean the strategy of applying runtime coercions to make values fit into the types required by operations. For example, some languages let you add the string "1" to the integer 2 to yield the integer 3. In C++, you can add the integer 1 to the floating-point value 2.0 to yield the floating-point value 3.0. Duck typing includes the strategy of loosely typing records/objects so that two such types are deemed equal if they contain the same set of field names and the fields are of coercible types. This is similar to the structural subtyping principle.
There's nothing fundamentally unsound or dangerous about structural subtyping. You can reason about them in the same way as you can reason about tuples: tuples map indices to values of some index-dependent type, while structural records map names to values of some name-dependent type. But of course there's no abstraction there!
To gain compile-time security from type abstraction you definitely want a mechanism to augument structural typing. Nominal typing and partially opaque types are all well-understood solutions to this problem.
Jan 02, 2006 - 00:39 - Extensional Polymorphism
I don't fully understand G'Caml, but I'm not a fan of its basic thesis (that abandoning parametricity brings real benefits). For example, the author's really cool generic printing example can be achieved with union types provided that one can distinguish between obviously disjoint types -- e.g. knowing that integers are not lists, and lists are not floating point values. That certainly doesn't require violating parametricity.
I'm a fan of the view that a nominal type just bundles up a value of some type (such as an integer) with some sort of name (in the spriit of LISP atoms). Haskell's "data" declarations do something like this. As an aside, most of the literature on nominal types confuses names with pointer-like identity and runs into confusion over dependent functions, e.g. should List(Int) differ from List(Int) because they're separate function calls generating different type names? That is silly, of course.
So in my worldview, you only need structural types, names, and a means for bundling up types and values with names, from which you can derive nominal types.
Naked Objects
Jan 31, 2006 - 17:46 - Interesting
The Unreal engine has been using a similar approach since 1995. Programmers define classes using the Java-style UnrealScript language, and use annotations to decide which of the member variables are exposed to designers. The engine then exposes a quite powerful property-editing UI enabling designers to customize the properties of those objects.
Every datatype has a corresponding edit control. Bytes are sliders, booleans are checkboxes, enumerations are drop-down combo boxes, structs create an expandable hierarchy with subcontrols for their components, etc. There are some screenshots showing this here: http://www.unrealtechnology.com/html/technology/ue30.shtml
The resulting user interface is less polished than a typical modelling program (like 3D Studio Max or Maya) where a UI designer lays out every control manually. But the productivity gains from fast turnaround time far outweigh the negatives in this case.
Feb 01, 2006 - 02:59 - Introspection
Languages built on modern type theories guarantee many universal properties which assure that your program does what it says it's going to do. One example is the parametricity theorem (see Wadler's "Theorems for Free"). It guarantees that a program cannot decompose values belonging to a universal type, an important security property enabling you to (for example) call functions in another module with the assurance that they can't do anything malicious with the data.
Introspection, reflection, and even support for unconstrained casting (e.g. from Object to Integer in Java) all violate that property. Therefore, unlimited introspection is not a desirable property in a secure language. (C, C++, Java, C#, and Python aren't secure languages in this regard.)
However, explicit programmer-controlled introspection is a good and reasonable feature. For example, when defining a data type, it would be nice if a language offered a mechanism for specifying that associated metadata should be generated that describes the structure of that data, which may only be directly accessed by the module that contains the definition, but may be passed around to other modules explicitly by the programmer.
More concretely, when I define a type T in the local scope, I should be able to syntactically specify that the compiler should generate a data structure describing T's layout (call it T.metadata, for example), and to make that data public or private. I can then pass that metadata around to dependent-typed functions along with values of that type so that it can be deconstructed -- if I explicitly allow it to be. But, given just a value of some arbitrary type, one should not be able to extract its "type", since that notion violates many universal properties (such as the subtyping property).
Haskell typeclasses enable one to explicitly (not automatically) define metadata associated with a type in this way, and in the Haskell world this is done without violating the language's universal properties. A more automated mechanism ("deriving TypeInfo"?) would make this more user-friendly.
Feb 02, 2006 - 00:07 - The Subtyping Property
raould: By "given just a value of some arbitrary type, one should not be able to extract its 'type', since that notion violates many universal properties", I mean: In the presence of subtyping, a value is expected to be a member of every type that contains it. In a language with subtyping, singleton types, and a top type, what can you say about "the type of 3"? Well, 3 belongs to every type that contains 3.
The least such type is the singleton type containing 3, and the greatest is the top type. These are the only two universal answers we could hope to give, and neither one yields any usable information. This is unlike Java's o.getClass() function on objects, which yields nontrivial information. If we consider Object as a universal type (which Java and C# encourage), then these languages are obviously aparametric. So then they pile on all sorts of extra security features in order to regain just a few of the guarantees that parametricity could have provided.
Daniel: Excellent, thanks for pointing out "deriving Typeable". So Haskell implementations do indeed solve the introspection problem safely. I learn something new about Haskell every day.
So the intuition here is that if I declare a type "deriving Typeable", then I'm explicitly choosing to expose its internal details. But abstract datatypes not declared that way remain abstract such that it's physically impossible for an outside function to break the abstraction barrier.
The Next Mainstream Programming Languages
Feb 03, 2006 - 02:56 - Garbage collection
Garbage collection performance in games:
Though an Unreal Engine 3 game can easily consume 512MB memory (on console) or up to 2GB (on PC, in the editing tools), less than 10% of that memory consists of data that contains pointers. The rest is bulk binary resources - like texture maps, sounds, animations, which don't require scanning for references.
50MB of data is well within the range of a mark-and-sweep garbage collector taking a few milliseconds within a game running at 30 frames per second. This is what we do. Realtime garbage collection is another possibility, but we didn't want to try to implement that within a primarily C++ engine.
Garbage collecting memory:
I do believe this is completely practical as the sole memory management solution, even in a realtime application like a game.
Garbage collecting resources (file handles, etc):
This is a very different question, because resource freeing has observable consequences beyond performance and memory consumption -- unlike garbage collection, which is justified by the realization that if you don't have any pointers to a value in memory, then it can simply dissapear (or not) without observable consequences.
For example, if you have a file handle open for writing, then other applications can't open that file. You want such resource usage to be clear and deterministic, so that files don't just remain open for a random duration depending on the garbage collector's internals.
In general, I would not advocate garbage collection of OS resources or any other thing requiring explicit cleanup. For example, I think that Java/C# finalizers are a misguided idea, because they have observable, nondeterministic consequences. That is exactly the sort of feature a high-level, secure language should avoid. The bizarre finalization state diagrams for those languages should be enough to indicate that something is wrong here!
For resource freeing, constructs which guarantee that every resource is freed look promising (think of wrapping a file handle in an abstraction like a Haskell State monad). Or just plain old handles with explicit closing (and thus the risk of not closing a handle, closing it when it's not open, etc).
FxCop:
These sorts of tools are useful for analyzing code written in existing/legacy languages. Microsoft has some scarily powerful analyzers along these lines that they use internally, as well. But they're all built on the idea that you have code written in a language without strong safety guarantees, and you want to statically analyze them to figure out some of the things that might possibly go wrong at runtime. That is useful, but it's not comparable to the guarantees one would have in a language with strong safety guarantees.
I remember writing game code in assembly language a long time ago. Then macro assemblers came out and made assembly code easier to write correctly. But the eventual solution was C, which led to ~5X higher programming productivity and a comparable reduction in bugs. Ultimately tools like this are just workarounds for legacy problems to which genuine solutions are known ... and on the verge of coming into commercial practice.
Feb 03, 2006 - 20:21 - Why virtual classes?
How is that different from just adding new fields to class Actor? If all classes extending Actor will need those new fields, then it makes sense to add the field to class Actor.
In a one-developer world, you would just add the new functionality to your base class. But in the case of Unreal Engine 3, we maintain and distribute a large software framework used by a hundred or so other developers.
We're continually adding new features to the framework, so if an external developer adds a new member function to our base class's source in their local codebase, then they'll need to merge their changes into all subsequent versions of our framework, and we release frequently - every week or two.
In this case, a virtual class solution that enables a team to extend an entire class hierarchy in a framework, without modifying the framework's source, would be very valuable.
Obviously there are lots of workarounds available (see the slides) - adding extra data components, dynamically casting them to the expected type, etc. But they're quite error-prone.
by the time a game gets to the flat profile, does it usually have sufficient performance?
When all commercially-practical code optimizations have been exausted, further optimizations have to come from the game's content: reducing the number of visible objects, enemies, etc. At that point, it's just a performance vs quality tradeoff, as opposed to performance vs cost.
Arrays are preferable to lists? doesn't that mean imperative counters?
Comprehensions and folds are fine substitutes for imperative iteration.
Type inference doesn't scale
My 1-line example produced an incomprehensible error message in HUGS, but it was easy enough to see what was wrong with that code. Given a multi-hundred like Haskell program with fairly complex typing, it can be combinatorially more difficult to figure out what's wrong, given such an error message. This property doesn't seem at all tenable in a mainstream programming language.
Note that I'm only arguing against type inference: inferring x::Numeric in "f x = x+1". Type propagation, deriving the typing x::Numeric from "x = 7", is not problematic.
Feb 09, 2006 - 03:57 - Tim's idea of adding more
Tim's idea of adding more annotations seems like it would make this problem even worse.
I'm a huge fan of concise syntax.
I'm not such a fan of syntax that's so concise that it hides potential problems. For example, "a[i]" in Java looks like you're just reading an element from an array, but actually translates into something like: "if a is NULL, throw an exception; if i<0 or i>a.length then throw an exception, otherwise read the element from the array".
So the goal of further type annotations would be to introduce more explicitness in type declarations in exchange for making other parts of the program clearer by virtual of eliminating all of the hidden things that can go wrong with them.
Obviously, this isn't what you want for, e.g. a Python-like scripting language, where coders want to write simple code with minimum effort. But for the sort of code we write -- hundreds of thousands of lines of game and engine code -- the explicitness and extra compile-time checking would be a net productivity improvement in the lifecycle of writing, extending, debugging, testing, and shipping complex software.
Gilad Is Right
Feb 18, 2006 - 04:49 - No
In Bracha's work here, and in Microsoft's work on C# 3.0, I sense an undercurrent dragging the language model toward the LISP/Smalltalk "ideal" of metadata-intensive, introspective, dynamic, loosely-typeable programming programming.
This is misguided, for two reasons. First, the demands of secure, robust software imply a need to verify more program at compile-time, not less; this means you need a strong type system or even a non-partial Curry-Howard style proof language as a subset of your language. Second, the increasing importance of robust concurrency implies the need for strong typing to isolate constant storage from mutable storage and effect-free code from effectful code via type annotations.
Of course, it's easy to see how this undercurrent arises. When you release a language, you receive complaints from users about all the things they want to do and can't, and the ultimate way to satisfy all of these requests is to expose all metadata: make it extensible, make objects dynamic, and allow the possibility of completely dynamic typing. The end of the road here looks an awful lot like LISP and SmallTalk.
If you go this route, one day you'll realize you evolved the ultimate hacker language, and it became a godawful mess for writing real programs.
More specifically here, the paper understates the number of important properties that break in an untyped language. If the developer of a popular library adds a new field (function or variable) named "x" to one of his classes, then he breaks all previously-released applications (both source and binary) that have inherit from it and added their own member named "x". Conclusion: modular object-oriented programming really isn't feasible in this model. What else goes wrong? Since (the lack of) typechecking can't change behavior -- the essential point of the paper -- all data-hiding is lost, your objects are open to introspection attacks, and you need to hack on some arbitrary runtime security checking to prevent that.
Feb 18, 2006 - 07:00 - Names and slots
Agreed: if everyone follows the convention of using a fully-qualified name in every field access expression (either explicitly or via importing accessor functions from modules) then conflicts can be avoided. Of course, this would mean that an expression like "p.x" would need to be qualified with something like "p.(Java.lang.point.x)". Or accessed as "x(p)" -- but I suspect a lot of modules are going to be fighting for that accessor named x, so now we get into decorating accessor names to make them sufficiently unique, so we would want something like PointX(p).
Or we need to declare "Point p" with an explicit type. The conclusion is: modular object oriented programming without explicit types requires either fully qualified names everywhere or unambiguous accessor functions explicitly imported from modules (leading to large decorated names). This is workable, but quite a mess.
Virtual Machines, Language Runtimes, and the Future of Objective C
Mar 13, 2006 - 00:23 - A real problem?
This isn't a real problem. Both Java and .NET are available on Mac. And if they weren't, it wouldn't really matter because major multiplatform applications simply aren't written in Java or .NET (by applications, I mean applications - not servelets, scripts, in-house database front-ends, etc which are plentiful on Java and C#).
Why aren't major applications written in Java or .NET? Because Java, the Sun Vendor Agenda Language, is seen as the bastard stepchild language on Windows. And .NET, the Microsoft Vendor Agenda Language, is avoided by people writing multiplatform applications. So, ultimately, everyone avoids using Vendor Agenda Languages for major applications.
So it really doesn't matter.
Also bear in mind that the implicit memory management of Java and C# provide a productivity gain for developers, of something like 20-30%. This isn't anything fundamental, and if porting a .NET language to Linux or Mac turns out to be a major effort, then that 20-30% can easily be lost.
Rich resource site for the programming language "K"
Mar 13, 2006 - 00:28 - Yikes
Yikes, ASCII art languages. From the site:
,/|\'(&x)_ y
x[ ,/&\'(&x)_ y
sa:{x[ il:{(#x)#+\0,x}
{,/x+' d:{[n;f]:[n;_f[n-1;f]';f]}
(x,"*")[(#x)&
The notation here does not seem particularly helpful to understanding the intention behind the code.
Mar 14, 2006 - 19:33 - There are two things here
There are two things here which impede my understanding of "cross:(,/,/:\:)/(),/:".
The most obvious is that I don't have an intuition of how this is parsed, e.g. where the symbol boundaries lie and how they are joined together into programming constructs. From reading and mathematics, we all have an elaborate intuition of how to parse things like "myfunc(x,a)=x*2+a". That intuition doesn't help us with complex new symbols and precedence rules are introduced.
The other impediment also exists in reading Haskell code written in the "point-free" style, where functions are defined as compositions and higher-order functions of other functions such as "f = map (sin . cos)". Here, I can easily parse this definition, but it's harder to see the meaning given just prerequisite knowledge of math and other programming languages. To me, this definition is much clearer when function arguments and the underlying computation are written explicitly, such as "f(xs)=[sin(cos(x)) | x in xs]".
Folding neither Left nor Right (or how to avoid overspecifying the solution to a problem)
Apr 30, 2006 - 03:14 - Neutral folds
A compiler for a Turing-complete language can't ascertain whether an arbitrary function of two variables is commutative. So, could we hope to come up with an incomplete but conservative solution for determining commutativity that succeeds in simple cases? Unfortunately, even that seems intractably difficult to implement, and even then, only likely to work in the most simplistic of cases.
A note on distributed computing
May 04, 2006 - 18:10 - Is this a political dissertation?
The authors paper over the usability benefits of a single object representation supporting both local and distributed computation. Such a representation is required because software developers require a uniform, simple programming model.
(Biased writing style taken from the paper itself.)
The Weird World of Bi-Directional Programming
May 31, 2006 - 23:17 - Excellent presentation
Agreed, it's such an excellent presentation that it has me interested in a subject from which I would have otherwise run away screaming.
There are a great many cases where it is valuable to be able to statically express and verify algorithms that are bijective or fit elsewhere in Pierce's "Lens Bestiary".
Variance and Generalized Constraints for C# Generics
Jun 19, 2006 - 22:13 - Tension
Excellent work from the authors, as always. There is much tension in exposing variance specifiers in a language that assumes mutability everywhere and has no means of specifying constant references and constant operations. The use of "where" constraints further implies complex nonlocal reasoning both in typechecking and on the human side when we read the code and try to understand why it works. These extensions stretch the C language family to an impressive new local optima, but at some point the world is going to have to move away from this imperative-everywhere, broken-const model of programming.
Petition for adding garbage collection to C++.
Jun 27, 2006 - 00:35 - Unreal Engine 3 garbage collection
Since several folks asked:
In Unreal Engine 3, we implement garbage collection for all "heavyweight" objects, those objects for which we maintain complete metadata, support orthogonal persistence, etc which are similar in functionality to instances of Java/C# "object" in their capabilities. However, we also have "lightweight" allocations which we manage directly without garbage collection.
Typically during gameplay, there are 40,000 heavyweight objects around, with extremely complex chains of references (including cyclic references) and ownership relationships, thus making garbage collection a huge win in productivity. There are then hundreds of thousands of lightweight allocations whose ownership relationships are extremely simple and thus easily manageable explicitly.
Overall I wouldn't endorse a proposal to add garbage-collection features to C++, since the language is sufficiently low-level that most applications will want to handle memory management quite uniquely. Of course, I do see GC as perfectly appropriate for higher-level languages like Java and C#, which don't expose pointers-as-arrays, unchecked memory access, deal with low-level OS data structures, etc.
It seems like 90% of the effort of implementing GC on top of C++ is in collecting proper metadata for everything. Thus it would be useful for future C++ standards to address the problem of reflection very thoroughly, while leaving actual GC to the application.
Jun 27, 2006 - 19:27 - Tim Sweeney
The best thing that can be said for garbage collection is that it's the ideal tool for 110-IQ programmers and half-baked designs.
You're essentially claiming that programmers with high IQs prefer to solve problems in difficult, unproductive ways.
In large software projects with complex data relationships and multiple programmers, garbage collection yields very significant productivity gains compared to manual memory management. A whole class of bugs (dangling pointers, memory leaks) and impediments to understanding (exactly who should free which object when) are eliminated.
A practical, productivity-focused programmer -- of whatever IQ -- will welcome any areas where basic language features can improve productivity and software reliability. And he will outperform higher-IQ programmers who stick with lower-productivity solutions.
Securing the .NET Programming Model
Jun 27, 2006 - 00:27 - Full abstraction in multi-language runtimes
Full abstraction would be a difficult feat to achieve in a multi-language runtime. For example, languages in which the parametricity theorem holds require an additional set of abstraction properties. What about a language with abstraction properties based on dependent types? Proof carrying code? Many features unique to certain languages would require additional CLR-level support to maintain full abstraction, so much that this doesn't seem a reasonable property to expect in a multi-language runtime.
Of course, since C# is the cannonical language of CLR, it makes sense to tighten the system to the point where full abstraction does hold for C#, without expecting this of other languages implemented on top of CLR.
For example, if you were going to build a serious proof-carrying code framework to implement further degrees of security, you would expect the proofs to be checked and the abstraction properties to hold at runtime in some higher-level translator from your language to CLR, rather than expecting them to actually hold in CLR.
Thus CLR may eventually prove "too tight" for high-performance languages that employ proofs and more advanced type systems, since the source language can often eliminate array out-of-bounds checks, null pointer checks, and implicitly thread-parallelize code, while the CLR itself cannot and thus incurs runtime penalties. But this isn't so much a complaint about CLR (which seems to be exactly what it should be), but a statement that it's not the last language runtime the world will ever need.
Jun 27, 2006 - 17:08 - Holes
Would that not mean that components written in other languages could provide "holes", allowing attacks against C# components? (Assuming that the other languages do not provide stronger guarantees than C#.)
I think that could be avoided completely with small refinements to the CLR. For example, if the CLR supported a genuine bool type and enforced that it only contained true and false (and not other byte values like 67), then other languages could interoperate with C# without breaking the abstraction property in C#.
This is news to me! Wasn't .NET supposed to be the panacea for all our untrusted vs trusted code headaches? Did this dream die?
The behavior that Kennedy et all have pointed out are not security holes in the traditional sense, e.g. of what Internet Explorer is littered with. These findings offer no way to overwrite random memory, corrupt the stack, hijack a computer, etc.
Rather, they just show that certain ideal properties of the C# language don't hold in the CLR, and suggest ways to improve this. For example, there exist bool values v where v!=true && v!=false.
I only wish the rest of the industry (and of Microsoft!) were as open and respectable about identifying limitations and potential security issues and working publically to study and address them.
Jun 30, 2006 - 18:00 - Unexpected behavior
The point is that a malicious agent can cause unexpected behaviour in security critical code. He has a foot in the door. Whether or not the attacker can escalate this into a full-scale compromise is a matter of luck and the attacker's skill.
This opens a wonderful can of worms!
Given bool b, should C#'s type system assure that (b!=true && b!=false) is never true? Should it assure that given int a,b with a>0 and b>0, that a+b>0? That given "object a=3,b=3" that a==b? None of those properties hold in C#, due to various historical accidents that most programmers educated in this decade don't intuitively understand.
Jul 06, 2006 - 17:22 - You're right
You're right. (b != true && b != false) is an example of weird, semantically incorrect behavior while given int a>0,b>0, having a+b less than zero is weird, semantically correct behavior.
Socially Responsive, Environmentally Friendly Logic
Jul 11, 2006 - 23:27 - Weird
I so don't see what this has to do with anything.
Cost of provably-correct code
Jul 25, 2006 - 17:15 - Big Theorems
With correctness proofs, you often run into the problem that the theorem you aspire to prove is approximately as lengthy and complicated as a well-written but unproven implementation of the program itself.
For example, imagine the sort of theorem that a word processor would need to obey -- there would be sub-theorems concerning how input, display, printing, formatting, etc need to be handled.
If your theorem is as complicated as a simple but unproven implementation of a program, then the former is no better than the later, because you can just as easily have bugs in your theorem (e.g. causing the wrong things to be proven) as bugs in your program (causing unexpected runtime behavior).
Generics as a Library
Jul 28, 2006 - 00:43 - Question
Can someone explain how this approach coexists with the parametricity theorem?
For example, if we can handle lists of characters in an ad-hoc manner, but lists of all other data types generically, then can't we use that to write a non-constant, univerally quantified function of IsChar :: forall t. t->Bool enabling us to distinguish between characters and values of any other type? (Which is impossible in a language in which the parametricity theorem holds.)
Is the trick that only certain types have a representation typeclass Rep t enabling us to discriminate between values of those types, but not enabling us to discriminate between values of an arbitrary universally quantified type? That seems workable, because we can just look at those representation types when choosing to implement special-cases. This implies the existance of some sort of metasyntactic (e.g. aparametric) stage in compilation where these representation types are generated on behalf of the user for all or certain user-defined types, lest the user have to write boilerplate for each new user-defined datatype.
Jul 29, 2006 - 01:59 - Expression Problem
Mainstream programmers tend to primarily write very domain-specific code rather than the sort of high-level frameworks and libraries where the expression problem mainly shows up. Improved generic support will result in a more expressive language that better empowers framework and library writers (thus leading to better frameworks and libraries). But I don't suppose the typical everyman programmer will make a lot of use of it directly.
Aug 01, 2006 - 18:58 - Parametricity in Haskell
So the interesting question, related to yours, is: does the parametricity theorem yield in Haskell?
My understanding is that parametricity holds in all referentially transparent Haskell code that does not use explicit forcing via "seq". Code using "seq" is "parametric mod _|_". Haskell does support code that isn't referentially transparent via unsafePerformIO and I'm not sure what implications this has on parametricity.
Overloading : Why do some languages leave it out?
Aug 03, 2006 - 18:24 - Overloading is hard
Overloading is frequently neglected because it is difficult. The more advanced your type system, the more difficult overloading can be. Key problems:
In type systems supporting coercions and conversions between disjoint types (e.g. C's automatic casting from char to int), ambiguities arise regarding which combination of coercions and overloads should be effective for a given function invocation. C++ deals with this by specifying a set of priority rules to determine the "best" overload.
In type systems supporting universal quantification over types (implicit or explicit), overloading isn't an entirely local feature. For example, if we defined (Haskell syntax) overloads f::int->int and f::string->string, we could then define g x = f. To write a type signature for that, you need additional type system features (intersection types are one solution). You likely also need the ability to represent polymorphic values at runtime (e.g. values of the union of the types int and string) and cast them, because at a given declaration site, you don't know which particular types are effective.
In general, the type of the overloaded function f could be expressed (int->int)*(string->string) where * is the type intersection operator. Or you could write it "exists(t=int|string) t->t" where we existentially quantify over the two choices for t, int and string. There are many other workable solutions.
In type systems with the parametricity property, overloading is required to be implemented "honestly". In general, you can only support overloading over types whose values are unambiguously distinguishable. For example, if function spaces are contravariant, then you can't overload f::(int->int)->bool and f::(string->string)->bool, because you can't distinguish int->int and string->string -- they both include larger functions such as forall t. t->t.
To me, the key to sane overloading in a language is the ability for your compiler to implement it as syntactic sugar, and translate it into simpler primitives in a core language. In other words, given two definitions f::t->u=.. and f::v->w=.., the compiler should be able to output an equivalant single function f=.. which is semantically equivalant to the overloaded function -- it must accept the same parameters either functions accepts, reject any functions both reject, produce the proper result with no unexpected side effects, and give a suitable error if the overload is ambiguous.
That this isn't possible in most languages is an indication of their failures to implement a "complete" type system.
What does "complete" mean? For a simple type system with invariant functions, you need union types and casting to implement overloading. This is the easy case.
The translation is more tricky with contravariant functions, because some overloads become ambiguous. Universal quantification complicates it further -- now you need the ability to combine multiple universal quantifications into one, over a function in which only some of the quantifiers can be implicitly instantiated. Existential types add another layer of complexity, as do dependent types, etc.
But I think it is vital for a programming language to choose a feature set that is complete with respect to overloading, lest its expressiveness be crippled in surprising ways. For every advanced type system feature you expose, you need to say exactly how it works (or doesn't) in the presence of overloading.
Constant Values as Types
Sep 03, 2006 - 03:55 - Singleton types
Singleton types are the simplest answer, so you could have the typing 42:singleton(42).
You might also want to have an uninhabited type. Call it "nothing".
You might also want to have the type of all singleton subtypes of integer. If you have the later type, you can call it option(t) or ?t, and then you have the typings singleton{32}:?int, and nothing:?int. If you support types as first-class values in a programming language, then option types, singleton types, and the uninhabited type have a serendipitous relationship, as you can use them as types, as values, and in combination for a proofs-as-programs style.
This is the approach I'm taking in my programming language R&D work.
Sep 03, 2006 - 23:04 - Right
Such a type system is generally undecidable, but it can typecheck all programs that typecheck in simpler decidable type systems, plus lots of additional programs written using more powerful and expressive constructs (but not all of them!)
I'm certainly an advocate of gaining more expressive type systems at the expense of decidability.
After all, what you really care about is the following: If your compiler accepts a program successfully, then the program is type-safe. If it's not type-safe, then you might get an error message, or a stack overflow, or a core dump, or whatever -- but in no case will you see the compiler accept a program that's not type-safe.
Small Value Set Types
Sep 03, 2006 - 23:06 - Finite set types
See http://lambda-the-ultimate.org/classic/message6641.html for a link to a language with a type system that tackles some of these concepts very generally.
expressivity of "idiomatic C++"
Sep 08, 2006 - 17:16 - Relying on libraries
This is common language bigotry, and the authors cheat on their C++ "solution". They're relying on a heavyweight library class to do the difficult work, and thus hiding both the complexity and performance implications of their solution.
If you're going to compare language expressiveness and programming paradigms using examples like the Hamming problem, YOU HAVE TO ACTUALLY SOLVE THE PROBLEM, and not call some big library function to solve it for you. Otherwise you're just comparing libraries, and degenerately, an arbitrarily crappy language with a library that includes GiveMeHammingNumbers() wins.
Why Johnny can't code
Sep 18, 2006 - 22:58 - Early programming languages
Early programming languages like BASIC were line-oriented because the user's interface to them was a teletype. Back then, there were also line-oriented text editors, where you'd type a bunch of lines, and then edit by saying "on line 3, replace 'teh' by 'the'".
Soon after actual video display terminals arose, text editors became screen-oriented, and programming languages became file-oriented. Seems like progress.
Nowadays you'd just learn Visual Basic, Java, JavaScript, or Python rather than line-oriented BASIC. If you really want to go retro, download the old BASICA interpretter, or an Apple II emulator.
Is "post OO" just over?
Sep 24, 2006 - 20:46 - AOP is only successful in
AOP is only successful in that many researchers have received funding to investigate it. In the real world, AOP is entirely unsuccessful: No mainstream programming language has integrated major AOP features. And this is the correct state of affairs because, as the author more eloquently states, the techniques that are unique to AOP are a collection of dangerous hacks.
Fortunately, many of the original goals of AOP (implementing features that cross-cut data types and programs) are more soundly satisfied by the latest work on generic programming, Haskell typeclasses, etc.
Relationship between access modifiers and type
Oct 03, 2006 - 20:45 - What are access modifiers?
This got me thinking about what exactly the relationship between access modifiers and a class's type is.
For first-class reasoning about this, you need a fairly powerful type theory. The cleanest way of reasoning about this is to say that a class is a record whose set of fields is existentially quantified with a known lower-bound. Or, for more advanced access, several staged lower bounds, one for each sort of context in which the class can be seen (e.g. within the class scope, in the same package as the class, in an external package).
In other words, represent class {public: int x,y; private: int z;} as something like: exists(s:set>=set{'x','y'} = set{'x','y','z'}) lambda(field:s) case(field) of {'x'->int,'y'->int,'z'->int}. Here, "outside of the definition" you only know the set of fields includes 'x' and 'y' and since you don't know whether or not 'z' belongs you cannot access it; while "inside the definition" you know the exact set of fields is x, y, and z.
With more advanced staging constructs, code in different contexts could see different lower bounds, enabling Java "protected" access within a package, etc.
The best way to reason about features like this is to translate them into a core language with a powerful type theory, rather than building up ad-hoc theories of types-with-access-protection, etc as is typically done when reasoning about object oriented languages.
With this approach, questions like "How do I abstract over access modifiers?" and "Is this thing a subtype of that thing with the same fields but different access modifiers?" are easy to answer by referring back to the core language translation, in which everything is a term which may be analyzed and abstracted over.
A Very Modal Model of a Modern, Major, General Type System
Oct 10, 2006 - 17:31 - Quantifiers
This is the first paper I've seen that explicitly defines universal quantifiers and existential quantifiers as dependent type intersections and unions indexed over the quantifier's domain type. (Is this fact obvious to type theorists? It took me quite a lot of stumbling to recognize the connection myself.) There are a lot of other interesting things to be found in this paper. It's been clear to me for a while that implementing proof-carrying code on top of mainstream (powerful and complex) programming languages will require something quite a lot like the system described here.
Quotation and evaluation -- or, how pure is the pure lambda calculus?
Dec 27, 2006 - 21:30 - Lambda terms as graphs
The graph representation of lambda terms dates back to Bourbaki. For a modern overview, see Ariola's "Lambda calculus plus letrec": http://www.cs.uoregon.edu/~ariola/cycles.html.
AgentSheets: End-User Programing (and Secret Lisp Success Story!)
Dec 31, 2006 - 00:27 - Anti-Objects
Solving a problem by defining intermediate data structures that lack a direct physical mapping to your problem space is an utterly common technique in all areas of computing. Presenting it as some sort of invention is silly.
Patrick Logan on Software Transaction Memory
Feb 09, 2007 - 01:48 - STM nervousness
STM makes me nervous too, but raw shared-state concurrency is terrifying, so that's an improvement! Here are some thoughts:
"large scale stateful software isn't going away" and "imperative programming is the wrong default" are contradictory
My current analysis of the sort of code that's required in a game is that purely functional programming (plus Haskell-style ST non-imperative local state) is sufficiently expressive for around 70% of the code we write, which accounts for 95% of our performance profile. Thus I see functional programming as the right default.
The other 30% of our code (which accounts for 5% of performance) is necessarily so stateful that purely functional programming and message-passing concurrency are implausible. Thus I see imperative programming, in some form or another, as remaining an essential tool for some systems present in modern software.
Performance is crucial to him, yet he's willing to live with a 4X slowdown for STM
Without STM, the only tractable way to manage this code is to single-thread it. I'm not nearly smart enough to write race-free, deadlock-free code that scales to 10,000 freely-interacting objects from 1000 C++ classes maintained by tens of programmers in different locations.
With STM, I can continue writing software at full productivity, and it makes 5% of my performance profile become 4X slower. I break even at 4 threads, and come out ahead after that. So, eventually, STM wins over single-threading.
How do the message-passing guys implement robust interactions between independent objects with mutable state, like the classic "bank transfer" example that justifies transactions in databases? You end up writing an endless set of ad hoc transaction-like message exchanges for each interaction that may occur in the system. Thus my argument that STM is the only productivity-preserving concurrency solution for the kind of problems we encounter in complex object-oriented systems that truly necessitate state, such as games.
Feb 09, 2007 - 18:44 - Can you help me understand
Can you help me understand why message passing concurrency is implausible for your domain? Isn't the Opengl API itself a wrapper over a message passing architecture?
Consider a game like Grand Theft Auto, where 10,000 or more objects move around and interact independently. Here, the objects are things like people, cars, weapons, props, etc. Each object has attributes that change over time, such as position, damage, relationships with other objects (who's carrying what), etc. At any point, any set of objects can potentially interact with each other in a stateful way, for example I can get in a car, drive it around, and run into a mailbox, damaging it.
Many of these interactions require atomic updates of groups of objects. For example, to fire a weapon, I need to first determine whether I'm carrying the weapon, whether the weapon has any ammunition, and then I need to create a bullet object. If a weapon has one bullet and two people tried to fire it simultaneously, non-atomic updates might lead both players to conclude that they can fire the weapon, so two bullets are created, and the gun is left with -1 bullets, an inconsistent state.
Basically, all of the atomicity arguments that have been applied to databases (e.g. the bank-transfer example) directly map onto the game example.
Therefore, you need some way to guarantee atomicity. Candidates:
- We do this today (on 3-core CPUs) by simply single-threading this kind of code.
- You could implement this using message passing, but you'd be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.
- If our game were sufficiently simple, we could multi-thread it by carefully locking and synchronizing the objects at the appropriate point, but as software complexity scales up, the analysis of whether the program might eventually deadlock is intractable.
So, in this case, transactional memory is the most natural productive solution to this problem. Keep in mind, we live in a comparatively simple world, since we always targeting a single multi-core CPU with shared coherent memory, and aren't concerned with databases, distributed computing, or fault-tolerance.
Feb 09, 2007 - 20:21 - Bloat
The problem with implementing ad-hoc transactions using message-passing is that it bloats the code tremendously. In the single-threaded world or the STM world, you say "if(Ammo>0) {Ammo--; FireBullet();}". That would bloat up into tens of lines of asynchronous code to ask the recipient if he has ammunition, to reserve that ammunition, to finalize the interaction by effecting the ammo reduction, handle failure asynchronously, etc. Such code needs far more thought and testing than the STM version.
A solution that imposes an order-of-magnitude code bloat, productivity decrease, etc, is unattractive.
Behaviour Diffing
Feb 27, 2007 - 18:45 - Behavior diffing
For simple programs that are recursive in nature (e.g. compilers), I wrote a hierarchical logging library to generate a log of program activity as an text file with varying levels of indentation, and use a standard diff tool to compare them.
This is an area where development tools could provide a lot of value. Determining when the execution two versions of a program diverge is an extremely labor-intensive task to carry out manually for complex programs.
Syntax Solicited for Imperative-flavored Concurrent Language with Keywords
Mar 16, 2007 - 02:18 - Imperatrix Ornothologica
http://www.toontalk.com/english/computer.htm
Wow. ToonTalk appears to be a serious product to teach concurrent constraint programming, to children, by means of the following metaphors:
- computation :: city
- actor or process or object :: house
- methods :: robots
- method preconditions :: contents of thought bubble
- method actions :: actions taught to robot
- tuples or messages or vectors :: boxes
- actor spawning :: loaded trucks
- actor termination :: bombs
- channel transmit capabilities :: birds
- channel receive capabilities :: nests
Another way to understand the computational model of ToonTalk is to consider what you can train robots to do:
- send a message by giving a box or pad to a bird
- spawn a new agent by dropping a box and a team of robots into a truck
- perform simple primitive operations (e.g. +, -, *) by dropping a pad on a pad
- terminate an agent by setting off a bomb
Again, wow. That's one hell of a metaphor there!
Twitter and Rails brouhaha
Apr 16, 2007 - 20:59 - Translation
MVF = Model-View-Controller (oops)
ST = SmallTalk
The expression problem, Scandinavian style
May 07, 2007 - 20:34 - Excellent
This is a refreshingly excellent paper! The solution is very intuitive, the exposition is very clear, up there with SPJ's work. I wish papers more were written like this: Here's the problem, here's an example of code that solves the problem using intuitive language constructs, and finally some discussion on theory. Too often, it's: Here's the problem, here are fifty pages of derivations, and finally a vague outline of how this might be applied in the real world.
The key detail here is that to compose a data structure extension with an operation extension, you need to create a third extension that defines the new operation for the new datatype. This way, the compiler can always verify that your program is sound is free of any data structures lacking operations, without any of that multimethod voodoo of determining which overload is the "best" match according to some obscure rules.
"Practical" advantages of lazy evaluation
May 30, 2007 - 15:25 - Advantages
The benefit of lazy evaluation is that it allows the language runtime to discard subexpressions that don't determine the end result of your program. Often, when you have lots of temporary computations and conditionals, some work can be discarded this way. As a result, lazy evaluation can sometimes reduce the combinatorial time complexity of an algorithm.
The drawback to lazy evaluation is that it forces the runtime to suspend the evaluation of subexpressions until it is certain that they determine the end result of your program, in order to guarantee that an unevaluated subexpression that diverges doesn't cause the program itself to diverge. This must be done by creating thunks containing the suspended state of a subexpression. In a garbage-collected language, this can sometimes increase the space complexity of an algorithm.
Lazy evaluation also increases expressiveness, as it enables you to access components of data structures out-of-order as you're initializing them, as long as you don't create any circular dependencies.
There are intermediate schemes such as lenient evaluation that bring some of the benefits of lazy evaluation without some of the drawbacks.
Partial evaluation applied to high speed lighting preview
Jun 05, 2007 - 13:13 - Partial Evaluation in rendering
Many of the old software renderers in the 1990's patched binary constants into existing x86 code at offsets generated by an assembler. You could call that partial evaluation!
Merging Functions, Modules, Classes, the whole nine yards...
Jun 05, 2007 - 13:18 - Merging stuff
You can do that, but you have a problem here. Now, what does it mean for two functions to be equivalent? Ordinarily, you choose either extensional equality (functions are equivalent iff for all inputs they produce the same result) or the weaker intensional equality. But allowing "extension" like this exposes all the local variables of a function as part of that function's identity.
It's probably better to factor that sort of code into a class (which you can inherit from and extend by overriding existing fields and adding new ones), and a function that constructs an element of that class and then returns the member coinciding to the "return" value. This scheme doesn't require stretching the notions of extension and equivalence.
Simply Easy! (An Implementation of a Dependently Typed Lambda Calculus)
Jul 09, 2007 - 19:18 - Dependent types
The authors are certainly correct that dependent types provide a cleaner solution than the hodge-podge of features found in more widely-used languages.
I'm afraid they are also correct in the suggestion at the end of the paper that regaining the expressiveness needed for large-scale programming may require the introduction of general recursion, and the possibility that the typechecker will loop in some circumstances.
Ultimately, we need to embrace partiality in the general case of typechecking. Large-scale programming will never be done in a total language, because the burden of proving your program total imposes a development cost multiplier of several orders of magnitude. And, when you mix types and values to gain expressiveness, any partiality in the value language necessary percolates up to the type system.
So, we need to grow comfortable with partial type systems. I feel the lack of significant progress in type theory in the past decade is largely a result of refusing to embrace general type recursion, which causes researchers to focus in on microscopic details of proving totality for slight improvements on existing type systems, rather than exploring the wider possibilities.
Jul 09, 2007 - 21:17 - Can you imagine your typical
Can you imagine your typical programming team mucking about with a type-checker debugger, trying to figure out why their app won't compile?
Hey, C++ templates are a Turing complete compile-time construct, hence they can loop. Industry-wide, maybe 0.01% of C++ debugging time is spent determining why a template expression loops. Much as we all hate C++, it's the world's leading mainstream programming language today, and the partial typechecker is absolutely not a barrier to adoption.
Jul 09, 2007 - 23:38 - General recursion
For example, consider a dependent-typed program whose compilation constructs a parser from a parser definition in a string (say, in BNF format). Then, when you run that program, it parses another file using that parser. Here, we're moving something that is traditionally done at runtime to compile-time, which is sensible here since constructing a parser is a purely functional operation.
Well, depending on what the string contains, compiling this could loop. So, to write the dependent-typed parser generator using wellfounded constructs, you can't simply write a function from strings to parsers, you need to construct a provably wellfounded parser by hand using constructs like primitive recursion.
More importantly, it's far easier for real-world programmers to write any sort of code using general recursion than to restructure everything using e.g. primitive recursion or this paper's FoldNat/NatElim constructs -- perhaps an order-of-magnitude easier. Such productivity multipliers are a big deal for large-scale programming.
In those cases, programmers would much prefer to write code easily using general recursion and to debug the very rare cases where compilation doesn't terminate, than to spend a much greater effort writing these algorithms using unfamiliar induction constructs.
Jul 12, 2007 - 19:27 - Type erasure requires totality
You've made a very important point here: type erasure requires totality. Any dependent-typed expression that may diverge must be fully evaluted at runtime, otherwise the type system is unsound.
If the language is ambivalent about partiality, this means dependent types carry a significant runtime cost.
The answer is that dependent-typed, performance-critical languages need to have some sort of effects-typing system so that the runtime can erase types whose computation is conservatively known to be total. In the general case, it won't be able to ascertain that, but in the typical cases of simple dependent types (e.g. the Haskell-like subset), it's pretty straightforward.
Jul 13, 2007 - 19:27 - Proof-Carrying Code
For security, it just means that non-total type expression can't be erased, but must be evaluated at runtime to preserve the semantics. It's a performance hit, probably not a huge one.
Comprehensions with "Order by" and "Group by"
Jul 27, 2007 - 21:01 - Is there a better way?
To me, the major benefit of comprehensions is that they enable us to express frequent programming patterns by referring to simple things with variables, rather than combining higher-order functions.
Philip and Simon's new extensions further enable us to express frequent programming patterns pointfully. Still, this scheme feels a bit arbitrary. I wonder if there is a nicer generalization, perhaps less directly mapped to SQL.
Sjoerd gets to the core of the issue, that records make for cleaner comprehensions than pattern-matched algebraic data types. With idiomatic Haskell ADTs, you frequently must invent redundent local names for tupled elements in an ADT.
Universal Type System
Nov 07, 2007 - 19:38 - Universal Type System
The set of typing rules in a type system is equivalent to the set of axioms in a logic. Therefore, there is a direct translation of Godel's incompleteness theorem, saying: Any sufficiently powerful type system is either unsound (all types are equivalent), or incomplete (there exist programs whose correctness can't be proven or disproven within the type system).
So, we can't hope to construct one sound type system that contains all other type systems.
However, we can certainly construct a type system that can represent a great number of existing popular type systems. Much of the practical work in type theory consists of showing that one type system contains another in this way, and this can be directly applied in representing types from different programming languages within a single type system.
So, one can make great progress in unifying type systems, though it's not possible to "finish" the task.
Fortunately, incompleteness generally doesn't manifest itself when typechecking programs a person would ordinary write. Incompleteness arises from tricky arguments involving infinities and non-inductive recursion, which require the introduction of new axioms to prove or refute.
The logic analogy shows that this shouldn't be surprising: 2500 years passed from the time philisophers began stating recognizable mathematical axioms and proving theorems, to the time mathematicians realized that all such axiom systems were incomplete.
Nov 08, 2007 - 19:56 - Incompleteness vs The Halting Problem
That's a great question. Does anybody know the answer?
Godel showed that given any axiom system as powerful as ZF, there exist theorems which can neither be proven nor refuted.
Turning showed that given any computer as powerful as a Turing Machine, there exist programs whose termination can neither be proven nor refuted.
I think the answer is: There exist type systems as powerful as ZF which are not Turing-complete, so Godel's theorem gives us insight into systems where Turing's theorem is inapplicable. We can't construct a model of a Turing machine in ZF, because the axiom of comprehension is insufficiently powerful to represent general recursion. But on a Turing machine, we can write a ZF axiom prover that terminates for any axiom that is provable or refutable -- but it might not halt.
Nov 10, 2007 - 03:45 - Turning complete type systems
Note that Turing-complete type systems (while good and useful) don't guarantee the programmer the ability to extend the language's type system naturally with new features or constructs.
While they enable a programmer to write a typechecker for a more powerful type system, and run it at compile-time in the source language, you would need the source compiler to expose some meta-representation of the program at compile-time, with some sort of compositional semantics, in order to actually extend the base type system usefully. This can be done, but isn't automatic. For example, in C++, templates make the type system Turning-complete, but the language lacks the ability to, e.g., define templates taking template abstractions as parameters, so exposing certain abstractions requires the user to manually create metadata containers to describe types, rather than just using them directly.
And, even if the compiler did expose a meta-representation of the language to its Turing-complete typechecker, naive introspection of that representation would break various theoretical properties of the unextended language -- such as subsumption and parametricity.
This isn't an argument against Turing complete type systems, rather just a claim that they aren't the total solution for type system extension.
Derivation trees for lambda-calculus
Nov 27, 2007 - 20:35 - Abstract syntax tree
That's just the abstract syntax tree resulting from parsing the lambda expression. It's just another way of writing the same thing, and doesn't convey any new information.
Wikipedia has a page on inferring types from simple lambda terms: http://en.wikipedia.org/wiki/Type_inference
foundations for J, APL etc
Nov 29, 2007 - 23:32 - APL and vector-like data types
So, for most scalar operations in APL, the compiler lets you to apply the operation to a vector data type, with the appropriate lifting. One Haskell analogy is having the programmer writing "f xs" and the compiler interpreting it as "map f xs" when xs is a list.
I spent some time investigating APL-like lifting of functions to vectors, arrays, etc, in a more general programming language setting.
In general, that doesn't work, because the resulting system is ambiguous. For example, say that you define a+b on integers to indicate addition, and s+t on arrays to indicate concatenation. How should the compiler interpret a+b on arrays of integers? It could be either an element-wise vector addition, or array concatenation.
Now you're probably thinking the compiler could define some precedence rules to disambiguate such an expression. That would be unpredictable -- a feature that works in some cases confusingly wouldn't work in others. Depending on context, maybe the compiler would interpret a-b as elementwise subtraction, but a+b as concatenation. Worse, given strong implicit typing features (Java generics, Haskell typeclasses), the general case can't be disambiguated for lack of concrete context.
So this aspect of APL is a clever trick for code reduction, but it's unsound and inappropriate in a general programming language. There, we need to clearly specify how we want to lift a function (using map, fold, etc, or explicit recursion and case decomposition) when applying it to a collection data type.
Note that languages like Haskell could make this process simpler than they do. For example, when you define a datatype like "Tree t = Leaf t | Node (Tree t) (Tree t)" which is structured as a covariant functor, the compiler could automatically generate the map function for that type, as it's uniquely determined by the type's structure. When you define a recursive datatype as a fixed point of a functor, the compiler could automatically generate the fold function.
Parametric datatype-genericity
Dec 04, 2007 - 19:50 - Genericity
Translating this stuff into category theory isn't essential. It's sufficient to enumerate the coherence that laws folds are guaranteed to obey in a particular type system. The later approach is preferable, as abstract category theory presentations of these topics tend to obscure the results from programmers who could otherwise put them into practice.
To me, the paper's key point is that the Generic Haskell approach is ad hoc (resulting in fewer universal properties), whereas the Algebra of Programming approach is more lawful.
I realized a while ago that "zip" ought to be called "transpose", as the underlying operation is to translate a value of F(G(t)) to G(F(t)) using the underlying join. For example, you can naturally zip an array of dictionaries into a dictionary of arrays, where the resulting dictionary contains, for every index in the original dictionaries, the ordered concatenation of all values under that index from all of the sources.
Everyone has a fairly good understanding of functors and monads nowadays, but I feel like we're just scratching the surface of the generalizations that are possible regarding transformations between them.
A Growable Language Manifesto
Dec 07, 2007 - 21:07 - Nice Manifesto
But be careful with expectations of extensible types features, metareflection, and dynamic typing. They conspire globally against counter type soundness and universal properties that are desirable for program verification and concurrency.
Taken to their logical conclusion, you get things like LISP, SmallTalk, C# with LINQ and lose sight of the extensional meaning of your code as it's intractable intertwined its metareflective representation, pointer identities, the uncheckable exceptions it might throw, the interact with macros and syntax extensions which may transform it into something entirely different, etc.
The YNot Project
Jan 29, 2008 - 19:31 - Steps to Practicality
So, what's the roadmap here? Surely we can't expect real programmers to write code using Coq syntax. Do we define a nice mainstream-friendly language syntax, a translation into Coq for typechecking, and then a translation into some runtime form that is sound if the program successfully typechecked?
This special syntax for things with effects sucks and is counter to mainstream practice and intuition, e.g. "IO Int" and "Int->IO Int" versus "Int" and "Int->Int". We should implicitly wrap all computations in monads, e.g. have Int->Int compile to Int->M(Int) for a context-dependent monad M. And have "let x=a in bx" compile to "a >== \x->bx". Then, the syntax is regular for all kinds of effects, including none. You can use recursive monads as described in the "mdo" paper to separate out the total part of the language (the identity monad without recursion), the partial part (the recursive identity monad), and the various effectful parts -- exceptions, state threads, IO.
Also, it's untidy to force programmers to generate proofs of stuff and pass them to functions in addition to ordinary parameters, such as calling a function and passing it an integer and a proof that the integer is between 0 and n. Better to have a more expressive type system with proper subtyping, e.g. have the type of natural numbers less than n as a subtype of integers.
Arc is released
Jan 30, 2008 - 20:45 - ASCII Only?
I can sympathize with that decision, as it avoids bloating a language specification with the mess that is Unicode. For example, the specification of "valid unicode string" is too complex to express in most type systems, whereas a valid ASCII string is simply an array of ASCII characters. There's also the question of encoding: Do you expand all characters to 32 bits using UTC-32? Even then you still have to deal with combining characters, etc, so that one UTC-32 element doesn't coincide exactly with one character. Or do you use a UTF-8 style encoding, where the notion of 'character' is ill-defined, since a character may require multiple UTF-8 elements. Even if those problems didn't exist, Unicode doesn't define a mapping to glyphs, so yet another layer of translation is required to manipulate or display them.
Jan 31, 2008 - 01:33 - The mess that is Unicode...
I agree that international language support requires a standard with a large character set with far more complex handling of case, directionality, display, and order than ASCII.
But Unicode ought to have defined an unambiguous encoding rather than suggesting that different strings of e.g. UTC-32 codes can produce the same character point. IEEE 754 made a similar mistake in defining +0 and -0 as "distinct but equal" values. These specs are incompatible with extensional equality.
Also, Unicode doesn't define any sort of mapping to glyphs, which to me is one of the important things such standard ought to do.
Many useful things in the domain of real life can't be expressed in toto by the type systems of most languages.
This is true, but character strings are a simple enough domain that they could be precisely characterized by simple types. So, it really sucks that the standard defines it in such a way that this can't be done.
To answer your question: We store all text as UCS-16, and went to great effort a while back to move everything from ASCII to 16-bit Unicode, right before the consortium realized that 16 bits wasn't enough. Nowadays, Windows doesn't support UCS-32 nor UTF-8, so we're stuck with too 2X bloat yet still not enough bits to represent all characters directly.
Uniform naming
Feb 18, 2008 - 23:25 - Dereferencing operators
If all dereferencing-like operators are made postfix as in Pascal, then you can write: structure[index].field^.ptr^. This is clearly superior to to the messy combination of prefix and infix path operators in C++ that lead to ugly expressions like *((*a)->b[c].e)->f. That half of your proposal has much merit.
But, merging all such operations into a single symbol would have undesirable consequences. For example, in C++/Java/C#, a.b looks up b in the scope of a, while a[b] looks up b in the current scope. If you combined syntax, then the scope in which "b" is found in "a/b" would depend on whether a is a class or array. Context-dependent scoping would add undesirable complexity.
Pure, Declarative, and Constructive Arithmetic Relations
Mar 04, 2008 - 20:32 - Retreating from pure Horn clauses
That's the big question. It appears unlikely that pure Horn clauses will lead to something comparable in expressive power to Constraint Logic Programming. So, we need more power. But can we still avoid introspection (var/1) and unsoundness (Mercury modes)? Options include negation, cut, or moving to a more powerful base language with quantifiers and/or lambdas.
Also, what subset of problems would a more complete solution have, given the undecidability of the general case?
Variation of C's inline conditional
Mar 13, 2008 - 18:27 - Conditional
I think D exposed c?:b, meaning "c? c: b". In C++, you could also redefine "a||b" to mean "a? a: b", which agrees with the existing bool operator||(bool,bool), but naturally extends to any type that has a conversion to bool. Also, see Icon, which does some even more interesting stuff separating conditional failure from boolean values.
Species: making analytic functors practical for functional programming
Apr 24, 2008 - 19:11 - Confusing
The paper is written with frustrating vagueness, especially in its basic definitions.
The authors define a species as a certain kind of functor. But then they define species by means of vague analogy to data structures without stating clearly what the functor is. For example, we see the species 0 of "structures that can't be instantiated". What functor is this? The functor from finite sets to the empty set? Then we see the species epsilon, "a set with membership but no order on the elements". Aren't species supposed to be functors? If so, what's the functor? What does it mean for a species to be a set?
The definition of composition equates (F o G) with F(G[A]). What's A? What does it mean to equate a species to a structure? Or does the author mean (F o G)[A] = F(G[A])? What do the graphs with nodes and lines have to do with the underlying functors?
What does it mean to restrict a species to a cardinality n? Its functor maps finite sets to finite sets, so are we talking about restricting the cardinality of the sets in the functor's domain? Or its range? If the later, is the resulting functor partial?
The paper gets clearer later on (e.g. the definition of molecular species). But by then most of the non-expert audience is lost, which is unfortunate given the impressive breadth of the field that the paper covers and the translation into a generic-programming style implementation in Haskell at the end.
Apr 25, 2008 - 00:29 - Thanks!
Thanks for elaborating on the definitions. As I spend more time with this, I'd be happy to provide more feedback directly if you'd like (just email me: tim@epicgames.com).
There is an impressive amount of information in this paper. This is an important field for future data structure work, so I see a lot of value in presenting the material in a way that's accessible to technical people in other disciplines.
Automatic Patch-Based Exploit Generation
Apr 29, 2008 - 20:54 - Ouch
This is going to be a very painful revelation for a whole lot of software vendors.
Given limited and variable bandwidth, the only effective solution is for auto-patching systems to roll out patches in encrypted form, force all online users to shut down the program, and then roll out the decryption keys to users as they (re)start the program or operating system.
Valve's game distribution system, Steam, employs a similar system for streaming encrypted pre-release software, and then sending out the decryption keys on the release date, to enable staggered rollout without pre-release piracy.
On the importance of Turing completeness
Jun 11, 2008 - 18:34 - Why Turing Completeness?
There's a practical argument as well. It's usually much easier and more natural to express an algorithm using general recursion than primitive recursion or another sub-Turing language that guarantees termination. A trivial example is the factorial function; you can write it using primitive recursion, but the excercise is a lot more intuitive with general recursion.
Most programming in languages guaranteed to terminate reduces to writing programs in a very convoluted way so as to prove that they terminate. Turing-complete languages have no such burden, so you can just write your algorithm directly.
