Mar 14, 2004

Concept for a hybrid static-/dynamically typed language

I am watching the static vs. dynamic typing wars with some curiosity. On the one hand, I can't
understand how anyone can write a large application without the help of static typing. The lack of information
in the code, especially the imprecise and fuzzy specification of APIs, reduces my confidence
that the code will work in all situations. It also does not fit my usual coding style for large
programs and applications: I tend to code for days, weeks or even months until I have a usable state,
without executing the code even once. I RELY on the compiler's ability to find all typos during that time.


On the other hand, I see that there are many people who prefer dynamic languages. Most of them have
a write-a-little/test-a-little style, which I know from writing JSPs, so I can understand the style
at least somewhat.



I think I found a very simple concept to allow dynamic typing in a Java-like statically typed language.
The following examples are based on Java, but with two additional features:

  1. Everything is an Object. Java 1.5 has auto-boxing, but that's not enough. For instance, basic
    math operations are not supported for the Number classes.
  2. Support for fully dynamic method dispatching, aka multi-methods: if a method is overloaded, the actual
    runtime type of the arguments selects the overload, not the declared type of the reference. For example:
    void printMe(Object o) {
    	System.out.println("I am an Object.");
    }
    
    void printMe(String s) {
    	System.out.println("I am a String.");
    }
    
    // ....
    
    String s = "Some string";
    Object o = s;
    printMe(o);
    

    would print "I am an Object." in Java, but "I am a String." with multi-method support.
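
On a stock JVM this behaviour can only be emulated. The following is a minimal sketch, not a proposal for the real implementation: the class MultiMethodSketch and its dispatch() helper are hypothetical names, the methods return their text instead of printing it, and the lookup handles only a single non-null argument and walks the superclass chain (interfaces are ignored).

```java
import java.lang.reflect.Method;

class MultiMethodSketch {
    static String printMe(Object o) { return "I am an Object."; }
    static String printMe(String s) { return "I am a String."; }

    // Hypothetical helper: selects the printMe overload by the runtime
    // class of the argument instead of the declared type of the reference.
    static String dispatch(Object arg) {
        for (Class<?> c = arg.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = MultiMethodSketch.class.getDeclaredMethod("printMe", c);
                return (String) m.invoke(null, arg);
            } catch (NoSuchMethodException e) {
                // No overload for this class; try its superclass next.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no matching overload");
    }

    public static void main(String[] args) {
        Object o = "Some string";
        System.out.println(printMe(o));   // static overload resolution: Object variant
        System.out.println(dispatch(o));  // runtime-type dispatch: String variant
    }
}
```

A real multi-method implementation would live in the compiler or VM and would have to rank several arguments at once; the reflection loop only shows the selection rule.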




The following three steps are needed to add Python- and Ruby-like dynamic typing to such a language:


1. Auto-Variable Declaration

The first step is to allow implicit declaration of local variables. If a value is assigned
to an undeclared variable, it is automatically declared with the assigned type. The target
of a foreach statement can be auto-declared in the same way:

void printStringArray(String strs[]) {
	i = 0;
	for (s: strs) {
		System.out.println("String number "+i+" is: "+s);
		i++;
	}
}

There are two limitations: after the first assignment the type can't be changed anymore, and the
declaration is only valid within the scope of the first assignment. Thus the following functions are
not allowed:

String errorFunction1() {
	i = 0;
	i = "Some string"; // error, i is already int
	return i;
}

int errorFunction2(bool b) {
	if (b) {
		i = 10;
	}
	else {
		i = 6;
	}
	return i; // error: i is not defined in this scope
}

Implicit local variables should not be on by default, but should be enabled either by a compiler
switch or by a short declaration in the compilation unit. It is a trade-off between type safety
(or in other words: letting your compiler check that you are using your types correctly) and
being too lazy to declare variables.
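
For comparison, the two rules of step 1 (the type is fixed by the first assignment, and the variable is only visible in the scope where it was first assigned) can be approximated with local variable type inference, which Java has offered since version 10 through the var keyword. This is only a sketch under that assumption: unlike the proposal, a keyword is still required, and the class and method names are made up for illustration.

```java
class VarSketch {
    static int pick(boolean b) {
        var i = 0;              // inferred as int from the initializer
        // i = "Some string";   // would not compile: i is already an int
        if (b) {
            i = 10;
        } else {
            i = 6;
        }
        return i;               // fine: i was declared in the method scope
    }

    static String label(String[] strs) {
        var sb = new StringBuilder();
        var i = 0;
        for (var s : strs) {    // loop variable inferred as String
            sb.append(i).append('=').append(s).append(';');
            i++;
        }
        return sb.toString();
    }
}
```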




2. The any type

The key concept for dynamic typing is the any type. It is a reference to Object that, unlike a
regular Object reference, disables all compile-time checks for member access and implicit casts
and executes them at runtime instead. In a Java VM this can be implemented using Java's reflection
APIs. Multi-methods are important for all function invocations with an any reference as argument,
because otherwise the least specific overloaded function would always be called.
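
As a rough illustration of what a compiler could generate for a member access on an any reference, here is a minimal sketch based on java.lang.reflect. The class AnySketch and the helper call() are hypothetical, and the sketch only covers public methods without arguments.

```java
import java.lang.reflect.Method;

class AnySketch {
    // Hypothetical helper: looks up a public no-argument method by name on
    // the runtime class of the target and invokes it. The check that the
    // member exists happens here, at runtime, not at compile time.
    static Object call(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, an inaccessible one, or one that threw.
            throw new RuntimeException("cannot call " + methodName + "()", e);
        }
    }

    public static void main(String[] args) {
        Object s = "Some string";
        System.out.println(call(s, "length"));   // resolved at runtime
    }
}
```

A misspelled method name compiles fine in this style and only fails when the line is executed, which is exactly the trade-off the any type makes.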

The first function with the any type for its local variables looks like:

void printStringArray(String strs[]) {
	any i = 0;
	for (any s: strs) {
		System.out.println("String number "+i+" is: "+s);
		i++;
	}
}

any can be used anywhere, so if you want to obscure the function you could also use any
for the argument:

void printStringArray(any strs) {
	any i = 0;
	for (any s: strs) {
		System.out.println("String number "+i+" is: "+s);
		i++;
	}
}

If the 'strs' argument is not an Iterable (the type needed for Java's foreach statement),
the function will abort with an exception in the for loop.

(Sidenote: In this example the any variant has
the advantage that it can print any Object in any Iterable, but that's only because of the more
restrictive original example - you could achieve the same with static typing by using Iterable for 'strs' and Object for 's')
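
Both variants can be sketched in plain Java. The names below are hypothetical, the lines are collected in a list instead of being printed so they can be inspected, and the any argument is emulated by an Object parameter with a runtime cast.

```java
import java.util.ArrayList;
import java.util.List;

class IterableSketch {
    // Statically typed variant from the sidenote: Iterable for 'strs',
    // Object for 's'.
    static List<String> describe(Iterable<?> strs) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        for (Object s : strs) {
            lines.add("String number " + i + " is: " + s);
            i++;
        }
        return lines;
    }

    // Emulation of the any variant: the Iterable check is deferred to
    // runtime and fails with a ClassCastException for other arguments.
    static List<String> describeAny(Object strs) {
        return describe((Iterable<?>) strs);
    }
}
```

Calling describeAny(42) compiles without complaint and aborts at the cast, while describe(42) is rejected by the compiler.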


An any variable can change its type at any time, so the following is legal:

String validFunction1() {
	any i;
	i = 0;
	i = "Some string";
	return i;
}

int validFunction2(bool b) {
	any i;
	if (b) {
		i = 10;
	}
	else {
		i = 6;
	}
	return i;
}
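
In today's Java the same two functions need an Object reference and explicit casts. The sketch below (hypothetical class name) marks the places where the any type would insert the runtime check implicitly.

```java
class AnyVariableSketch {
    static String validFunction1() {
        Object i;
        i = 0;                 // boxed to Integer
        i = "Some string";     // an Object reference may change its content type
        return (String) i;     // explicit cast: the check any would do implicitly
    }

    static int validFunction2(boolean b) {
        Object i;
        if (b) {
            i = 10;
        } else {
            i = 6;
        }
        return (Integer) i;    // cast and unboxing, checked at runtime
    }
}
```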

Any could also be used to write C++ template-like functions (but IMHO, if you need it for that purpose in a language with multi-methods and operator overloading, it only shows that you aren't using interfaces correctly).



3. Combine them

Step 3 does the obvious and combines both: if a value is assigned to an undeclared local variable,
then this variable is declared as any. Because it can have any type, it's no problem
to declare it for the whole method. Here are the three example functions using implicit auto-any-declaration:

void printStringArray(String strs[]) {
	i = 0;
	for (s: strs) {
		System.out.println("String number "+i+" is: "+s);
		i++;
	}
}

String validFunction1() {
	i = 0;
	i = "Some string";
	return i;
}

int validFunction2(bool b) {
	if (b) {
		i = 10;
	}
	else {
		i = 6;
	}
	return i;
}

Like step 1, it should be optional. You would have the choice between real static typing (as in Java or C++), step 1's auto-declaration and step 3's auto-any-declaration.


Here's the question to the dynamic typing zealots: would that be enough to make you happy? :)

Comments

I'm not a dynamic typing zealot, but with some PHP experience I can surely see its value, and I think you have achieved it. However, the most compelling argument /against/ dynamic typing (which you identified yourself) is still not addressed because of your auto-any-declaration: if you make a typo in a variable name, a new any variable will be created, so you can't rely on your compiler to catch your typos anymore...

--
Arend van Beelen jr.
http://www.liacs.nl/~dvbeelen


By arendjr at Sun, 03/14/2004 - 20:25

But I guess that's exactly why you want to make it optional. Guess I should read better next time :)

--
Arend van Beelen jr.
http://www.liacs.nl/~dvbeelen


By arendjr at Sun, 03/14/2004 - 20:28

But I guess that’s exactly why you want to make it optional.

Yes, but I wonder what the best way to make it optional is... Ideas include a different file extension or compiler flags (least amount of typing), a declaration in the file, a declaration per method, or a combination of these options.


By tjansen at Sun, 03/14/2004 - 21:07

A declaration in the file is the best trade-off, I guess. It's not too much typing and it won't pollute your method declarations.
Using a different file extension is also a possibility, but it's an ugly hack I think. Compiler flags could make copying a file between different projects harder as it could result in ugly exceptions in Makefiles.

--
Arend van Beelen jr.
http://www.liacs.nl/~dvbeelen


By arendjr at Sun, 03/14/2004 - 21:48

I think the interesting thing about dynamic languages is not the fact that a variable can change its type over its lifetime. I would call this an unfortunate side effect, as it is not safe from a programming point of view. It is just like reusing the same variable name for different purposes: unwise, because one gets lost about what the variable does.

The really good thing about dynamic languages is that you can later substitute objects that share the same properties as the original one.

This is what C++ is trying to achieve with inheritance, which is not always the right way.

Example in C++:

In C++, to print the generic color of a class Fruit, you would do this:

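#include <cstdio>
#include <string>
using std::string;

class Fruit {
public:
	virtual ~Fruit() {}
	virtual string color() const { return "no color"; }
};

// Banana and Orange must derive from Fruit to be accepted by printFruitColor().
class Banana : public Fruit {
public:
	virtual string color() const { return "yellow"; }
};

class Orange : public Fruit {
public:
	virtual string color() const { return "orange"; }
};

// Take the fruit by reference: passing it by value would slice the object
// and always call Fruit::color().
void printFruitColor( const Fruit& f ) {
	printf("Color %s\n", f.color().c_str() );
}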

This is fine, but when you look at printFruitColor(), it does not really need to name the type of argument it accepts. It only needs to specify that it accepts an object with a color() method returning a string. You can achieve that with templates (but they have their own issues, like horrible, incomprehensible error messages).

One way to do that in dynamic languages is to allow the argument to be of type any and then check at runtime that the object f has a method color(). That's how Python does it. It's sort of the lazy way. The advantage over C++ is that any Python object can be passed, whether it inherits from Fruit or not, as long as it has a method color().

Example in Python:

def printFruitColor( f ):
	print "Color : ", f.color()

class Fruit:
	def color(self): return "no color"

class Banana:
	def color(self): return "yellow"

class Orange( Fruit ):
	def color( self ): return "orange"


You can use all the types with the printFruitColor:

>>> printFruitColor( Fruit() )
no color
>>> printFruitColor( Banana() )
yellow
>>> printFruitColor( Orange() )
orange

You can try to pass a new type to the function, for example a class Box that has no color() method:

>>> printFruitColor( Box() )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in printFruitColor
AttributeError: Box instance has no attribute 'color'

This is the most feared runtime exception. But you can use dynamic features to avoid it:

>>> def boxColor(self): return "box color"
...
>>> Box.__dict__['color'] = boxColor
>>>
>>> printFruitColor( Box() )
box color

The problem with this approach is that the type checking is done at runtime. pychecker tries to circumvent this but it is still easy to get a runtime exception, which is just as annoying as a segfault.

The right way to handle the problem is the one used by Objective Caml. The language is strongly and statically typed, yet requires no explicit type declarations. The fact that the language is developed by researchers shows: the OCaml compiler checks all the arguments of all functions and infers what they accept:

# let printFruitColor( fruit ) = print_endline fruit#color ;;
val printFruitColor : < color : string; .. > -> unit = <fun>

This means that printFruitColor is a function taking as its argument an object that has a method color returning a string. The whole function returns unit, which is OCaml's equivalent of void.

# class fruit =
        object
                method color = "no color"
        end ;;
class fruit : object method color : string end

We have an object with a method color returning a string. We can use it with our function:

# printFruitColor( new fruit ) ;;
no color

Let's define more objects:

# class orange =
object
method color = "orange"
end ;;
class orange : object method color : string end
# printFruitColor( new orange );;
orange

The big difference is here:

# class box =
  object
        method foo = "foo"
  end ;;
class box : object method foo : string end
# printFruitColor( new box ) ;;
This expression has type box = < foo : string > but is here used with type
  < color : string; .. >
Only the second object type has a method color

A nice and explicit error, computed at compile time. This is, in my opinion, the way to go instead of dynamic languages. It resembles templates a lot, but it is more subtle and powerful.


By pfremy at Wed, 03/17/2004 - 15:57

The really good thing about dynamic languages is that you can later substitute objects that share the same properties as the original one. This is what C++ is trying to achieve with inheritance, which is not always the right way.

IMHO that's only one half of the problem. The other half is the contract of the function "string color()". Where is it defined what "color()" does? Can it return any kind of string? Does it need to be English? Can it return null or the empty string? Can two classes return the same color? What happens when a library has a function with the same name and signature? How do you know whether it is designed to be used with printFruitColor()?



There is no canonical place to look for the contract, the semantics, of color(). At best you could describe color() in printFruitColor(), but that would be hard to find when you look at one of the fruits, and it does not help much when more than one function is using color().



In the C++ example you can define the exact contract for the color() function in the Fruit class, and when you look at the code two years later you will still be able to find it.
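
The same holds for a Java interface. As a small sketch, the contract can be written down once, at the declaration. The names and the wording of the contract below are made up for illustration; they are not taken from the example above.

```java
class ContractSketch {
    interface Colored {
        /**
         * Hypothetical contract: returns the English name of the typical
         * color of this object; never null and never empty. Several
         * implementations may return the same color.
         */
        String color();
    }

    static class Banana implements Colored {
        public String color() { return "yellow"; }
    }

    static class Orange implements Colored {
        public String color() { return "orange"; }
    }

    // Accepts only types that explicitly signed up for the contract.
    static String describe(Colored c) {
        return "Color " + c.color();
    }
}
```

A library class that merely happens to have a color() method is not accepted by describe() until somebody declares that it implements Colored, which answers the question of whether it was designed for that use.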



Another issue is refactoring: when you need to change the Fruit::color() contract, for whatever reason, it is easy to find all affected methods and modify them. With 'duck typing' it is more difficult (though not impossible).



That's what I meant with imprecise/fuzzy specification of APIs.


By tjansen at Wed, 03/17/2004 - 17:28

This point of the discussion comes down to programming style. If you want languages that enforce contracts, you should look either at Eiffel for its pre-/post-conditions (you already posted a comment on this) or at Ada.




Ada, from what I have seen, enforces contracts very strongly in its syntax. The compiler will catch many errors. When you look at Ada code, you see how poorly specified C++ is by comparison. For example, you can specify that an argument to a function is an int within a given interval, and define a kind of typed int that cannot be exchanged with a normal int.
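
In a language without such range types the idea can only be approximated. A rough sketch in Java, with a made-up Percent type and an arbitrary interval of 0 to 100: the Java compiler enforces that a plain int cannot be passed where a Percent is expected, while the interval itself is only checked at runtime in the factory method.

```java
final class Percent {
    private final int value;

    private Percent(int value) { this.value = value; }

    // Hypothetical factory: rejects values outside the interval 0..100.
    static Percent of(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return new Percent(value);
    }

    int value() { return value; }
}

class PercentDemo {
    static int half(Percent p) { return p.value() / 2; }
    // half(50) would not compile: a plain int is not a Percent.
}
```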




Ada is very good for embedded systems and for critical applications where you have to prove that your application won't crash the train it is driving.




However, such a strong contract comes with a price: the code is syntactically heavy. Just changing one parameter type is a complicated operation.




OCaml is at the other end of the spectrum, with a compiler that enforces just the minimum requirements on variables so that the code can work.




Where is defined what “color()


By pfremy at Sun, 03/21/2004 - 13:24