creating vs changing a symbol

I decided to write down a few words on distinguishing definitions that create a new symbol bound to a value from ones that change that value. In other words, there are two kinds of assignment in code, just as there are in our minds. The reason for writing now is that I happened upon a series of articles questioning the global/local issue for variable creation (especially: what should be the default?). See pointers below.
Both topics are related, as we'll see.

human semantics

My first and main reason for introducing such a difference is not practical at all. Instead, it is to allow one to express one's intent correctly; this benefits the author at writing time and at reading time, as well as anyone else reading, maintaining, or simply exploring the code. The point is to have the program correctly mirror the model. The human semantic difference is obvious, so it should appear just as obviously in a program.

Let us say a creation is expressed by ':', and a change by '::'.

a: 1            # creation
a:: 2           # change

naughty beasties

This has some advantages in terms of avoiding common bugs (as other people noticed; I was not aware of this at first):

a: 1
a:: 2           # correct, a is known
a: 3            # incorrect, a is already defined
b:: 4           # incorrect, b is unknown
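
The four cases above can be mimicked in an existing language. Here is a minimal Python sketch, assuming a hypothetical `Env` class (invented for this example, not part of any library) whose `create` and `change` methods stand in for ':' and '::':

```python
class Env:
    """Hypothetical symbol table that keeps creation and change apart."""

    def __init__(self):
        self._symbols = {}

    def create(self, name, value):
        # Plays the role of 'name: value': the symbol must not exist yet.
        if name in self._symbols:
            raise NameError(f"{name!r} is already defined")
        self._symbols[name] = value

    def change(self, name, value):
        # Plays the role of 'name:: value': the symbol must already exist.
        if name not in self._symbols:
            raise NameError(f"{name!r} is unknown")
        self._symbols[name] = value

    def read(self, name):
        if name not in self._symbols:
            raise NameError(f"{name!r} is unknown")
        return self._symbols[name]


def attempt(action, *args):
    """Run an action and report whether it was accepted."""
    try:
        action(*args)
        return "ok"
    except NameError as error:
        return f"rejected: {error}"


env = Env()
results = [
    attempt(env.create, "a", 1),   # a: 1
    attempt(env.change, "a", 2),   # a:: 2
    attempt(env.create, "a", 3),   # a: 3
    attempt(env.change, "b", 4),   # b:: 4
]
```

The first two actions are accepted and the last two are rejected, so a mistaken creation or a mistaken change is reported at once instead of passing silently.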

Actually, I guess this may be of great benefit, because these bugs are often of the naughtiest kind. When we change a symbol's value instead of creating a new symbol as intended, or conversely create a new one when we intended to change an existing one, the error is happily and silently welcomed by most mainstream languages, dynamic ones in particular.
That is incredible to me! These are not the same action; it should not even be possible to mistake one for the other. It is as if the compiler had to guess whether one really wants to overwrite a file, or made a mistake and actually wants to create a new one (for sure, there are languages allowing that; humans are capable of worse).
Moreover, such bugs have unpredictable consequences that may show up far away in "space" (elsewhere in the code), in time, and in the mental model too; this is the main reason why they are sometimes so hard to find.
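
As a concrete illustration in Python (one mainstream language among others; the function and variable names are made up for the example), a one-letter typo turns an intended change into a silent creation:

```python
def total_price(prices):
    total = 0
    for price in prices:
        # Typo: 'totl' was meant to be 'total'. Python accepts the line
        # and silently creates a new variable instead of changing 'total'.
        totl = total + price
    return total


result = total_price([10, 20, 30])  # stays 0, and no error is reported
```

With a distinct change sign, the faulty line would be a change to an unknown symbol and would be rejected on the spot.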

Laurence Tratt wrote about a similar issue:

« When one of my programs mysteriously failed, I could not work out why. Eventually I realised that one of my label definitions had spelt "looop" (three O's instead of two) instead of "loop", so my loop had branched back to the previous loop in the file. Spotting that took me a couple of days.
Later, I realised that most programming errors fit into two broad categories: the obvious and the subtle. Obvious errors are those whose source can be easily pinpointed (even if fixing the problem takes a while). The subtle are typically those where cause and effect are separated…
»

where's my var?

Another nice side-effect may be ending the variable scoping war, unless my reasoning is wrong, which might well be the case, for otherwise I do not understand how such discussions have lasted for so long. People cannot agree on whether a variable created inside a specific scope (e.g. a function) should be accessible locally in that scope only, or globally instead. Both have advantages and issues (see pointers). There is also a question of personal taste and style, obviously. And when we run into a case where the opposite rule should apply, we tend to fixate on that very issue.
On the other hand, I guess that after programming in a given language for a while, these questions may progressively disappear because we adapt our mental scheme and programming style. Most mainstream languages being in the "local" party, this will soon (if not already) be regarded as "normal" (if not "natural", ho ho!). But this solution is not better in my view, only safer, and it leads to ad hoc hacks in code or even in the language (the "global" and "nonlocal" thingies).
I do not feel highly concerned because my preference would go to a language where "wild" free vars wandering around in a global scope would simply not exist, except for lexical sugar such as "write" for "console.write", or for constants. Wild & free is good for the programmer; not for the code.

Still, when approaching this question together with that of ':' vs '::', some interesting points appear. It seems to me that what people are looking for is a rule set like this:

  • When reading or changing a symbol, it should first be searched for locally, then in each enclosing scope up to the top one; only then should an error be issued.
  • When creating a symbol, it should belong to the current scope and be accessible only there and in inner nested scopes, if any. Very rare cases require a locally created var to be accessible globally, and most belong to the category of smelly code elements ;-)

Well, it seems that this is exactly what we get by differentiating the two kinds of definitions (assignments): reading or changing requires the symbol to exist, so the action is performed on the closest scope that holds it; creating instead requires the symbol to be undefined, so it is simply put in the current scope. The debated issue is thus manufactured by the lack of distinction between "set" actions: the language cannot guess whether one intends to create or change a symbol, so it cannot decide whether lookup should reach outer scopes.
We can notice that, in this respect, reading and changing a symbol are closer to each other than changing is to creating: both require symbol lookup to succeed, so they operate according to the same scoping rules.
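
The two scoping rules can be sketched in Python as follows. `Scope` is a hypothetical class invented for the illustration. One assumption to flag: `create` only checks the current scope, so shadowing an outer symbol is allowed here, whereas a stricter reading of the rule might forbid it.

```python
class Scope:
    """Hypothetical scope: a symbol table plus a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def _find(self, name):
        # Search the current scope first, then each enclosing scope.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope
            scope = scope.parent
        raise NameError(f"{name!r} is unknown")

    def create(self, name, value):
        # 'name: value' never looks outward: the symbol goes in this scope.
        if name in self.symbols:
            raise NameError(f"{name!r} is already defined in this scope")
        self.symbols[name] = value

    def change(self, name, value):
        # 'name:: value' updates the closest scope that holds the symbol.
        self._find(name).symbols[name] = value

    def read(self, name):
        return self._find(name).symbols[name]


top = Scope()
top.create("count", 0)

inner = Scope(parent=top)      # e.g. the body of a function
inner.create("step", 5)        # local to 'inner'
inner.change("count", inner.read("count") + inner.read("step"))
```

Reading and changing share the same lookup (`_find`), while creating needs no lookup beyond the current scope, which matches the observation that reading and changing are the closer pair.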

values & things

It seems the create vs change distinction has numerous consequences. A thread on the lua-users mailing list pointed at the common conceptual mismatch between the value vs thing distinction, on one hand, and mutable vs immutable types, on the other. Things, in the conceptual model, are possibly shared entities with individual identity: a character, a visual element, a device, tool, or weapon, any object or thing in the usual sense of the terms; values instead describe or qualify other things; they are pure data, pieces of information: size, position, color…

What if a conceptual thing happens to be represented as simple, immutable data? What if, conversely, a value requires a complex piece of data to be fully defined, whose type can only be mutable, like a simple 2d position? We are trapped. There are ways to cope with such situations, as long as the programmer is aware of the semantic distortion, implements the needed interfaces (e.g. the so-called "value-thing" design pattern), and handles them with particular attention.

Still, what cannot simply be done is changing the data denoted by plain thing variables. Let us imagine points in 1-dimensional space:

points = (p1:1, p2:2, p3:3)
...
# move points to next location
d = 3
for point in points
   point = point+d

This code will not move p1, p2, p3; instead it creates new points, actually new variables holding numbers. One reason is, again, that there is no way in common dynamic languages to express the intention of changing existing variables rather than creating new ones. But more is needed in this case: instead of pointer references, we would need symbolic references. When a variable is created with a special token saying that it denotes a referenced "thing" (e.g. a thing keyword), then whatever its data type, this variable would be referenced by a real symbol (think of a symbolic link), as opposed to a pointer. A later assignment of the form y : x would then let y point to the same thing as x does, even if the data happens to be a simple integer. A change to the data, denoted by the change sign "::", would affect both x and y.

Thus, we could write eg:

points : (thing p1:1, thing p2:2, thing p3:3)
...
# move points to next location
d : 3
for point in points
   point :: point+d

and actually change the data addressed by the variables p1, p2, p3.
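
For comparison, here is how the same situation looks in Python, as a sketch under assumptions: `Cell` is a hypothetical wrapper that approximates a "thing" variable, and its `set` method plays the role of '::'. It only approximates symbolic reference, since Python still uses pointer-like references underneath.

```python
# Plain numbers: assigning to the loop variable only rebinds 'point'.
p1, p2, p3 = 1, 2, 3
d = 3
for point in (p1, p2, p3):
    point = point + d      # p1, p2, p3 are left untouched


class Cell:
    """Hypothetical 'thing': a stable identity holding replaceable data."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Stands in for '::': new data, same identity.
        self.value = value


q1, q2, q3 = Cell(1), Cell(2), Cell(3)
points = (q1, q2, q3)
for point in points:
    point.set(point.value + d)   # changes the data seen through q1, q2, q3
```

The wrapper has to be written and unwrapped by hand everywhere, which is precisely the burden a thing keyword and a change sign would lift.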

But the most important feature brought by this syntactic distinction, in the case of things, concerns their evolution in time. How can one code the following in any mainstream language, even an object-oriented one?

Paulo : Student(MATH)
Clara.boyfriend : Paulo
Paulo :: Professor(MATH) 
assert (Clara.boyfriend == Paulo)

This fails in any language I know, because Clara's boyfriend remains the old object that used to hold Paulo's state, meaning the student. One cannot redefine a thing's state in common languages without creating a new object, and thus breaking every relation held through other references, like Clara.boyfriend here.
This issue is due to two facts, actually:

  1. The absence of a distinct syntax for redefining a thing's state without changing the thing's identity. Using :: for instance fulfills this need.
  2. Pointers are used as references, instead of true references independent of the value (the thing's state). Pointers are not independent, since their id is precisely the value's address; if any change requires reallocation, every relation is broken.
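
To make the failure concrete, here is a small Python sketch; `Student`, `Professor`, and `Person` are hypothetical classes invented for the example. The first half shows rebinding breaking the relation. The second half shows one common workaround, where a `Person` keeps its identity and only its `state` is replaced, which is roughly what '::' would express directly.

```python
class Student:
    def __init__(self, subject):
        self.subject = subject


class Professor:
    def __init__(self, subject):
        self.subject = subject


# Rebinding: the name 'paulo' moves to a new object, the old one lives on.
paulo = Student("MATH")
clara_boyfriend = paulo
paulo = Professor("MATH")
relation_survives = clara_boyfriend is paulo      # False


class Person:
    """Hypothetical 'thing': identity is the Person, the role is its state."""

    def __init__(self, state):
        self.state = state

    def redefine(self, state):
        # Stands in for 'Paulo :: Professor(MATH)'.
        self.state = state


paulo2 = Person(Student("MATH"))
clara_boyfriend2 = paulo2
paulo2.redefine(Professor("MATH"))
relation_survives2 = clara_boyfriend2 is paulo2   # True
```

The workaround only holds if every thing is wrapped this way from the start; the language itself gives no way to say "same thing, new state".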

questioning an answer

If my reasoning is correct, then the point was in fact not about opposing local and global scopes for vars, but rather about properly distinguishing in code two semantically distinct actions. Laurence Tratt ends his article as follows:

« The open question is this: why has it taken us, as a community, 50 years or more to define two simple scoping rules (assignment is local; nonlocal successively searches outer scopes) and one simple implementation technique (closures), and why have we taken so many wrong turns (in this article I haven't enumerated the silliest, such as dynamic scoping)? I suspect the answer, if it could be definitively uncovered, would give one a very interesting insight to the subject of computing in general. »

Well, my answer on this very issue is: "We could hardly find a right answer to a wrong question." Again, my reason for supporting this distinction lies not in such benefits but in a plain question of meaning; the benefits are nice side-effects.

links

And many more…
