Monday, June 30, 2014

Relating Event Calculus and Linear Logic, Part 2

Before I begin the technical content of this post, just a quick update: I gave my talk on "Generative Story Worlds as Linear Logic Programs" at INT in Milwaukee on the 17-18th, and I kept a travelogue, which includes notes from all the talks! The travelogue also details my time at AdaCamp, an "unconference" event on feminism & open technology, which I would strongly recommend to any women interested in those topics. I had a great time meeting people at both of these events!

--

Where I last left off in my discussion of the event calculus, I was discussing the idea of "encoding" vs. "embedding." In some discussions, these terms map onto the alternate terms "deep embedding" and "shallow embedding." If I understand them correctly, Alexiev's program gives a deep embedding of the event calculus: the core constructs of the calculus, fluents and events, are modeled as terms in the underlying logic program.

I was curious what it would mean to give a (shallow) embedding of the event calculus in linear logic, or indeed if this were even a possible task. On one level, EC and LL have entirely different purposes: the event calculus enables introspection on event specifications to determine intervals, and linear logic enables deduction that models reachability between state configurations.

On another level, EC and LL "do the same thing," i.e. model the same type of domain. They model evolving state, and they enable reasoning about propositions that hold in some configurations but not in others.

So the shallowest possible embedding seems like it would be one that populates a signature S for which S; D |- A iff D @ T |-_{EC} A @ T' for T < T'. In other words, the preorder given by logical entailment in linear logic should map to the preorder given by time in the event calculus.

To achieve this mapping, we would model
- fluents (time-indexed propositions) as linear propositions (or types)
- events as linear logic rules

An example domain (the same Yale Shooting Problem as in the previous post) is given below.

%% first take: fluents as types; events as rules.
%% "terminates E Ts" & "initiates E Is" |--> ruleE : Ts -o {Is}.

% fluents as types.
loaded : type.
empty : type.
dead : type.
alive : type.

% events as rules.
load : empty -o {loaded}.
unload : loaded -o {empty}.
shoot : loaded * alive -o {empty * dead}.
wait : {1}.

#trace * {empty * alive}.

This encoding does in some sense "simulate" the domain, but what it fails to account for is control over "when things happen." The programmer of the domain has no means by which to dictate which rules proof search will select, or in which order, so unfortunately we cannot give an adequate encoding in this scheme.

We could try to fix it by explicitly modeling a happens E predicate and using it as a guard on our rules, but without explicitly modeling time as well, we still cannot control the order in which events take place.

Thus, take 2, in which events are data and fluents are boolean-indexed types. This version of the code uses a clock that gets incremented after each event "happens".

time : type.
z : time.
s : time -> time.
infty : time.

% events as data
event : type.
load : event.
unload : event.
shoot : event.
wait : event.

happens : event -> time -> type.
clock : time -> type.

bool : type.
tt : bool.
ff : bool.
loaded : bool -> type.
dead : bool -> type.


happens/load : clock T * happens load T * loaded ff  
                   -o {loaded tt * clock (s T)}.
happens/unload : clock T * happens unload T * loaded tt  
                   -o {loaded ff * clock (s T)}.
happens/shoot : clock T * happens shoot T * loaded tt * dead ff  
                    -o {clock (s T) * loaded ff * dead tt}.
happens/wait : clock T * happens wait T -o {clock (s T)}.

init : type.
- : init -o {loaded ff * dead ff}.

#trace * {init * clock z
  * happens wait z * happens wait (s (s z))
  * happens shoot (s (s (s z))) * happens load (s z)}.

Results: this correctly models and simulates the domain, including control over when events occur. But it fails to capture a basic property of the event calculus: filling in the "gaps" between events logically by way of inertia. In this setup, we have to manually specify what happens at each and every timestep.

In the final take, I give a program that maintains events in a queue, which is centrally managed by a single rule:

event : type.
fluent : type.

happens : event -> time -> type.
holds : fluent -> type.

elist : type.
nil : elist.
cons : event -> time -> elist -> elist.

queue : elist -> type.

% domain things
load : event.
unload : event.
wait : event.
shoot : event.

loaded : fluent.
dead : fluent.

% clock management
tick : clock T -o {clock (s T)}.
stop : queue nil * clock _ -o {1}.

% key rule: "happens" management
happen : clock T * queue (cons E T' Es) * leq T' T
             -o {queue Es * happens E T}.

% action cause/effects
happens/load : happens load T -o {holds loaded * clock T}.
happens/unload : happens unload T * holds loaded -o {clock T}.
happens/shoot : happens shoot T * holds loaded -o {holds dead * clock T}.
happens/wait : happens wait T -o {clock T}.

init : type.
- : init -o
    {clock z
     * queue (cons load z
               (cons wait (s (s z))
                 (cons shoot (s (s (s z))) nil)))}.

#trace * {init}.

This program more or less gives me what I was initially after: a forward-chaining program that simulates a sequence of events specified in event-calculus terms, filling in the gaps logically. It's still not *quite* an adequate encoding, because we don't have control over when the tick rule fires, so it might happen at any time -- note that the happen rule invokes "less than or equal" (leq) to catch the cases where the clock has already been incremented past the next event's scheduled time. But we're able to keep events in the correct order nonetheless, via the queue as well as the synchronization technique of revoking the clock token until the event's effects have been applied.

After carrying out this iterative design experiment, I conclude that the distinction I should have made previously was not between encoding and embedding, but rather between program analysis and execution. And perhaps that's what I've learned about the event calculus in the process: it isn't really for writing specifications we can run, just ones we can analyze. My proofs-as-programs training led me to faulty expectations.

Wednesday, June 4, 2014

Relating Event Calculus and Linear Logic

My research uses linear logic to describe actions, i.e. to write rules to answer the question "what happens if..." rather than "what holds if...". The desire to reason about actions follows a long line of endeavors to create AI with logic, from an era when "AI" seemed to mean "computational agents that can take initiative, have goals, reason about their environments, and act upon them." AI-inspired logicians have been reasoning about actions for a long time, and one fruitful tool for doing so is the Event Calculus (Kowalski and Sergot 1986).

In this post, I'm going to describe my understanding of the relationship between Event Calculus and linear logic, by way of some Celf code I wrote last week after reading Vladimir Alexiev's "Event Calculus as a Linear Logic Program." Primarily, this is a post about that particular paper and the tinkering I did around it, but I'll also explain some of the context I've picked up on the way.

Background: Logics of Events and Situations


The Event Calculus (EC) and its predecessor, McCarthy's Situation Calculus (SC), both carry the notion of a fluent, or a proposition indexed by some contingency, i.e. rather than simply "P holds" for a proposition P, we say "P holds under [some circumstance]." They could perhaps, then, be formulated as modal logics in their own right, but instead, they were designed as logic programs, or collections of axioms embedded in first-order logic.

In SC, circumstances are situations, which are described by initial collections of predicates and by sequences of actions that can be "evaluated" on a situation. For instance, if on-table A is a predicate describing a block A being on the table, then {on-table A} is a situation, as is do(pickup A, {on-table A}).

In EC, circumstances are times, and an event is an action that happens at a time. Rather than "P holds," the basic judgment is "P holds at time T." Implication in such a logic seems to map naturally onto the idea of causation: "If 'rain' happens at time T, then 'wet' holds at time T." The central problem is to determine which fluents hold in which time intervals.

EC can be seen as "dual" to SC in the sense that rather than specifying actions in terms of the sets of propositions (situations) that hold between events, we reason independently about events and their effects. I'll be concentrating on EC for this post.

A simple (albeit needlessly violent) example domain is the Yale Shooting Problem, wherein available actions are load, unload, shoot, and wait, and fluents are loaded and alive, each of which can be negated to form another fluent. In EC we can state the domain as follows:

causes(load, loaded).
causes(unload, not loaded).
causes(shoot, not alive).
%% no rule: causes(wait, _).
exclusive(F, not F).
exclusive(F, F).

Given this specification, as well as an initial state and some happens(E,T) assumptions, we'd like to calculate the intervals in which all of the fluents and their negations hold. In brief terms, the event calculus does this by:
- first assuming that by default, fluents persist across time; then
- analyzing happens clauses to introduce new fluents and split intervals in which exclusive fluents hold.

The first of these steps is sometimes referred to as the commonsense law of inertia: if causes(E, F) and happens(E, T), then all fluents other than F retain, at T, the state they had before T. In terms you may be more familiar with from settings like Separation Logic, this law is a frame property. Actually, AI people use similar language sometimes: by default, first-order logic has a frame problem that calculi like SC and EC must each somehow solve.

Linear logic (LL), on the other hand, supports "local reasoning" or "inertia" or a "frame property" on an intrinsic, or meta-logical, level: the property

P |- Q
--------------
P * R |- Q * R

is simply provable from the inference rules defining LL connectives: from P |- Q and the identity R |- R, the tensor-right rule gives P, R |- Q * R, and then tensor-left gives P * R |- Q * R. (Or, if we formulate the frame property on a more structural level, it follows from the Cut principle for the logic.)

But more fundamentally, LL and EC (or SC) aren't really in the same category: LL is a logic, and EC is a logic program, or at least a specification of one that can be realized in many forms, as we'll see below. Both can be used to model domains of action and state change, but EC does a specific thing with that model, which is to calculate intervals for fluents. Linear logic is agnostic about what you do with the encoding, but as a full and proper logic, it includes the ability to:

- use compositional reasoning (i.e. cut or substitution) on entailments and specifications;
- treat the model itself as a logic program, with agents' actions simulated directly as proof search;
- do a post-hoc analysis of the available proof search space;
- use a proof as a meaningful record of action sequences or trees, perhaps to be treated as a plan;
- control proof search with proof-theoretical tools like focusing to better understand the coarse structure by which proofs may be compared or equated to one another.

The Event Calculus as One of Many Linear Logic Programs


Alexiev's paper did not compare LL and EC at the level of languages for encoding actions. It instead noted that the canonical transcription of EC into a first-order logic programming language such as Prolog makes unnecessary use of negation as failure (NAF), and that a linear logic-based logic programming language could eliminate the need for this poorly-behaved construct.

The original "simplified event calculus" program in ordinary LP with NAF (emphasis mine) is:

holds(F,T) :-
  happens(Ei, Ti), initiates(Ei, F),
  happens(Et, Tt), terminates(Et, F),
  between(Ti, T, Tt), not broken(Ti, F, Tt).
broken(Ti, F, Tt) :-
  happens(E, T), between(Ti, T, Tt),
  (initiates(E, F1); terminates(E, F1)),
  exclusive(F, F1).
exclusive(F, F).

The essence of the linear variant, as I understand it, is that rather than directly calculating holds(F,T) from unbroken intervals that initiate and terminate the fluent, we start off assuming that fluents holding at the start hold indefinitely, and then iteratively split the intervals until quiescence. At that point, all of the intervals are at their longest while still consistent with the events known to have occurred. A snippet of the code follows (wherein G stands for a goal continuation):

happens E T G :- causes E Fs, record Fs T G.
record nil T G :- G.
record (F::Fs) T G :- insert F T (record Fs T G).
insert F2 T G :- exclusive F1 F2, int F1 T1 T2, between T1 T T2,
  (int F1 T1 T -o int F2 T T2 -o G).

holds F T G :- (int F T1 T2, between T1 T T2, top) & G.

Here "int F T1 T2" means "fluent F holds between times T1 and T2, as far as we currently know." I've highlighted the most essential rule, which propagates information about a fluent F2 caused by some event that happens at time T, by finding a fluent F1 that's exclusive with it, for which we currently believe it holds past time T, and replacing that knowledge with the new knowledge that F1 holds prior to T and F2 holds afterwards.

I found the continuation-passing style of this program distracting (especially in the last rule, where it necessitates using top and & to ignore irrelevant parts of the context but preserve them for the continuation). So did the author, it seems, who also opted to re-implement this program in Lygon. Unfortunately, Lygon uses classical linear logic, which I find utterly unreadable, so I transcribed it into my linear logic programming language (LLPL) of choice, Celf.

Take 0: Direct Transcription into Celf


The core of the program written using forward chaining in Celf is:

happens/causes 
: happens E T * causes E Fs -o {record Fs T}. 

record/nil 
: record fnil T -o {1}.
record/cons 
: record (fcons F Fs) T -o {insert F T * record Fs T}. 

split
: insert F2 T * exclusive F1 F2 * int F1 T1 T2 * between T1 T T2
        -o {int F1 T1 T * int F2 T T2}.

As far as I can tell, this is correct and deterministic, although the determinism is a little less obvious than in the continuation-passing-style program. It makes clear to me that the fundamental way linearity "replaces" NAF is by using quiescence: the program doesn't stop until there are no more inconsistent intervals it can split.

At this point, though, I started experiencing some philosophical nagging: this program is in some ways an interpreter for EC, or perhaps an encoding, but I wondered if there were some way to make more fundamental use of linear logic and its very appealing frame property... i.e. if I could compile EC domains into linear logic in a way that would faithfully reproduce EC semantics.

I then recalled some discussion among the senior grad students in my group from several years ago about the idea of an encoding vs. an embedding (of an object logic into a meta-logic). I took away an intuitive sense for the distinction -- an encoding represents object-logic widgets (e.g. connectives) as meta-logic data constructors and uses the meta-logic's machinery for manipulating data to operate over them -- whereas an embedding directly interprets a widget in the object logic as a semantically corresponding widget in the meta-logic. The most salient example of the latter I can think of is "higher-order abstract syntax," the idea from LF that we model hypotheticals/binding in object calculi with LF's built-in notion of hypothetical context.

The line still feels blurry to me, though, and is even further muddled by this discussion about "shallow" vs "deep" embeddings in Coq, which on the face of it looks an awful lot like the distinction I was just making.

And somehow I also feel like this distinction is relevant to "interpretation vs. compilation," except that I'd expect encoding to map to compilation and embedding to map to interpretation -- yet while I was working on this Event Calculus stuff, I kept thinking to myself, "I want an embedding/compilation, not an encoding/interpretation."

I think I can make sense of that mismatch as follows: in an interpreter, "translating" and running a program are the same step, but in a compiler, the output is source code you can inspect before running it. I wanted to know what "compiled" linear logic source code would look like in the image of translation from Event Calculus.

This post feels super long already, so I think I'm gonna toss my "compilation" experiments into a Part II.

Further Reading


Iliano Cervesato did a lot of work relating Event Calculus and linear logic back in the '90s. Some papers of his that are on my reading list include:

What the Event Calculus actually does, and how to do it efficiently
A general modal framework for the event calculus and its skeptical and credulous variants
Modal event calculus in Lolli

Saturday, May 24, 2014

Zoo of term assignments for linear sequent calculus

It's pretty common to give term assignments to logics presented through natural deduction. We usually call them "(typed) functional programming languages."

What's less common, though possible, is to give a term assignment to sequent representations. I've come across enough of these for linear logic in particular that I can make a little petting zoo for you.

Terms as concurrent processes, from Caires, Pfenning, and Toninho's Session Types:




Terms as stories (or more accurately, narrative structures), from Bosser, Courtieu, Forest, and Cavazza's "Structural Analysis of Narratives with the Coq Proof Assistant":

Terms as plans, from Cresswell, Smaill, and Richardson's "Deductive Synthesis of Recursive Plans in Linear Logic" (paywall warning, but searching those terms also turns up a .ps.gz):




The latter two are especially interesting to me because (a) they elide term markers for certain rules (such as tensor left), and (b), more importantly, because they don't really go into what reduction means in these settings: proofs-as-programs is powerful not just because of the surface syntax puns, but because it goes as deep as reduction semantics. The definition of program evaluation -- as in, running computation -- can be derived from "harmony" properties of the two rules defining each logical connective (more formally, these properties are witnessed by local reduction and expansion). A similar idea is demonstrated for session types and the Cut rule.
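To make that concrete, here is roughly what I mean for tensor, in notation of my own choosing (so read this as a sketch rather than as the rules of any of the papers above). The two sequent rules assign terms like

G |- M : A     D |- N : B
--------------------------- *R
G, D |- M * N : A * B

G', x : A, y : B |- P : C
----------------------------------------- *L
G', z : A * B |- let x * y = z in P : C

and the principal cut of a *R proof against a *L proof gives the local reduction

let x * y = M * N in P   ~~>   [M/x][N/y] P

which is exactly the kind of "definition of running computation" I have in mind.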

Maybe there's nothing interesting to say there in the context of stories and plans, but I'd be surprised -- especially if we start talking about equivalence rather than reduction.

Saturday, May 3, 2014

Paper draft: Generative Story Worlds as Linear Logic Programs

I'm pleased to announce that João Ferreira, Anne-Gwenn Bosser, and I have had a paper accepted at INT7! I've made a draft available (7-page pdf), though some parts may read a bit strangely with reference to our prior work, since the original submission was anonymized. (This was my first experience with double-blind reviewing, and, well, I suppose all current data places me in favor. ;) )

Abstract:

Linear logic programming languages have been identified in prior work as viable for specifying stories and analyzing their causal structure. We investigate the use of such a language for specifying story worlds, or settings where generalized narrative actions have uniform effects (not specific to a particular set of characters or setting elements), which may create emergent behavior through feedback loops.
We show a sizable example of a story world specified in the language Celf and discuss its interpretation as a story-generating program, a simulation, and an interactive narrative. Further, we show that the causal analysis tools available by virtue of using a proof-theoretic language for specification can assist the author in reasoning about the structure and consequences of emergent stories.

Thursday, December 19, 2013

Learning typed functional programming: obstacles & inroads

Yesterday, there was a discussion on a mailing list I'm on about a perception of gender diversity problems in the communities around functional programming, type theory, and programming language theory, even relative to other areas of computer science. After some speculation about education barriers and community approachability, I decided to conduct an informal survey on Twitter:

You can read several of the collected responses (filtered for sarcasm/criticisms of languages) on Storify. I left "typed", "functional", and the combination thereof intentionally ambiguous because I was interested in how people interpreted those words, as well as their reactions to whatever programming constructs they associated with them.

Because I promised not to argue with anyone who replied, a bunch of responses have been tumbling around in my head about how different my experiences have been from the ones described here. In general, I agree about the cultural tendencies and have observed plenty of that myself.

But I think what a lot of the non-people-focused responses seem to be telling me is that we're doing a terrible job of advertising, explaining, and demonstrating what typed-functional languages do, and especially what types are good for.

The reason I'm excited about types now, 8 or 9 years after using a Hindley-Milner type-inferred language for the first time, is that there's a correspondence with logic that gives rise to all kinds of useful and fascinating research. (By the way, this is why that one response saying "I have a logic brain but not maths" kind of broke my heart!) But if I think back to why learning ML felt like a godsend after a few years of Java, C, and C++, I remember a few different things:

- SML/NJ had a REPL. I could experiment with tiny pieces of the language and the code I was trying to write before putting them all together in a big file with a top-level entry point.

- Signatures (declarations of new types and functions) were separate from modules (implementations). I could think about the interface I wanted to program to without actually writing the code to do it.

- Algebraic/inductive datatypes and pattern matching. Ok, so this is secretly very related to Curry-Howard and the things that excite me about types now, but at the time the ability to write

datatype 'a tree = Leaf | Node of 'a * 'a tree * 'a tree

...instead of implementing a collection of functions to do pointer or field manipulation, and then immediately being able to write traversal & search functions (again without manipulating pointers or fields), just felt like a huge practical advantage. An entire data structure definition in one line! These building blocks -- type variables, recursive definitions, disjunction (|), and tupling (*) -- felt like they could make Lego-like structures which were so much more tangible to me than the way I would encode the same thing in C or Java; there's a small SML sketch of what I mean at the end of this list. (Later I would learn that this is the difference between positive and negative types, both of which have their uses, but half of which are missing or at least highly impoverished in most OO settings.)

- Highly related to the above point, the ability to "make illegal states unrepresentable", & thus not have to write code to handle ill-formed cases when deconstructing data passed in to a function.

- The thing Tim said:


Over a couple of years, I found that I was less drawn to the "safety net" features of type systems and more to their utility as design tools. In other words, I found myself discovering that types let you carve out linguistic forms that map directly onto your problem rather than maintaining in your head whatever subset of the language you're using actually corresponds to the problem domain. A super basic example of this is just the ability to create a finite enumeration of, say, colors, rather than using integers, a subset of which you're using to represent different colors. But this goes way beyond base types. I think of every program I write as coming with its own DSL, and I want the language I'm using to support me programming in that DSL as directly as possible, with a type-checker to let me know when I've stepped outside my design constraints. Dependent types especially have a lot to offer in that arena.
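Here's the small sketch promised above -- the traversal & search functions over the one-line tree type, plus the colors-as-their-own-type example (the function names are just my own illustrations):

datatype 'a tree = Leaf | Node of 'a * 'a tree * 'a tree

(* traversal: no pointer or field manipulation anywhere *)
fun inorder Leaf = []
  | inorder (Node (x, l, r)) = inorder l @ (x :: inorder r)

(* search, parameterized by an equality test *)
fun member eq x Leaf = false
  | member eq x (Node (y, l, r)) =
      eq (x, y) orelse member eq x l orelse member eq x r

(* colors as their own finite type, rather than a subset of the integers *)
datatype color = Red | Green | Blue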

Anyway, the point of this is not to tell folks who have struggled with learning type systems that you're wrong or that you just didn't get it. I hear "this was too hard for me" and then that you can write web code in JavaScript or a raytracer in C, and that boggles me, because that's so much harder for me. So I thought it might be elucidating to share my experience, which seems very different from yours.

I'm also directing some frustration at educators (including myself) for being apparently so bad at getting these kinds of things across, and at adapting typed languages & research projects to the needs of people who aren't already entrenched in them, that any programmer could think they are not smart enough to use them. Personally, I don't feel smart enough to do without them.

Wednesday, December 11, 2013

Post-proposal reading list

I passed my thesis proposal!

As I've spoken to more and more people about my work, I'm learning about a bunch of exciting related systems, and now I have a big pile of reading to do! Here's what's currently on my stack:

The Oz Project: Bob Harper and Roger Dannenberg brought to my attention a defunct CMU interactive fiction project called Oz, whose efforts gained some brief extra-academic fame through the Façade game. I've printed out the following papers, which seem most relevant to my work:

Hap: A Reactive, Adaptive Architecture for Agents
A. Bryan Loyall and Joseph Bates. Technical Report CMU-CS-91-147, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, June 1991.

An Architecture for Action, Emotion, and Social Behavior
Joseph Bates, A. Bryan Loyall, and W. Scott Reilly. Technical Report CMU-CS-92-144, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. May 1992. Also appearing in Artificial Social Systems: Fourth European Workshop on Modeling Autonomous Agents in a Multi-Agent World, Springer-Verlag, Berlin, 1994.


Adam Smith's work: I met Mike Treanor when he interviewed at CMU, and he pointed me in the direction of a few of his colleagues at UCSC working on logic programming for games prototyping, specifically Adam Smith. It turns out that his advisor was Michael Mateas, one of the members of the Oz project. I'm reading:

Bits of his dissertation, Mechanizing Exploratory Game Design

Answer Set Programming for Procedural Content Generation: A Design Space Approach

LUDOCORE: A Logical Game Engine for Modeling Videogames

Towards Knowledge-Oriented Creativity Support in Game Design


Kim Dung Dang's work: My external committee member Gwenn Bosser recommended to me the recent thesis work of Kim Dung Dang, who is also using linear logic for games, specifically interactive storytelling applications. I'm reading:

Kim Dung Dang, Ronan Champagnat, Michel Augeraud: A Methodology to Validate Interactive Storytelling Scenarios in Linear Logic. T. Edutainment 10: 53-82 (2013)

Kim Dung Dang, Steve Hoffmann, Ronan Champagnat, Ulrike Spierling: How Authors Benefit from Linear Logic in the Authoring Process of Interactive Storyworlds. ICIDS 2011: 249-260

Saturday, August 31, 2013

Some classes of effectful programs

I'm going to do something unusual for this blog and talk about functional programming for a bit.

Jessica Kerr wrote a blog post about the concept of idempotence as it's used in math and programming, and I decided I wanted to lay the ideas out in terms more familiar to me.

So, consider a pure mini-ML with functions, products, numbers, lists, polymorphism, and let-binding, but no references (or other effects -- for the duration of this post I'm going to limit my notion of "effect" to references into a store). This is Language 1 (L1). Now consider adding references and sequencing to this language (i.e. new expressions are ref v, r := v, !r, and (e; e)). This is Language 2 (L2).

The operational semantics of Language 2 includes a store mapping references to values, for which I'll use the metavariable S. In addition to giving types T -> T' to L2 functions, we can give them store specifications S -o S' (using the linear implication symbol -o suggestively but not especially formally) to suggest how they might modify it; e.g. a function f(r) = r := 3 has store spec (arg L * (L |-> V) -o (L |-> 3)) (saying that if the location L referred to by the argument had value V before the function was called, it has value 3 after the function is called).
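For concreteness, here are the new expressions written in actual SML syntax, which closely matches the Language 2 I have in mind:

val r = ref 0                (* ref v  : allocate a reference *)
val () = r := 3              (* r := v : update it            *)
val x  = !r                  (* !r     : read it              *)
val () = (r := 4; r := !r + 1)  (* (e; e) : sequencing        *)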

Here are some classes of Language 2 programs:

1. Effectless. 

This is the subset of Language 2 programs that are also Language 1 programs.

f : 'a -> 'b 
meeting spec S -o S (keeps the store the same)
e.g.: f(x) = x

This is sometimes called "referentially transparent".

2. Observably effectless.

f : 'a -> 'b
meeting spec S -o S' (might change the store)
where f(x) = (f(x); f(x))

but always returns the same output for the same input.

e.g.:
let r = ref nil in 
f(x) = (r := (x::!r); x) 

The function might have some internal effect, but the caller can't observe that based on the returned values. I'd argue this might be a more useful class for the term "referentially transparent", since "effectless" is a much better term for class 1, but honestly I don't find the former term informative at all and would rather do away with it.
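Rendered as actual SML (specializing the element type to int just so it compiles), the example looks like:

(* internal effect on r, but the caller only ever sees x come back *)
val r : int list ref = ref []
fun f (x : int) : int = (r := x :: !r; x)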

3. Effectively idempotent.

f : 'a -> unit
meeting spec S -o S'
where f(x) ~ (f(x); f(x))

...where ~ is a notion of equivalence on the store; i.e., if we were to write the store explicitly and label its state transitions with function calls:

S --f(x)--o S' --f(x)--o ... --f(x)--o S'

Calling f(x) once can change the store, but calling it an arbitrary number of subsequent times won't induce subsequent changes.

In this case, we don't care about the return value (made it of unit type to emphasize this point), but rather the effects of calling the function. The example I gave for "observably effectless" is not effectively idempotent.

I believe this is the concept that programmers who work across multiple APIs care about when they talk about "idempotence" -- the side-effects of such functions might be sending an email or posting to a user's timeline, which is definitely an effect on the world, but probably not one that you want to repeat for every impatient click on a "submit" button.
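Here's a small SML sketch of something in this class (the names are hypothetical, standing in for the "post to a timeline" kind of effect):

(* the first call may change the store; repeating it with the same
   argument leaves the store as it is *)
val lastSubmission : string option ref = ref NONE
fun submit (msg : string) : unit = lastSubmission := SOME msg

(* submit "hi"  ~  (submit "hi"; submit "hi")
   -- effectively idempotent, but certainly not effectless *)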

4. Mathematically idempotent

This is a property that doesn't have anything to do with effects -- we can talk about it for L1 programs or L2 programs, it doesn't matter. We don't care whether or not it has an effect on the store.

f : 'a -> 'a
f(x) = f(f(x))
e.g.: f(x) = 3

What class 4 has to do with class 3

We can compile L2 programs to L1 programs by reifying the store and location set in L2's semantics as a value in L1 (anything that can represent a mapping from locations to values). 

If we're interested in the class 3 functions of L2, we just need to consider programs of the form

(f(x); f(x))
for
f : 'a -> unit.

If st is our distinguished store type,
f becomes f' : 'a * st -> st
and (f(x); f(x)) in a store reified as s becomes
f' (x, f'(x, s))

Now the property defining class 3 can be translated as:
f' (x, s) = f' (x, f' (x, s))

...which looks, modulo the extra argument, just like the definition in class 4.
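As a small SML sketch of that translation (the store representation, location type, and update function are placeholders of my own, not anything canonical):

type loc = int
type st  = (loc * int) list   (* the distinguished store type *)

(* update a location by shadowing earlier bindings; two stores count as
   equivalent (~) when they agree under lookup *)
fun update (s : st) (l : loc) (v : int) : st = (l, v) :: s

(* the L2 function  f(r) = r := 3  becomes an L1 function that threads
   the store explicitly *)
fun f' (l : loc, s : st) : st = update s l 3

(* (f(x); f(x)) becomes  f' (x, f' (x, s)) ; here the inner and outer calls
   both write 3 at l, so  f' (x, s) ~ f' (x, f' (x, s)) , matching the
   class 4 equation modulo the extra store argument *)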