More Sophisticated Encoding of S-exprs into Symbols.




More Sophisticated Encoding of S-exprs into Symbols
Encoding of S-expressions into symbols is frequent topic of discussion in this blog. It is motivated by the idea that names should be equally flexible as S-expressions. (Actually, it might be the best if not only every symbol is S-expression, but also every S-expression is the symbol. OK, it is almost mystical statement.)

Trivial encoding was demonstrated in last post: S-expression is encoded in symbol which has exactly same characters as printed representation of S-expression. For instance, S-expression (+ a b) is encoded into symbol (+ a b). This trivial encoding has some limitations. For instance, if we look at symbol

((+ a b) c)

and try to decode it, we'll find that it cannot be determined whether (+ a b) is S-expression, or it is symbol. In some Lisp dialects (Common Lisp, Picolisp, ISLisp) there is no such problem. Symbol (+ a b) is written as |(+ a b)| so difference between S-expression and symbol is obvious. However, in these dialects, repeated encoding results in the symbols that grow exponentially.

In this post more sophisticated encoding, without that problem, is presented. Let us define encoding k, such that for every S-expression e, k(e) is defined as follows.
  1. If e is symbol, then k(e) = e;
  2. if e = (e1 ... en) then k(e) = [k(e1) k(e2) ... k(en)];
Semi-colon is part of the symbol. For instance

k(a) = a;
k((a b)) = [k(a) k(b)]; = [a;b;];

If only codes of the S-expressions contain characters [, ] and ;, then encoding k is injection, i.e. for two S-expressions e and f,

k( e ) = k( f ) => e = f.

Furthermore, there is no exponential explosion in case of multiple encoding. For instance,

k((a b)) = [a;b;];
k(k((a b))) = [a;b;];;
k(k(k((a b)))) = [a;b];;;

Here is implementation of encoding and decoding in R5RS Scheme.
(define (sexpr->string L)
  (string-append
     (if (symbol? L) (symbol->string L)
                     (string-append
                        "["
                        (apply string-append
                               (map sexpr->string L))
                        "]"))
     ";"))
     
(define (sexpr->symbol L)
  (string->symbol (sexpr->string L)))
  
(define (string->sexpr S)
  (let((S1 (substring S 0 (- (string-length S) 1))))
    (if (equal? (string-ref  S1 (- (string-length S1) 1)) #\])
      (let((substring-begin 1)
           (level 0)
           (result (list)))
        (do ((i 1 (+ i 1)))
            ((= i (string-length S1)) result)
          (if (and (= level 0)
                   (equal? (string-ref S1 i) #\;)
                   (not (equal? (string-ref S1 (+ i 1)) #\;)))
            (begin
              (set! result
                    (append result
                            (list (string->sexpr
                                    (substring S1
                                               substring-begin
                                               (+ i 1))))))
              (set! substring-begin (+ i 1))))
          (cond ((equal? (string-ref S1 i) #\[)
                 (set! level (+ level 1)))
                ((equal? (string-ref S1 i) #\])
                 (set! level (- level 1))))))
      (string->symbol (substring S 0 (- (string-length S) 1))))))
      
(define (symbol->sexpr s)
  (string->sexpr (symbol->string s)))

-------------------------

> (sexpr->symbol (quote (a b)))
|[a;b;];|
> (define s2 (sexpr->symbol (quote (|[a;b;];| c))))
> s2
|[[a;b;];;c;];|
> (symbol->sexpr s2)
(|[a;b;];| c)

No comments:

Post a Comment