Liblouis Table Specification

Opcodes

`space`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "space"
- for using $s wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for finding word boundaries (largesign, joinword, joinnum, contraction, lowword, sufword, prfword, begword, begmidword, midendword, endword, prepunc, postpunc, firstwordital, firstwordbold, firstwordunder, lastworditalbefore, lastwordboldbefore, lastwordunderbefore, lastworditalafter, lastwordboldafter, lastwordunderafter)
- for finding number boundaries (begnum)
- for dropping space (largesign, joinnum, joinword)
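
The "associates character with dots" step can be pictured as a lookup from a character to one or more braille cells. The following is a rough Python sketch, not liblouis code: the helper name `dots_to_unicode` is made up for illustration. It assumes the standard Unicode braille layout, where dot n sets bit n-1 above U+2800, and treats dot `0` as the blank cell, as in `space \s 0`. Only dots 0 to 8 are handled.

```python
def dots_to_unicode(dots: str) -> str:
    """Convert a dot pattern such as '124' or '124-136' to Unicode braille.

    Hypothetical helper for illustration; cells are separated by '-'.
    """
    cells = []
    for cell in dots.split("-"):
        bits = 0
        for d in cell:
            n = int(d)          # raises ValueError for anything but 0-9
            if n:               # '0' denotes the empty cell
                bits |= 1 << (n - 1)
        cells.append(chr(0x2800 + bits))
    return "".join(cells)


if __name__ == "__main__":
    print(dots_to_unicode("124-136"))
```

A character definition such as `letter f 124` then amounts to storing the pair ("f", `dots_to_unicode("124")`) together with the class "letter".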

`punctuation`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "punctuation mark"
- for using $p wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for finding punctuation (prepunc, postpunc)
- for finding word boundaries (largesign, joinword, joinnum, contraction, sufword, prfword, begword, begmidword, midendword, endword)
- for finding number boundaries (begnum)

`digit`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "digit"
- for using $d wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for finding (the absence of) word boundaries (prepunc, postpunc, firstwordital, firstwordbold, firstwordunder, lastworditalbefore, lastwordboldbefore, lastwordunderbefore, lastworditalafter, lastwordboldafter, lastwordunderafter)
- for finding (the absence of) number boundaries (numsign, begnum, midnum, endnum, decpoint)
- for finding one-letter words and letters following a digit (letsign)
- for joining a word and a digit (joinnum)

`letter`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "letter"
- for using $l wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for finding (the absence of) word boundaries (repword, largesign, partword, sufword, prfword, begword, begmidword, midendword, endword, prepunc, postpunc, singleletterital, singleletterbold, singleletterunder, firstletterital, firstletterbold, firstletterunder, lastletterital, lastletterbold, lastletterunder, firstwordital, firstwordbold, firstwordunder, lastworditalbefore, lastwordboldbefore, lastwordunderbefore, lastworditalafter, lastwordboldafter, lastwordunderafter)
- for finding one-letter words and letters following a digit (letsign)
- for joining a word and a letter (joinword)

`lowercase`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as both "letter" and "lowercase"
- for using $u wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for finding the end of a block of uppercase letters within a word (endcaps)

`uppercase`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as both "letter" and "uppercase"
- for using $U wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for inserting capital signs (capsign, begcaps, endcaps)

`uplow`

associates first character with first dots (uppercase)
defines first character and first dots as "uppercase letter" (uppercase)
associates second character with last dots (lowercase)
defines second character and last dots as "lowercase letter" (lowercase)
associates letters as each other's case-counterparts
- for mapping uppercase character to lowercase dots when prefixed with a capital sign (capsign)
- for matching translation rules written in lowercase letters on input strings containing uppercase letters

lowercase f 124
uppercase F 1247
uplow Oo 1357,135
letter u 136
always foo 124-136
always OOf 136-1247 # translation rules with uppercase letters
                    # defined with uplow don't work

{
  "input": "foo",
  "output": "fu"
},
{
  "input": "FOO",
  "output": "FOO"
},
{
  "input": "fOO",
  "output": "fu"
},
{
  "input": "OOf",
  "output": "OOf"
}
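
One way to read the `uplow` description is as shorthand for an `uppercase` and a `lowercase` definition plus a case-counterpart link. The Python sketch below expands a single `uplow` line in that way. The function name `expand_uplow` and the returned structure are hypothetical. Reusing the first dot pattern for the lowercase letter when only one pattern is given is an assumption about how "last dots" is to be understood.

```python
def expand_uplow(line: str) -> dict:
    """Expand 'uplow Xx dots[,dots]' into two definitions and a case map.

    Illustrative only; this is not how liblouis stores its tables.
    """
    opcode, chars, dots = line.split()[:3]
    if opcode != "uplow" or len(chars) != 2:
        raise ValueError("expected: uplow <Upper><lower> <dots>[,<dots>]")
    upper, lower = chars
    upper_dots, _, lower_dots = dots.partition(",")
    return {
        "definitions": [
            ("uppercase", upper, upper_dots),
            # assumption: a single pattern serves both cases
            ("lowercase", lower, lower_dots or upper_dots),
        ],
        "counterparts": {upper: lower, lower: upper},
    }


if __name__ == "__main__":
    print(expand_uplow("uplow Oo 1357,135"))
```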

`litdigit`

associates character with dots
- for replacing character with dots
  - has precedence over space, digit, punctuation, math, sign, letter, uppercase and lowercase
- for replacing dots with character during backward translation if they're part of a number (numsign)
  - has precedence over space, digit, punctuation, math, sign, letter, uppercase and lowercase
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "literary digit"
- for using $D wildcard in multipass scripts (correct, context, pass2, pass3, pass4)
- for joining a word with a literary digit (joinword)

letter a 1
litdigit 1 1
sign # 3456
numsign 3456

/* TODO: backward translation */
{
  "input": "#a",
  "output": "1"
},
{
  "input": "a",
  "output": "a"
}

`sign`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "sign (without special meaning)"
- for using $S wildcard in multipass scripts (correct, context, pass2, pass3, pass4)

`math`

associates character with dots
- for replacing character with dots
- for mapping dots to character in display phase if dots is a single-cell dot pattern
defines character (and dots if single-cell) as "mathematical symbol"
- for using $m wildcard in multipass scripts (correct, context, pass2, pass3, pass4)

`capsign`

defines dots as "capital sign"
- for inserting before a single uppercase letter (uppercase)
  
  lookbehind != uppercase && lookahead = ( uppercase !uppercase )
- for inserting before each uppercase letter if begcaps is not defined (begcaps)
  
  lookahead = uppercase
- for turning uppercase letters into their lowercase counterparts (uplow)

uplow Ff 1247,124
lowercase o 135
uppercase O 1357
uplow Uu 1367,136
punctuation , 6
capsign 6
always uu 1367

{
  "input": "Foo",
  "output": ",foo"
},
{
  "input": "FOO",
  "output": ",f,O,O"
},
{
  "input": "Fuu",
  "output": ",fU"
},
{
  "input": "FUU",
  "output": ",f,U"
},
{
  "input": "FuU",
  "output": ",fu,u"
},
{
  "input": "FUu",
  "output": ",f,U"
}

`begcaps`

defines dots as sign that announces a block of uppercase letters (uppercase)

lookbehind != uppercase && lookahead = ( uppercase uppercase )

lou_translateString.c::1051

`endcaps`

defines dots as sign that closes a block of uppercase letters within a word (uppercase, lowercase)

lookbehind = ( uppercase uppercase ) && lookahead = lowercase

letter f 124
uplow Oo 135
punctuation , 6
punctuation ' 3
begcaps 6-6
endcaps 6-3

{
  "input": "foo",
  "output": "foo"
},
{
  "input": "fOO",
  "output": "f,,oo"
},
{
  "input": "fOOo",
  "output": "f,,oo,'o"
}
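
The lookbehind/lookahead conditions given for `capsign`, `begcaps` and `endcaps` can be combined into one decision per input position. The Python sketch below does that under stated assumptions: Python's own notion of upper and lower case stands in for the table's `uppercase` and `lowercase` classes, the start and end of the input count as "not uppercase", and `endcaps` is taken to be defined whenever `begcaps` is. The function name `cap_indicator` is made up for illustration.

```python
def cap_indicator(text: str, pos: int, begcaps_defined: bool = True):
    """Return 'capsign', 'begcaps', 'endcaps' or None for the sign
    that would be inserted before text[pos] (illustrative sketch)."""

    def upper(i):
        return 0 <= i < len(text) and text[i].isupper()

    def lower(i):
        return 0 <= i < len(text) and text[i].islower()

    if upper(pos):
        if not begcaps_defined:
            return "capsign"        # every uppercase letter gets a capsign
        if not upper(pos - 1):      # lookbehind != uppercase
            # two uppercase letters ahead open a block, one gets a capsign
            return "begcaps" if upper(pos + 1) else "capsign"
        return None                 # inside an already announced block
    # lookbehind = ( uppercase uppercase ) && lookahead = lowercase
    if begcaps_defined and lower(pos) and upper(pos - 1) and upper(pos - 2):
        return "endcaps"
    return None


if __name__ == "__main__":
    print([cap_indicator("fOOo", i) for i in range(4)])
```

For the input "fOOo" this yields a block sign before the first "O" and a closing sign before the final "o", which agrees with the `endcaps` example above.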

lou_translateString.c::1063

`letsign`

defines dots as "letter sign"
- for inserting before a one-letter word
- for inserting between a digit and a letter, except in case of endnum (letter, digit, endnum)
- for inserting before a word that is also a contraction (contraction)

letter f 124
letter o 135
digit 0 356
punctuation ; 56
letsign 56

{
  "input": "f",
  "output": ";f"
},
{
  "input": "foo",
  "output": "foo"
},
{
  "input": "0foo",
  "output": "0;foo"
}

`noletsign`

inhibits the use of a letter sign when any of the characters occurs as a one-letter word or after a digit

letter f 124
letter o 135
punctuation ; 56
letsign 56
contraction foo
noletsign f

{
  "input": "f",
  "output": "f"
},
{
  "input": "foo",
  "output": ";foo"
}

lou_translateString.c::1035

`noletsignbefore`

inhibits the use of a letter sign when any of the characters precedes a one-letter word

lou_translateString.c::1028

`noletsignafter`

inhibits the use of a letter sign when any of the characters follows a one-letter word

letter f 124
punctuation ; 56
punctuation ( 12356
punctuation ) 23456
letsign 56
noletsignbefore (

{
  "input": "f",
  "output": ";f"
},
{
  "input": "(f",
  "output": "(f"
},
{
  "input": "f)",
  "output": ";f)"
}

lou_translateString.c::1041

`numsign`

defines dots as "number sign"
- for inserting before a number, i.e. a sequence of digit, decpoint and midnum (digit, decpoint, midnum)
- for finding numbers during backward translation

digit 0 356
sign # 3456
numsign 3456

{
  "input": "0",
  "output": "#0"
}

space \s 0
letter a 1
letter b 12
letter c 14
letter d 145
letter f 124
letter l 123
letter o 135
letter r 1235
letter u 136
letter z 1356
digit 0 356
digit 1 2
punctuation . 256
punctuation - 36
punctuation ; 56

TODO `compbrl`

TODO `comp6`

TODO `nocont`

TODO `replace`

bug?

lou_translateString.c::1907

`always`

matches characters
replaces matched characters with dots

include chardefs.cti
always bar 1356

{
  "input": "bar",
  "output": "z"
},
{
  "input": "foobar",
  "output": "fooz"
}

`begmidword`

matches characters when they are either at the beginning or in the middle of a word (space, punctuation, letter)

lookbehind = ( space | punctuation | letter ) && lookahead = letter
replaces matched characters with dots

lou_translateString.c::1446

`begnum`

matches characters when they are at the beginning of a number (space, punctuation, digit)

lookbehind = ( space | punctuation ) && lookahead = digit
replaces matched characters with dots

lou_translateString.c::1476

`begword`

matches characters when they are at the beginning of a word (space, punctuation, letter)

lookbehind = ( space | punctuation ) && lookahead = letter
replaces matched characters with dots

lou_translateString.c::1439

`contraction`

matches characters when they are a word (space, punctuation)

lookbehind = ( space | punctuation ) && lookahead = ( space | punctuation )
replaces each matched character with its associated dot patterns
inserts letter sign before word (letsign)

include chardefs.cti
letsign 56
word could 14-145 
contraction cd

{
  "input": "could",
  "output": "cd"
},
{
  "input": "cd",
  "output": ";cd"
}

`endnum`

matches characters when they are at the end of a number (digit)

lookbehind = digit
replaces matched characters with dots
inhibits the use of a letter sign (letsign)

letter t 2345
letter h 125
letter s 234
digit 5 15
punctuation ? 1456
punctuation ; 56
letsign 56
endnum th 1456

{
  "input": "th",
  "output": "th"
},
{
  "input": "5th",
  "output": "5?"
},
{
  "input": "5ths",
  "output": "5?s"
},
{
  "input": "5t",
  "output": "5;t"
}

`endword`

matches characters when they are at the end of a word (space, punctuation, letter)

lookbehind = letter && lookahead = ( space | punctuation )
replaces matched characters with dots

include chardefs.cti
endword oo 136

{
  "input": "foo",
  "output": "fu"
},
{
  "input": "foo ",
  "output": "fu "
},
{
  "input": "foo.",
  "output": "fu."
},
{
  "input": "foobar",
  "output": "foobar"
}

lou_translateString.c::1469

`joinnum`

matches characters when they are a word and a space and a digit follow (space, punctuation, digit)

lookbehind = ( space | punctuation ) && lookahead = ( space+ digit )
replaces matched characters with dots
drops space between characters and digit

include chardefs.cti
joinnum foo 124

{
  "input": "foo",
  "output": "foo"
},
{
  "input": "foo 0",
  "output": "f0"
}

`joinword`

matches characters when they are a word and a space and a letter follow (space, punctuation, letter, litdigit)

lookbehind = ( space | punctuation ) && lookahead = ( space+ ( letter | litdigit ) )
replaces matched characters with dots
drops space between characters and following letter

include chardefs.cti
joinword foo 124

{
  "input": "foo",
  "output": "foo"
},
{
  "input": "foo   bar",
  "output": "fbar"
}
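
The space-dropping behaviour of `joinnum` and `joinword` can be approximated with a regular expression. The Python sketch below is only an approximation under stated assumptions: regex character classes stand in for the table's `space`, `punctuation`, `letter` and `digit` classes, the start of the input counts as a word boundary, `litdigit` is ignored, and the replacement is given as the characters that display the dots (for example "f" for 124). The function name `apply_join` is hypothetical.

```python
import re


def apply_join(text: str, chars: str, replacement: str, follow: str) -> str:
    """Replace `chars` and drop the following spaces when a letter
    (follow='letter', like joinword) or a digit (follow='digit',
    like joinnum) comes next. Illustrative sketch only."""
    nxt = r"[^\W\d_]" if follow == "letter" else r"\d"
    pattern = re.compile(
        r"(?<![^\W_])"        # lookbehind: not a letter or digit
        + re.escape(chars)
        + r"\s+"              # one or more spaces, which are dropped
        + r"(?=" + nxt + r")" # lookahead: a letter or a digit
    )
    return pattern.sub(lambda m: replacement, text)


if __name__ == "__main__":
    print(apply_join("foo   bar", "foo", "f", "letter"))
    print(apply_join("foo 0", "foo", "f", "digit"))
```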

`largesign`

matches characters
replaces matched characters with dots
drops space between adjacent largesign words (space, punctuation, letter)

lookbehind = ( ( space | punctuation ) largesign space+ ) && lookahead != letter

include chardefs.cti
largesign foo 124
largesign bar 12

{
  "input": "foobar",
  "output": "fb"
},
{
  "input": "foo bar",
  "output": "fb"
},
{
  "input": "foo barr",
  "output": "f br"
}

`lowword`

matches characters when they are a word preceded and followed by whitespace (space)

lookbehind = space && lookahead = space
replaces matched characters with dots

include chardefs.cti
lowword foo 124-136

{
  "input": "foo",
  "output": "fu"
},
{
  "input": "foo ",
  "output": "fu "
},
{
  "input": "foo.",
  "output": "foo."
}

lou_translateString.c::1408

`midendword`

matches characters when they are either in the middle or at the end of a word (space, punctuation, letter)

lookbehind = letter && lookahead = ( space | punctuation | letter )
replaces matched characters with dots

lou_translateString.c::1461

`midnum`

matches characters when they are in the middle of a number (digit)

lookbehind = digit && lookahead = digit
replaces matched characters with dots

digit 0 245
punctuation . 256
punctuation , 6
sign # 3456
numsign 3456
midnum , 256

{
  "input": "0,0",
  "output": "#0.0"
},
{
  "input": "0.0",
  "output": "#0.#0"
}

`midword`

matches characters when they are in the middle of a word (letter)

lookbehind = letter && lookahead = letter
replaces matched characters with dots

lou_translateString.c::1454

`nocross`

matches characters when they do not cross syllable boundaries
replaces matched characters with dots

foo1bar

include chardefs.cti
include hyph.dic
nocross foob 124-136-12
nocross bar 12

{
  "input": "foobar",
  "output": "foob"
}

lou_translateString.c::481

`partword`

matches characters when they are part of a word but not the whole word (letter)

lookbehind = letter || lookahead = letter
replaces matched characters with dots

include chardefs.cti
partword oo 136

{
  "input": "oo",
  "output": "oo"
},
{
  "input": "foo",
  "output": "fu"
},
{
  "input": "foobar",
  "output": "fubar"
}

lou_translateString.c::1378

`postpunc`

matches characters when they are part of punctuation at the end of a word (space, punctuation, letter, digit)

characters(1) = punctuation && lookbehind = ( ( letter | digit ) ( !space )* ) && lookahead != letter
replaces matched characters with dots

letter f 124
letter o 135
sign < 126
sign > 345
sign # 3456
punctuation ' 5
punctuation ( 12356
punctuation ) 23456
prepunc ( 5-126
postpunc ) 5-345

{
  "input": "(foo)",
  "output": "'<foo'>"
},
{
  "input": "( foo )",
  "output": "( foo )"
},
{
  "input": "(#foo)",
  "output": "'<#foo'>"
},
{
  "input": "#(foo)",
  "output": "#'<foo'>"
}

include chardefs.cti
prepunc .foo 124-136

{
  "input": ".foobar",
  "output": "fubar"
}

lou_translateString.c::1513

`prepunc`

matches characters when they are part of punctuation at the beginning of a word (space, punctuation, letter, digit)

characters(1) = punctuation && lookbehind != letter && lookahead = ( ( !space )* ( letter | digit ) )
replaces matched characters with dots

lou_translateString.c::1498

`prfword`

matches characters when they are either a word or at the end of a word (space, punctuation, letter)

lookbehind = ( space | punctuation | letter ) && lookahead = ( space | punctuation )
replaces matched characters with dots

lou_translateString.c::1431

`repeated`

matches characters
replaces matched characters with dots for first match

lookbehind != repeated
drops characters for consecutive repetitions

lookbehind = repeated

punctuation - 36
repeated --- 36-36-36

{
  "input": "---",
  "output": "---"
},
{
  "input": "------",
  "output": "---"
},
{
  "input": "-------",
  "output": "----"
}
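
The two conditions for `repeated` (emit the dots for the first match, drop consecutive repetitions) can be sketched as a single left-to-right scan. The Python below is an illustration, not the liblouis implementation. The function name `apply_repeated` is made up, and the replacement is given as display characters rather than dots.

```python
def apply_repeated(text: str, chars: str, replacement: str) -> str:
    """Emit `replacement` for the first match of `chars` and drop
    directly following repetitions (illustrative sketch)."""
    if not chars:
        raise ValueError("chars must not be empty")
    out = []
    pos = 0
    prev_matched = False          # corresponds to lookbehind = repeated
    while pos < len(text):
        if text.startswith(chars, pos):
            if not prev_matched:
                out.append(replacement)
            prev_matched = True
            pos += len(chars)
        else:
            out.append(text[pos]) # pass other characters through unchanged
            prev_matched = False
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_repeated("-------", "---", "---"))
```

With seven hyphens, the first three are translated, the next three are dropped, and the remaining single hyphen no longer matches the rule, so it passes through on its own.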

lou_translateString.c::1963

`repword`

matches characters when the word before it equals the word after it
replaces matched characters with dots and drops word after it

include chardefs.cti
repword - 1356

{
  "input": "foo-foo",
  "output": "fooz"
},
{
  "input": "foo-foo-foo",
  "output": "fooz"
}

`sufword`

matches characters when they are either a word or at the beginning of a word (space, punctuation, letter)

lookbehind = ( space | punctuation ) && lookahead = ( space | punctuation | letter )
replaces matched characters with dots

lou_translateString.c::1423

`syllable`

matches characters
replaces matched characters with dots
inhibits other contractions across boundaries either from left or right

include chardefs.cti
syllable bar =
always foob 124-136-12
always foobar 124-12

{
  "input": "fooba",
  "output": "fuba"
},
{
  "input": "foobar",
  "output": "foobar"
}

lou_translateString.c::1716

`word`

matches characters when they are a word (space, punctuation)

lookbehind = ( space | punctuation ) && lookahead = ( space | punctuation )
replaces matched characters with dots

include chardefs.cti
word foo 124-136

{
  "input": "foo",
  "output": "fu"
},
{
  "input": "foo.",
  "output": "fu."
},
{
  "input": "foobar",
  "output": "foobar"
}

lou_translateString.c::1370
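
The word-position opcodes (`word`, `lowword`, `begword`, `endword`, `midword`, `begmidword`, `midendword`, `sufword`, `prfword`, `partword`) differ only in their lookbehind and lookahead conditions, so they can be summarised in one table. The Python sketch below encodes the conditions as written in this document. It assumes, as the examples suggest, that the start and end of the input behave like space, and it uses Python's character predicates as a stand-in for the classes a real table would define. All names (`char_class`, `WORD_RULES`, `word_rule_matches`) are hypothetical.

```python
SPACE, PUNCT, LETTER, DIGIT = "space", "punctuation", "letter", "digit"


def char_class(ch):
    """Classify one character; None stands for the start or end of input."""
    if ch is None or ch.isspace():
        return SPACE
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    return PUNCT


# opcode -> (allowed lookbehind classes, allowed lookahead classes)
WORD_RULES = {
    "word":       ({SPACE, PUNCT},         {SPACE, PUNCT}),
    "lowword":    ({SPACE},                {SPACE}),
    "begword":    ({SPACE, PUNCT},         {LETTER}),
    "endword":    ({LETTER},               {SPACE, PUNCT}),
    "midword":    ({LETTER},               {LETTER}),
    "begmidword": ({SPACE, PUNCT, LETTER}, {LETTER}),
    "midendword": ({LETTER},               {SPACE, PUNCT, LETTER}),
    "sufword":    ({SPACE, PUNCT},         {SPACE, PUNCT, LETTER}),
    "prfword":    ({SPACE, PUNCT, LETTER}, {SPACE, PUNCT}),
}


def word_rule_matches(opcode, chars, text, pos):
    """True if `chars` occurs at `pos` and the context fits `opcode`."""
    if not text.startswith(chars, pos):
        return False
    end = pos + len(chars)
    before = char_class(text[pos - 1] if pos > 0 else None)
    after = char_class(text[end] if end < len(text) else None)
    if opcode == "partword":      # lookbehind = letter || lookahead = letter
        return before == LETTER or after == LETTER
    behind, ahead = WORD_RULES[opcode]
    return before in behind and after in ahead


if __name__ == "__main__":
    print(word_rule_matches("word", "foo", "foo.", 0))
```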

`decpoint`

matches character when it precedes a digit (digit)

lookahead = digit
replaces matched characters with dots

digit 0 245
punctuation . 46
punctuation ; 56
sign # 3456
numsign 3456
decpoint . 46

{
  "input": ".0",
  "output": "#.0"
},
{
  "input": "0.0",
  "output": "#0.0"
},
{
  "input": ";0",
  "output": ";#0"
}

lou_translateString.c::1492
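
The number-related opcodes (`begnum`, `midnum`, `endnum`, `decpoint`) follow the same pattern as the word-position opcodes, with `digit` in place of `letter`. The Python sketch below restates the four conditions given in this document. It is not liblouis code: the names are hypothetical, Python's `str.isdigit` and `str.isalnum` stand in for the table's character classes, and the start and end of the input are assumed to count as a boundary.

```python
def is_digit(text, i):
    """True if position i exists and holds a digit."""
    return 0 <= i < len(text) and text[i].isdigit()


def is_boundary(text, i):
    """True for start/end of input and for space or punctuation."""
    return not (0 <= i < len(text)) or not text[i].isalnum()


# opcode -> predicate over (text, start of match, end of match)
NUMBER_RULES = {
    "begnum":   lambda t, s, e: is_boundary(t, s - 1) and is_digit(t, e),
    "midnum":   lambda t, s, e: is_digit(t, s - 1) and is_digit(t, e),
    "endnum":   lambda t, s, e: is_digit(t, s - 1),
    "decpoint": lambda t, s, e: is_digit(t, e),
}


def number_rule_matches(opcode, chars, text, pos):
    """True if `chars` occurs at `pos` and the context fits `opcode`."""
    if not text.startswith(chars, pos):
        return False
    return NUMBER_RULES[opcode](text, pos, pos + len(chars))


if __name__ == "__main__":
    print(number_rule_matches("midnum", ",", "0,0", 1))
```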

`context`

matches characters expressed by test
replaces matched characters with dot patterns expressed by action

`pass2`

considered in the second pass only
matches dot patterns expressed by test
replaces matched dot patterns with dot patterns expressed by action

`pass3`

considered in the third pass only
matches dot patterns expressed by test
replaces matched dot patterns with dot patterns expressed by action

`pass4`

considered in the fourth pass only
matches dot patterns expressed by test
replaces matched dot patterns with dot patterns expressed by action

`correct`

considered in the corrections phase only
matches characters expressed by test
replaces matched characters with characters expressed by action

TODO `display`

TODO `hyphen`

TODO `undefined`

lou_translateString.c::1913

TODO `class`

TODO `before`

TODO `after`

TODO `grouping`

TODO `swapcd`

TODO `swapdd`

TODO `swapcc`

TODO `exactdots`

TODO `nofor`

TODO `noback`

Appendix I: Translation algorithm in pseudo-code

(def table (compile-table))

(defn translate [input typeform]
  ""
  {:pre [(= (count input) (count typeform))]}
  (let [;; pass 0
        tmp (make-corrections input)
        ;; pass 1
        syl-info (mark-syllables tmp)
        cap-info (mark-capitals tmp)
        tmp (loop [pos 0
                   tmp2 []
                   prev-rules []
                   emph-info []]
              (if (>= pos (count tmp))
                tmp2
                (if-let [[pos tmp2] (maybe-translate-comp-braille tmp typeform pos tmp2)]
                  (recur pos tmp2 prev-rules emph-info)
                  (let [[tmp2 emph-info] (maybe-insert-emph-indicator tmp typeform emph-info pos tmp2)
                        rule (select-rule tmp syl-info cap-info pos prev-rules)]
                    (if (= (rule-type rule) :compbrl)
                      (let [[pos tmp2] (do-compbrl tmp pos tmp2)]
                        (recur pos tmp2 prev-rules emph-info))
                      (let [tmp2 (or (maybe-insert-num-indicator tmp pos rule prev-rules tmp2)
                                     (maybe-insert-let-indicator tmp pos rule tmp2)
                                     (maybe-insert-cap-indicator tmp pos tmp2)
                                     tmp2)
                            tmp2 (if (= (rule-type rule) :largesign)
                                   (trim-trailing-space tmp2)
                                   tmp2)]
                        (let [[pos tmp2] (do-translation tmp pos rule tmp2)
                              prev-rules (conj prev-rules rule)]
                          (recur pos tmp2 prev-rules emph-info))))))))
        ;; pass 2 to 4
        tmp (loop [n 2
                   tmp tmp]
              (if (> n 4)
                tmp
                (recur (inc n)
                       (loop [pos 0
                              tmp2 []]
                         (if (>= pos (count tmp))
                           tmp2
                           (if-let [rule (select-multipass-rule tmp pos n)]
                             (let [[pos tmp2] (do-translation tmp pos rule tmp2)]
                               (recur pos tmp2))
                             (let [tmp2 (conj tmp2 (tmp pos))
                                   pos (inc pos)]
                               (recur pos tmp2))))))))]
    tmp))

(defn make-corrections [input]
  "(See opcode correct.)")

(defn mark-syllables [input]
  "(See opcode syllable.)")

(defn mark-capitals [input]
  "(See opcode uplow.)")

(defn maybe-translate-comp-braille [input typeform pos output]
  "Maybe translate part of the `input' starting at position `pos' as
computer braille, based on the `typeform' parameter. (See opcodes
begcomp and endcomp.)"
  {:pre [(= (count input) (count typeform))
         (< pos (count input))]
   :post [(if-let [[new-pos new-output] %]
            (and (> new-pos pos)
                 (only-appended? output new-output)
                 (> (count new-output) (count output)))
            true)]})

(defn maybe-insert-emph-indicator [input typeform emph-info pos output]
  "Maybe insert an emphasis indicator based on the `typeform'
parameter at position `pos' and info `emph-info' from previous
iterations. Maybe mark positions in the input for future insertion of
emphasis indicators. (See opcodes singleletter__, firstletter__,
lastletter__, firstword__, lastword__before, lastword__after and
len__phrase.)"
  {:pre [(= (count input) (count typeform))
         (< pos (count input))]
   :post [(let [[new-output new-emph-info] %]
            (and (only-appended? output new-output)
                 (only-appended? emph-info new-emph-info)))]})

(defn select-rule [input syl-info cap-info pos prev-rules]
  "Select a translation rule that matches the `input' at position
  `pos', based on info about syllable and capital positions, and a
  list of previously applied rules.")

(defn do-compbrl [input pos output]
  "")

(defn maybe-insert-num-indicator [input pos rule prev-rules output]
  ""
  {:post [(if-let [new-output %]
            (and (only-appended? output new-output)
                 (> (count new-output) (count output)))
            true)]})

(defn maybe-insert-let-indicator [input pos rule output]
  ""
  {:post [(if-let [new-output %]
            (and (only-appended? output new-output)
                 (> (count new-output) (count output)))
            true)]})

(defn maybe-insert-cap-indicator [input pos output]
  ""
  {:post [(if-let [new-output %]
            (and (only-appended? output new-output)
                 (> (count new-output) (count output)))
            true)]})

(defn trim-trailing-space [output]
  "Trim trailing space characters from `output'."
  {:post [(= % (take (count %) output))]})

(defn do-translation [input pos rule output]
  "Apply the translation rule on `input' at position `pos'."
  {:post [(let [[new-pos new-output] %]
            (and (> new-pos pos)
                 (only-appended? output new-output)))]})

(defn select-multipass-rule [input pos n]
  "Select a multipass translation rule for pass `n' that matches the
  `input' at position `pos'. (See opcodes pass2, pass3 and pass4)."
  {:pre [(#{2 3 4} n)]
   :post [(if-let [rule %]
            (= (rule-type %) (case n
                               2 :pass2
                               3 :pass3
                               4 :pass4))
            true)]})

(defn only-appended? [old new]
  (= old (take (count old) new)))
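
The "pass 2 to 4" loop of the pseudo-code can also be written out in Python to make the control flow easier to follow. This is a sketch under assumptions, not the liblouis algorithm: a rule is modelled as a hypothetical `(test, action)` pair of dot-pattern lists, the first rule whose test matches at the current position is applied (rule selection is left unspecified above), and cells that match no rule are copied unchanged. `only_appended` mirrors `only-appended?`.

```python
def only_appended(old, new):
    """True if `new` starts with all of `old` (mirrors only-appended?)."""
    return list(new[:len(old)]) == list(old)


def run_pass(cells, rules):
    """Apply one multipass pass; `rules` is a list of (test, action) pairs."""
    pos, out = 0, []
    while pos < len(cells):
        for test, action in rules:
            if test and cells[pos:pos + len(test)] == test:
                before = list(out)
                out.extend(action)          # like do-translation
                pos += len(test)
                assert only_appended(before, out)
                break
        else:
            out.append(cells[pos])          # no rule matched: copy the cell
            pos += 1
    return out


def run_multipass(cells, rules_by_pass):
    """Run passes 2, 3 and 4 in order, feeding each result to the next."""
    for n in (2, 3, 4):
        cells = run_pass(cells, rules_by_pass.get(n, []))
    return cells


if __name__ == "__main__":
    rules = {2: [(["6", "6"], ["46"])], 3: [(["46"], ["6"])]}
    print(run_multipass(["6", "6", "124"], rules))
```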
