A parser for the LoongArch instruction encoding table

This script turns the AsciiDoc source of “Appendix A: Table of Instruction Encoding” in the LoongArch Reference Manual, Volume 1: Basic Architecture (a rendered version can be found here) into machine-readable CSV rows, each containing an instruction's mnemonic, operand list, and bit pattern. It is also available as a GitHub Gist.

#!/usr/bin/env python3

# MIT License
#
# Copyright (c) 2025 Rong "Mantle" Bao <[email protected]>.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import csv
import enum
import io
import re
import sys

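# A cell holding one fixed bit, e.g. "|0".
RE_LIT_BIT = re.compile(r"^\|(0|1)$")
# A cell spanning N bit columns, e.g. "5+|rj"; only the width N is captured.
RE_WILD_BITS = re.compile(r"^(\d+)\+\|.*$")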


class State(enum.IntEnum):
    MNEMONIC = enum.auto()
    OPERAND = enum.auto()
    BITS = enum.auto()


state = State.MNEMONIC
mnemonic = ""
args = ""
bit_pat = ""
rem_bits = 32
insns = []
# Each table cell sits on its own line; blank lines are skipped.
for line in (raw.strip() for raw in sys.stdin):
    if not line:
        continue
    match state:
        case State.MNEMONIC:
            # First cell of a row: reset the per-instruction accumulators.
            rem_bits = 32
            bit_pat = ""
            mnemonic = line.lstrip("| ")
            state = State.OPERAND
        case State.OPERAND:
            # Second cell: the operand list, e.g. "rd, rj".
            args = line.lstrip("| ")
            state = State.BITS
        case State.BITS:
            if m := RE_LIT_BIT.match(line):
                # A fixed opcode bit.
                bit_pat += m.group(1)
                rem_bits -= 1
            elif m := RE_WILD_BITS.match(line):
                # An N-bit operand field, recorded as N wildcards.
                bit_pat += "?" * int(m.group(1))
                rem_bits -= int(m.group(1))
            # The row is complete once all 32 bits are accounted for.
            if rem_bits == 0:
                insns.append((mnemonic, args, bit_pat))
                state = State.MNEMONIC

with io.TextIOWrapper(sys.stdout.buffer, newline="") as stdout:
    writer = csv.writer(stdout)
    writer.writerow(["mnemonic", "args", "bit_pat"])
    writer.writerows(insns)

To use this script, download a copy of Appendix A from the link given above. Then strip all headings, table headers, and footers, so that only the table rows remain, one cell per line:

|CLO.W
|rd, rj
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|1
|0
|0
5+|rj
5+|rd

|CLZ.W
|rd, rj
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|1
|0
|1
5+|rj
5+|rd

[...]

Then pipe the stripped file into the script on standard input. The script writes the result as CSV to standard output:

mnemonic,args,bit_pat
CLO.W,"rd, rj",0000000000000000000100??????????
CLZ.W,"rd, rj",0000000000000000000101??????????
[...]
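
One way to use the resulting `bit_pat` column is to match raw instruction words against it. The following is only a minimal sketch, not part of the original script: the helper names `compile_pattern` and `matches` are made up for illustration, and it assumes the leftmost character of a pattern corresponds to bit 31 of the instruction word.

```python
def compile_pattern(bit_pat: str) -> tuple[int, int]:
    """Turn a 32-character pattern of '0', '1' and '?' into (mask, value)."""
    if len(bit_pat) != 32:
        raise ValueError(f"expected 32 bits, got {len(bit_pat)}")
    mask = value = 0
    for ch in bit_pat:  # assumption: leftmost character is bit 31
        mask <<= 1
        value <<= 1
        if ch in ("0", "1"):
            mask |= 1          # this bit is fixed by the encoding
            value |= int(ch)   # and must have this value
        elif ch != "?":
            raise ValueError(f"unexpected character {ch!r} in pattern")
    return mask, value


def matches(word: int, bit_pat: str) -> bool:
    """Check whether a 32-bit instruction word fits the given pattern."""
    mask, value = compile_pattern(bit_pat)
    return (word & mask) == value


# Patterns copied from the sample output above.
CLO_W = "0000000000000000000100??????????"
CLZ_W = "0000000000000000000101??????????"

# Fixed bits of CLO.W with arbitrary values in the two 5-bit wildcard fields.
word = 0x1000 | (5 << 5) | 4
print(matches(word, CLO_W), matches(word, CLZ_W))  # prints "True False"
```

Precomputing the `(mask, value)` pair once per row keeps the per-word check down to a single AND and a comparison, which matters if the whole table is scanned for every word.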

Update. The Gist version now also reports where each operand is placed within the instruction word, in an additional args_pos column. Its output looks like this:

mnemonic,args,bit_pat,args_pos
CLO.W,"rd, rj",0000000000000000000100??????????,rj@[9:5];rd@[4:0]
CLZ.W,"rd, rj",0000000000000000000101??????????,rj@[9:5];rd@[4:0]
[...]
BLTU,"rj, rd, offs",011010??????????????????????????,offs[15:0]@[25:10];rj@[9:5];rd@[4:0]
BGEU,"rj, rd, offs",011011??????????????????????????,offs[15:0]@[25:10];rj@[9:5];rd@[4:0]
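
The `args_pos` column can be used to pull raw operand fields out of an instruction word. The sketch below is an illustration rather than part of the Gist: the syntax it accepts (`name@[hi:lo]`, optionally `name[hi:lo]@[hi:lo]`, joined by `;`) is inferred solely from the sample rows above, and the names `parse_args_pos` and `decode_operands` are hypothetical. It returns unsigned field values only, without the sign extension or scaling that an instruction's semantics may require.

```python
import re

# Assumed shape, inferred from the sample rows: an operand name, an optional
# [hi:lo] slice of that operand, then "@[hi:lo]" giving the bits in the word.
RE_PLACEMENT = re.compile(r"^(\w+)(?:\[(\d+):(\d+)\])?@\[(\d+):(\d+)\]$")


def parse_args_pos(args_pos: str) -> list[tuple[str, int, int, int]]:
    """Return (name, operand_lo, word_hi, word_lo) for each placement."""
    fields = []
    for item in args_pos.split(";"):
        if not item:  # tolerate an empty column
            continue
        m = RE_PLACEMENT.match(item)
        if m is None:
            raise ValueError(f"unrecognised placement: {item!r}")
        name, _src_hi, src_lo, hi, lo = m.groups()
        fields.append((name, int(src_lo or 0), int(hi), int(lo)))
    return fields


def decode_operands(word: int, args_pos: str) -> dict[str, int]:
    """Extract the raw (unsigned) operand values from an instruction word."""
    operands: dict[str, int] = {}
    for name, src_lo, hi, lo in parse_args_pos(args_pos):
        width = hi - lo + 1
        chunk = (word >> lo) & ((1 << width) - 1)
        # Put the chunk back at its position inside the operand, so an
        # operand split across several placements is reassembled.
        operands[name] = operands.get(name, 0) | (chunk << src_lo)
    return operands


# A word with BLTU's fixed bits (011010) and arbitrary operand fields.
word = (0b011010 << 26) | (3 << 10) | (5 << 5) | 4
print(decode_operands(word, "offs[15:0]@[25:10];rj@[9:5];rd@[4:0]"))
# prints "{'offs': 3, 'rj': 5, 'rd': 4}"
```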