Copyright (C) 2004-2009 by Anton Treuenfels
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
HXA is a macro cross assembler created simply for fun as a hobby project (hence the name). HXA reflects many of my own ideas about what is and is not important in assembler design.
Perhaps the most central idea from a programmer's viewpoint is that HXA requires almost every programmer intention to be made explicit in assembly source code. I've had the experience of trying to read source code in unfamiliar languages for assemblers whose manuals I did not have. That's not so easy, so HXA tries to make sure both writer and reader have no doubt about what is supposed to happen.
An area of ongoing concern is error detection and reporting. HXA has been designed from the start to detect many possible source code errors. If found, errors are reported in a consistent manner that attempts to make clear what, why and where something went wrong. The aim is to give a programmer trying to fix an error a "fighting chance".
HXA makes an effort to accommodate programmers used to the conventions of other assemblers. It is fairly agnostic with regard to how numeric literals are specified and what the exact names of pseudo ops are.
A couple of things I don't worry much about are size and speed. Current machines are very fast and have very large memories, so concerns of this nature in an assembler aren't too pressing. While trying to never needlessly inflate HXA, when push comes to shove I permit it to get larger and slower. That said, the MS-DOS version of HXA typically assembles source code at 400-500 lines per second on an old Pentium 166.
I hope you find using HXA to be both productive and enjoyable. If you have any questions or comments, please let me know. There are many excellent ideas still unstolen, and these will help decide which ones are plundered next :)
- Anton Treuenfels
Snail-mail:
E-mail:
HXA is designed to be fairly portable between various processors. This document describes only those portions of HXA which are processor-independent.
As assemblers in general are largely concerned with textual manipulation of assembly source code, this is actually most of HXA.
In the present version of HXA the only portions which "understand" the processor instruction opcodes of any particular assembly language are isolated in a single one of its own source files. Replacement of this file is one method of producing variants of HXA capable of handling different assembly languages. Each existing such variant is discussed in a separate document.
HXA
HXA Variants
Labels
Expressions
Pseudo Opcodes
Extended
Changed
HXA65 Version
no changes
A complete list of all changes in v0.163 can be found in the implementation documentation.
The short "Hello, World!" program is the first example on the first page of "The C Programming Language" by Kernighan and Ritchie. First published in 1978, it has since become a very popular example for introducing almost any programming language.
While HXA is not a programming language per se, the assembly languages it supports are. The "Hello, World!" Demos provide short examples of how HXA may be used to create real programs for real computers.
All variants of HXA accept a single filename as a command-line argument:
HXAxx filename
HXAxx is the specific variant of HXA invoked, where "xx" is replaced by a variant identifier.
Filename is assumed to be a text file containing assembly language source code.
If multiple source files are to be assembled together, they must be specified within the root filename by using the "INCLUDE" pseudo opcode.
Filename may contain device and/or directory specifiers (i.e., a path). If filename contains a path, HXA automatically prefixes every other input and output file named within filename with that same path. The only exception is if a named file has its own path which contains a device identifier or starts from the root directory, in which case HXA uses that path instead.
The effect is that files named within filename can normally be specified by their location relative to it, but if desired an absolute location can be specified instead.
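The path rule above can be sketched in Python. This is only an illustration, not HXA's own code: the function name `resolve_name` is hypothetical, and MS-DOS style paths are assumed (Python's `ntpath` module is used so the sketch behaves the same on any host system).

```python
import ntpath

def resolve_name(root_filename, named_file):
    """Return the path used for a file named inside the root source file."""
    drive, rest = ntpath.splitdrive(named_file)
    # A device identifier or a path starting from the root directory
    # means the named file's own path is used unchanged.
    if drive or rest.startswith(("\\", "/")):
        return named_file
    # Otherwise the root file's path (if any) is prefixed.
    return ntpath.join(ntpath.dirname(root_filename), named_file)

relative = resolve_name(r"C:\src\proj\main.a", r"inc\defs.a")
absolute = resolve_name(r"C:\src\proj\main.a", r"D:\lib\io.a")
```

Here `relative` ends up under the root file's directory, while `absolute` is left as written because it carries its own device identifier.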
There are no command-line options or flags. Changes to HXA's default behavior are made explicit by pseudo opcodes used within the source file(s).
By default HXA produces no output other than status and error messages. These are sent to stdout, which is normally the screen (console). If no screen output is desired, stdout may be re-directed on the command line to a file (to capture all such output) or the null (NUL) device (to ignore all such output).
The "--FILE" pseudo ops direct HXA to produce error, object, Intel hex and listing files.
At exit HXA returns a value of zero if no warnings or errors were encountered during assembly. An exit value of one to seven is bit-mapped to report the type of problem(s) encountered:
| Bit Value | Detected |
| --- | --- |
| $01 | Warning(s) |
| $02 | Error(s) |
| $04 | Fatal Error |
A value of two or higher implies HXA halted assembly and produced only an error file (if one was specified).
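A calling script can decode the exit value by testing each bit from the table. The following Python sketch shows one way to do this; the function name `describe_exit` is hypothetical and is not part of HXA.

```python
# Bit values taken from the exit value table.
WARNING = 0x01
ERROR = 0x02
FATAL = 0x04

def describe_exit(code):
    """Return the problem types encoded in an HXA exit value."""
    problems = []
    if code & WARNING:
        problems.append("Warning(s)")
    if code & ERROR:
        problems.append("Error(s)")
    if code & FATAL:
        problems.append("Fatal Error")
    return problems

def assembly_halted(code):
    """A value of two or higher means assembly was halted."""
    return code >= 2
```

An exit value of zero yields an empty list, meaning no warnings or errors were encountered.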
HXA accepts standard ASCII text files as source code. Each line of a text file is treated as a separate source code line.
A source code line may consist of up to four logical fields:
LABEL OPCODE EXPRESSION COMMENT
These fields are separated from each other by one or more whitespace characters. Each field is optional, but if present must appear in the order shown. The only restriction is that expressions must follow an opcode; they cannot appear alone or directly following a label.
Malformed source lines are an error.
Note that the actual column position of any field within a source code line is not important to HXA. What a field represents is more important than where it is.
Examples
HXA is case-insensitive with respect to labels and opcodes:
| User Label | Pseudo Opcode |
| --- | --- |
| MYLABEL | BYTE |
| mylabel | byte |
| MyLabel | Byte |
Note that the opcode field may contain processor instruction opcodes, assembler pseudo opcodes or macro names. As used here, opcode refers to all these types.
The expression field is separated from its preceding opcode by whitespace, and continues until the end of the source line or the start of the comment field, whichever comes first. Whitespace may be used in the expression field as desired.
HXA normally divides the expression field into sub-fields before further processing. Each comma (',') in the expression field marks the start of a separate argument to the opcode it follows. Leading and trailing whitespace is discarded from each sub-field. Blank or empty sub-fields are not allowed.
There are exceptions where a comma does not divide:
The escape mechanism is the most general method of preventing the expression field from being divided. Commas in character and string literals and those separating function arguments actually can be escaped, if desired (although there is no practical advantage in doing so).
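The division of the expression field can be modeled roughly in Python. This is a sketch under stated assumptions, not HXA's actual algorithm: it treats a comma as non-dividing when it is escaped with a backslash, sits inside a character or string literal, or sits inside parentheses (taken here to mean function arguments). The function name `split_arguments` is hypothetical.

```python
def split_arguments(field):
    """Split an expression field at top-level, unescaped commas."""
    args, current = [], []
    quote = None   # the quote mark that opened the current literal, if any
    depth = 0      # parenthesis nesting depth
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            current.append(field[i:i + 2])   # keep the escape pair intact
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif ch == "," and depth == 0:
            # A dividing comma: finish this sub-field, trimming whitespace.
            args.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    args.append("".join(current).strip())
    if any(arg == "" for arg in args):
        raise ValueError("blank or empty sub-field")
    return args
```

Leading and trailing whitespace is discarded from each sub-field, and a blank sub-field is rejected, as the text requires.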
Source lines which have an asterisk ('*') in the first column, or whose first non-whitespace character is a semi-colon (';'), are considered comment lines, and are ignored.
A comment following a label, opcode or expression is indicated by a semi-colon immediately preceded by at least one whitespace character. Note that if this sequence appears in a string literal, the semi-colon must be escaped in order to be handled properly.
Examples
In assembly languages labels are symbolic names used to represent values. The symbolic names can then be used in expressions in place of those values.
Most HXA label names begin with an alphabetic ('A-Z') or underscore ('_') character followed by any number of alphanumeric ('A-Z0-9'), underscore or period ('.') characters, except:
| myLabel | _mylabel | my.label |
| target1 | target_2 | t.123 |
HXA allows labels to represent numeric and string values. The type of a label is made manifest by the presence or absence of a dollar sign suffix ('$'): a label without this suffix represents a numeric value, a label with it a string.
| Numeric | String |
| --- | --- |
| myNumLabel | myStrLabel$ |
| temp1 | t2$ |
A final colon (':') suffix may be used on any of these label names. This is generally ignored by HXA, so 'MyLabel:' and 'MyLabel' both represent the same numeric value wherever they appear. However, when used on the first field of a source code line, a colon suffix will force HXA to recognize the field as a label rather than an opcode (in case the same name is used for both).
HXA extends its manifest typing system to distinguish several sub-types of numeric and string labels. The sub-type of a label is distinguished by the first character of its name.
| Sub-Type | Indicator | Initial Character | Ex: Numeric | Ex: String |
| --- | --- | --- | --- | --- |
| Global | Alphabetic or underscore | 'A-Z' or '_' | myGlobal | aGlobal$ |
| Local | "At" sign | '@' | @myLocal | @aLocal$ |
| Variable | Right bracket | ']' | ]myVar | ]aVar$ |
| Forward Branch Target | Plus sign | '+' | + | n/a |
| Backward Branch Target | Minus sign | '-' | - | n/a |
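The manifest typing rules in the table can be summarized as a small Python classifier. This is an illustrative sketch only: the function name `classify_label` is hypothetical, and the check covers just the first character and the '$' suffix, not the full label name syntax.

```python
import string

def classify_label(name):
    """Return (sub_type, value_type) for a label name."""
    if name.endswith(":"):
        name = name[:-1]            # a final colon is generally ignored
    if not name:
        raise ValueError("empty label name")
    first = name[0]
    if first in string.ascii_letters or first == "_":
        sub_type = "global"
    elif first == "@":
        sub_type = "local"
    elif first == "]":
        sub_type = "variable"
    elif first == "+":
        sub_type = "forward branch target"
    elif first == "-":
        sub_type = "backward branch target"
    else:
        raise ValueError("not a recognized label form: %r" % name)
    # A '$' suffix marks a string label; anything else is numeric.
    value_type = "string" if name.endswith("$") else "numeric"
    return sub_type, value_type
```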
Notes:
Labels appearing in the label field of a source code line are assigned values. Most often the value is the current value of the program counter. HXA makes this assignment automatically; in most cases a programmer doesn't know (or need to know) what the actual value of such a label is.
For numeric labels this value is an integer which can be used directly as a memory address. For string labels this value takes the form of a decimal string representation of the same memory address.
Both types of labels can also be assigned explicit values with the "EQU" pseudo opcode. For string labels this is the most common method.
Examples
Global labels, as their name implies, have global scope. The value represented by a global label can be accessed from anywhere in the source code file(s).
Global labels are fixed. Once assigned a value, that value cannot be changed. However, the same value may be assigned to the same global label any number of times.
Global label names must be unique. No two global labels can have the same name.
A global label in the label field of a source code line causes the current local scope to terminate. All current local label names and values are forgotten, and a new local scope begins.
In general, global label names which match processor instruction opcodes, assembler pseudo opcodes or previously defined macro names should be avoided. Global labels in the first field of a source code line which match any of these will not be recognized correctly.
Examples
Local labels, as their name implies, have local scope. The value represented by a particular local label cannot be accessed outside the scope it is created in.
Local labels are fixed. Once assigned a value, that value cannot be changed. However, the same value may be assigned to the same local label in the same local scope any number of times.
Local label names within the same local scope must be unique. Local labels in different local scopes may have the same name.
The first local scope is the same as the first source file. This file-level local scope is subdivided by each global label in it. That is, every time a global label is encountered in the label field, the current local scope is ended and a new one begun.
Thus, the value of a particular local label can be accessed only between the two global labels it is surrounded by (or, at the extremes, between a global label and the start or end of the file). If a source file has no global labels, any local label in it must be unique.
HXA automatically creates nested local scopes during file inclusion, expansion of macro, repeat and while blocks, and while in segment fragments. These scopes follow the same rules as the original file-level local scope. That is, they can be subdivided by global labels and any local labels in a subdivision must be unique.
Examples
Variable labels have global scope. The current value represented by one of these labels can be accessed from anywhere in the source code file(s).
Variable labels, as their name implies, are not fixed. The value represented by one of these labels can be changed at any time by using it in the label field of any source code line which allows labels.
Variable label names are by default unique. Any use of a particular variable label name is considered to refer to the same variable label.
References to variable labels normally represent the value they were most recently assigned. Thus variable labels in expressions should usually be used only for backward references. That is, they should appear in a label field before they are used in an expression field, so that they have a known value.
HXA does permit forward reference to variable labels, but this usage is obscure and difficult to use successfully.
Examples
Branch target labels (also called anonymous labels) have global scope. The value represented by one of these labels can be accessed from anywhere in the source code file(s).
Branch target labels are fixed. Once assigned, their values cannot be changed.
Branch target label names are not unique. There are only two names:
| Type | Indicator | Character |
| --- | --- | --- |
| Forward Branch Target | Plus sign | '+' |
| Backward Branch Target | Minus sign | '-' |
Branch target labels, as their name implies, are meant to serve as the destination points of program control branches. Branch target labels relieve the programmer of the burden of creating a unique label name for what is frequently a one-time-only use.
To mark a source code line as a branch target, use a branch target name in the label field. The two names can be combined in either order in the label field, which marks a location as both a forward and a backward branch target.
HXA permits branch target labels to be used with any pseudo opcodes which accept labels, but this usage is unusual and generates a warning. If a branch target label really is meant to refer to the location of a pseudo opcode, the warning can be avoided by placing the label on a line by itself and the pseudo opcode on the next line.
Examples
HXA supports both numeric and string expressions. Legal expressions consist of at least one operand and zero or more operators. Operands and operators may be separated by spaces for clarity.
HXA performs numeric expression evaluation with a final result in the range of a 32-bit signed integer.
| - | Binary | Decimal | Hexadecimal |
| --- | --- | --- | --- |
| Minimum | %100000000000000000000000000000000 | -2147483648 | $80000000 |
| Maximum | %011111111111111111111111111111111 | 2147483647 | $7FFFFFFF |
The underlying run-time package used by HXA permits precise intermediate results larger than 32-bit signed integers can describe. Note this means intermediate overflow does not result in "wrapping" of values from positive to negative or vice-versa. That is, arithmetic is not modular. This facility should be used carefully or not at all.
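The difference between intermediate and final results can be illustrated in Python, whose integers also do not wrap. This is a sketch only: the name `check_final` is hypothetical, and Python's unbounded integers stand in for whatever precision HXA's run-time package actually provides.

```python
INT32_MIN = -2**31        # -2147483648
INT32_MAX = 2**31 - 1     # 2147483647

def check_final(value):
    """Accept a final expression result only if it fits a 32-bit signed integer."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError("final result out of 32-bit signed range: %d" % value)
    return value

# The intermediate sum exceeds the maximum but does not wrap to a negative value.
intermediate = INT32_MAX + 10
# The final result is back in range, so it is accepted.
result = check_final(intermediate - 20)
```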
String expressions yield string results (note that many operators on string operands have numeric results). The only string operation is concatenation, which joins two strings together to form a longer one. HXA has no explicit concatenation operator; concatenation is implied by placing two string operands adjacent to each other.
The results of string expressions may be up to at least 8000 characters in length. The underlying run-time package used by HXA may permit longer strings, but this is not guaranteed in all versions of HXA.
There are three basic kinds of operands: literals, symbols and functions. Each operand kind comes in two or more types.
HXA recognizes binary, decimal and hexadecimal numeric literals in either Motorola or Intel formats. Hexadecimal literals may also be expressed in the C language format. For hexadecimals, the numbers 10 to 15 are represented by the characters 'A-F' or 'a-f' (case doesn't matter).
In general, the base chosen to express a numeric literal should make its intended use as clear as possible. Leading zeros are always ignored, so they may be used freely to help clarify the use intended.
Binary and hexadecimal numeric literals which represent integers larger than 32 bits in size are not allowed.
Decimal numeric literals may be up to 10 digits in length. This does allow them to exceed the maximum and minimum allowed values during intermediate calculations. However, this will not cause an error as long as the final result of the expression they appear in is within the allowed range.
| - | Binary | Decimal | Hexadecimal |
| --- | --- | --- | --- |
| Motorola | %11000000 | 192 | $C0 |
| Intel | 11000000B | 192D | 0C0H |
| C | - | - | 0xC0 |
Intel-format number notes:
Integer numeric literals may be used wherever a numeric value is expected.
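The literal formats in the table can be recognized with a handful of patterns. The Python sketch below is not HXA's parser: the name `parse_literal` is hypothetical, and it assumes an Intel-format hexadecimal literal must begin with a decimal digit (as '0C0H' does in the table). Range and length checks are omitted.

```python
import re

# (pattern, base) pairs, tried in order.
_FORMATS = [
    (re.compile(r"^%([01]+)$"), 2),                    # Motorola binary
    (re.compile(r"^\$([0-9A-F]+)$", re.I), 16),        # Motorola hexadecimal
    (re.compile(r"^0X([0-9A-F]+)$", re.I), 16),        # C hexadecimal
    (re.compile(r"^([01]+)B$", re.I), 2),              # Intel binary
    (re.compile(r"^([0-9][0-9A-F]*)H$", re.I), 16),    # Intel hexadecimal
    (re.compile(r"^([0-9]+)D?$", re.I), 10),           # decimal, optional 'D'
]

def parse_literal(text):
    """Convert a numeric literal in any of the tabled formats to an integer."""
    for pattern, base in _FORMATS:
        match = pattern.match(text)
        if match:
            return int(match.group(1), base)   # leading zeros are ignored
    raise ValueError("not a numeric literal: %r" % text)
```

All six spellings of the value in the table ('%11000000', '192', '$C0', '11000000B', '192D', '0C0H' and '0xC0') convert to the same integer.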
Examples
A character literal is a single character code delimited by single quote marks ('). A single quote mark may itself be included in a character literal by escaping it.
A character code may be specified by either its printable ASCII representation or an escape sequence. The numeric value of a printable character is normally its position in the ASCII collating sequence. If the value of a character code according to the current character set translation is desired instead, it can be obtained using the XLATE() function.
Character literals may be used wherever a numeric value is expected.
Examples
A string literal is a sequence of zero or more character codes delimited by double quote marks ("). A double quote mark may itself be included in a literal string by escaping it.
Each character code may be specified by either its printable ASCII representation or an escape sequence. The numeric value of each printable character is normally its position in the ASCII collating sequence.
String literals may be used wherever a string value is expected. Note that when used with string comparison and pattern matching operators, string literals are not affected by the current character set translation.
Examples
A regular expression is a sequence of one or more character codes delimited by forward slash (/) characters. A forward slash may itself be included in a regular expression by escaping it.
Only pattern match operators apply to regular expressions. Any escape sequences in a regular expression are evaluated before a pattern match is attempted.
Regular expression patterns provide a compact and flexible way to specify multiple or exact matches to a string. They are provided by HXA mainly as an aid to macro creation.
A full tutorial in the use of regular expression patterns is beyond the scope of this document. Briefly, a regular expression pattern consists of normal characters, which match those characters, and meta-characters, which match character types, groups or positions. Most of the power of regular expressions is provided by meta-characters singly or in combination with others.
For a fuller explanation, consult any AWK/NAWK/TAWK reference.
| To match | Notation | "abc" Matches | "abc" does NOT Match |
| --- | --- | --- | --- |
| Any single character | '.' | /a.c/ | /xy./ |
| Any character in a set | [(set)] | /[cde]/ | /[xyz]/ |
| Any character not in a set | [^(set)] | /[^ghe]/ | /[^abc]/ |
| Zero or more occurrences of regular expression | '*' | /ax*/ | /axy*/ |
| One or more occurrences of regular expression | '+' | /ab+/ | /ax+/ |
| Zero or one occurrence of regular expression | '?' | /abcd?/ | /abcd?e/ |
| At start of string | '^' | /^abc/ | /^abcd/ |
| At end of string | '$' | /abc$/ | /ebc$/ |
| Either of two regular expressions | '|' | /abc|def/ | /def|wxyz/ |
Notes
Regular expression literals may be used wherever a regular expression is expected.
Examples
Escape sequences can be used in character constants, string literals and regular expression patterns to represent character codes that are difficult to specify otherwise (e.g., control codes). There are two types of escape sequences, mnemonic and hexadecimal.
Mnemonic Escape Sequences
Mnemonic escapes take the form of a backslash ('\') followed by a printable character.
| Character | Name | Sequence |
| --- | --- | --- |
| BS | backspace | \b |
| FF | formfeed | \f |
| NL | newline | \n |
| CR | carriage return | \r |
| space | space | \s |
| HT | tab | \t |
| VT | vertical tab | \v |
Any other characters following a backslash simply become themselves. This can be used to "turn off" the normal interpretation of certain characters.
| Sequence | Indicates | Commonly Used |
| --- | --- | --- |
| \, | a comma | in the expression field |
| \" | a double quote mark | within literal strings |
| \\ | a single backslash | within literal strings |
Hexadecimal Escape Sequences
Hexadecimal escapes take the form of a backslash (\) followed by a hexadecimal number in the range zero to 255 decimal in Motorola ($00-$FF), Intel (00H-0FFH) or C (0x00-0xFF) formats. The hexadecimal number must be exactly two hex digits unless a leading zero is necessary for Intel-format numbers, in which case three digits are allowed. Alphabetic case is ignored.
| Decimal | Motorola Escape | Intel Escape | C Escape |
| --- | --- | --- | --- |
| 12 | \$0C | \0CH | \0x0C |
| 128 | \$80 | \80H | \0x80 |
| 254 | \$FE | \0FEH | \0xFE |
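Both kinds of escape sequence can be decoded with a short routine. The Python sketch below is an approximation for illustration, not HXA's implementation: the name `decode_escapes` is hypothetical, and the order in which the forms are tried (hexadecimal first, then mnemonic, then "becomes itself") is an assumption.

```python
import re

# Mnemonic escapes from the table, mapped to their ASCII codes.
MNEMONIC = {"b": 8, "f": 12, "n": 10, "r": 13, "s": 32, "t": 9, "v": 11}

# Hexadecimal escapes, matched just after the backslash.
_HEX_ESCAPE = re.compile(
    r"\$([0-9A-F]{2})"                      # Motorola: \$0C
    r"|0X([0-9A-F]{2})"                     # C:        \0x0C
    r"|(0[A-F][0-9A-F]|[0-9][0-9A-F])H",    # Intel:    \0CH or \0FEH
    re.I)

def decode_escapes(text):
    """Return the list of character codes represented by text."""
    codes = []
    i = 0
    while i < len(text):
        if text[i] != "\\" or i + 1 == len(text):
            codes.append(ord(text[i]))       # ordinary character
            i += 1
            continue
        match = _HEX_ESCAPE.match(text, i + 1)
        if match:
            digits = match.group(1) or match.group(2) or match.group(3)
            codes.append(int(digits, 16))
            i = match.end()
        elif text[i + 1] in MNEMONIC:
            codes.append(MNEMONIC[text[i + 1]])
            i += 2
        else:
            codes.append(ord(text[i + 1]))   # any other character becomes itself
            i += 2
    return codes
```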
Examples
Global, local and variable numeric labels may be used wherever a numeric value is expected. When referenced, the value of the label is used during evaluation.
References to branch target labels in an expression consist of a colon (:) character immediately followed by one or more of the two branch target label names.
For example, in an expression ':+' refers to the next forward branch target label from the current position, ':++' to the second, ':+++' to the third, and so on. Any number of labels of the same name can be concatenated together to indicate which branch target label is actually referred to.
Branch target references in this form can be used with any numeric operators in an expression.
If the branch target reference is the only term in an expression (i.e., it appears alone without any other operators or operands), then the colon prefix is optional and may be omitted.
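The counting rule for branch target references can be modeled in Python. This is a sketch under several assumptions that the text does not settle: positions are represented by source line numbers, a target on the referencing line itself is not counted, and only runs of a single name are handled. The function name `resolve_branch_ref` is hypothetical.

```python
def resolve_branch_ref(ref, here, forward_lines, backward_lines):
    """Return the line number a reference such as ':++' or '-' refers to.

    forward_lines / backward_lines hold the line numbers marked with
    '+' and '-' branch target labels respectively.
    """
    body = ref[1:] if ref.startswith(":") else ref   # colon prefix is optional
    if not body or body[0] not in "+-" or body.strip(body[0]):
        raise ValueError("not a branch target reference: %r" % ref)
    count = len(body)                                # ':++' means the second target
    if body[0] == "+":
        candidates = sorted(n for n in forward_lines if n > here)
    else:
        candidates = sorted((n for n in backward_lines if n < here), reverse=True)
    if count > len(candidates):
        raise LookupError("no such branch target: %r" % ref)
    return candidates[count - 1]
```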
Examples
Global, local and variable string labels may be used wherever a string value is expected. When referenced, the value of the label is used during evaluation.
Examples
The current value of the program counter is represented in expressions by either an asterisk ('*') or a dollar sign ('$').
The minimum program counter value is zero. The maximum is processor dependent, except that HXA currently limits the maximum possible value to $7FFFFFFF (or 31 bits, the same as the maximum positive 32-bit signed integer).
The program counter value may be used wherever a numeric value is expected.
Examples
A symbol is forward referenced when it is used as an operand in an expression but its actual value is not known.
In a monolithic program, only labels have this property. It is assumed that sooner or later the label will appear in the label field and thereby acquire a known value.
In a segmented program, neither the labels nor the program counter of any relative origin segment has a known value until after the first pass is complete.
HXA allows forward reference in any construct which ultimately generates code or data of a known size, such as instruction opcodes or "BIT--" pseudo ops. HXA saves any partially evaluated expression and completes evaluation during the second pass. At this point any relative program counters have been made absolute and the value of all labels is known.
HXA forbids forward reference whenever the value of an expression must be known during the first pass. These situations include conditional pseudo ops, because HXA must make a decision when the expression is first encountered. The "STRING--" and "HEX" pseudo ops advance the program counter by an amount that must be known during the first pass, so they also do not allow forward reference.
Usage:
__VER__
__VER__ has the value of the HXA version number.
This value is decimal-coded by nybbles (4 bits each), that is, BCD encoded.
Bit Interpretation
Examples:
It is unlikely that 99,999 major versions will ever be released.
Examples
Conventions
Argument Types
Usage:
CHR$(num_expr)
CHR$() returns a string of length one whose single character has the value num_expr & $FF. The bitwise-AND ensures that the value will fit in a single 8-bit character.
CHR$() is not affected by character set translation.
CHR$() is the inverse of ORD().
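The masking behavior can be shown with a one-line Python model. The name `chr_s` is hypothetical and simply mirrors the description above.

```python
def chr_s(num_expr):
    """Model of CHR$(): a one-character string with code num_expr & $FF."""
    return chr(num_expr & 0xFF)   # only the low eight bits are kept

# Values that differ only above bit 7 produce the same character.
same = chr_s(0x41) == chr_s(0x141)
```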
Examples
Usage:
DEFINED(name$)
DEFINED() returns logical TRUE if its argument is a currently defined macro, otherwise logical FALSE.
A non-fatal error occurs if name$ is a string expression which does not evaluate to the form of a global name. The expression containing DEFINED() is treated as incompletely evaluated.
Examples
Usage:
FORWARD(str_expr$)
FORWARD() returns logical TRUE if str_expr$, converted to a numeric expression, contains one or more forward references, otherwise logical FALSE.
A non-fatal error occurs if str_expr$ cannot be converted to a legal numeric expression. The expression containing FORWARD() is treated as incompletely evaluated.
Examples
Usage:
INDEX(str_expr1$, str_expr2$ [, num_expr])
INDEX() returns the leftmost starting index of str_expr2$ in str_expr1$.
If str_expr2$ does not occur in str_expr1$, the result is zero.
If either string argument is the null string, the result is zero.
If num_expr is not present, the search begins at index one (the first, or leftmost, character of str_expr1$).
If num_expr is a positive value between one and LEN(str_expr1$) inclusive, the search begins at index num_expr counted from the start of the string.
If num_expr is a negative value between -1 and -LEN(str_expr1$) inclusive, the search begins at index num_expr counted from the end of the string. More specifically, at index LEN(str_expr1$) + num_expr + 1.
For any other value of num_expr the result is zero.
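The start-index rules can be made concrete with a small Python model. This is a sketch, not HXA's code: the names `start_index` and `index` are hypothetical, and the returned index is assumed to be counted from the start of the whole string rather than from the search position.

```python
def start_index(length, num_expr):
    """Map num_expr to a one-based start index, or None if out of range."""
    if 1 <= num_expr <= length:
        return num_expr                    # counted from the start
    if -length <= num_expr <= -1:
        return length + num_expr + 1       # counted from the end
    return None

def index(s1, s2, num_expr=1):
    """Model of INDEX(): leftmost one-based position of s2 in s1, else zero."""
    if s1 == "" or s2 == "":
        return 0
    start = start_index(len(s1), num_expr)
    if start is None:
        return 0
    # str.find() returns -1 when s2 is absent, which becomes zero here.
    return s1.find(s2, start - 1) + 1
```

For a string of length 11, a num_expr of -4 maps to start index 11 + (-4) + 1 = 8.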
Examples
Usage:
INDEXR(str_expr1$, str_expr2$ [, num_expr])
INDEXR() returns the rightmost starting index of str_expr2$ in str_expr1$. If str_expr2$ does not occur in str_expr1$, the result is zero.
If either string argument is the null string, the result is zero.
If num_expr is not present, the search begins at index LEN(str_expr1$) (the last, or rightmost, character of str_expr1$).
If num_expr is a positive value between one and LEN(str_expr1$) inclusive, the search begins at index num_expr counted from the start of the string.
If num_expr is a negative value between -1 and -LEN(str_expr1$) inclusive, the search begins at index num_expr counted from the end of the string. More specifically, at index LEN(str_expr1$) + num_expr + 1.
For any other value of num_expr the result is zero.
Examples
Usage:
LEN(str_expr$)
LEN() returns the number of 8-bit character codes in str_expr$.
Examples
Usage:
LOWER$(str_expr$)
LOWER$() returns a string the same as str_expr$, except that any upper case alphabetic characters have been converted to lower case.
Examples
Usage:
MATCH$(str_expr$, regex [, num_expr])
MATCH$() returns the leftmost substring of str_expr$ that matches the regular expression pattern regex. If regex does not match str_expr$, the result is the null string.
If num_expr is not present, the search begins at index one (the first, or leftmost, character of str_expr$).
If num_expr is a positive value between one and LEN(str_expr$) inclusive, the search begins at index num_expr counted from the start of the string.
If num_expr is a negative value between -1 and -LEN(str_expr$) inclusive, the search begins at index num_expr counted from the end of the string. More specifically, at index LEN(str_expr$) + num_expr + 1.
For any other value of num_expr the result is the null string.
Examples
Usage:
MID$(str_expr$, num_expr1 [, num_expr2])
MID$() returns a substring of str_expr$.
If str_expr$ is the null string, the result is the null string.
If num_expr1 is a positive value between one and LEN(str_expr$) inclusive, the substring begins at index num_expr1 counted from the start of the string.
If num_expr1 is a negative value between -1 and -LEN(str_expr$) inclusive, the substring begins at index num_expr1 counted from the end of the string. More specifically, at index LEN(str_expr$) + num_expr1 + 1.
For any other value of num_expr1 the result is the null string.
If num_expr2 is not present or has a value greater than the number of characters in str_expr$ from the start position to its end, the returned substring consists of all characters of str_expr$ from the start position onwards.
If num_expr2 is present and has a value between one and the number of characters in str_expr$ from the start position to the end inclusive, the returned substring consists of num_expr2 characters of str_expr$ beginning at the start position.
If num_expr2 is less than one the result is the null string.
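The combined effect of the two numeric arguments can be checked against a short Python model. The name `mid` is hypothetical; the sketch follows the rules as stated above and nothing more.

```python
def mid(s, num_expr1, num_expr2=None):
    """Model of MID$(): substring of s chosen by a start index and a length."""
    length = len(s)
    if length == 0:
        return ""
    if 1 <= num_expr1 <= length:
        start = num_expr1                      # counted from the start
    elif -length <= num_expr1 <= -1:
        start = length + num_expr1 + 1         # counted from the end
    else:
        return ""
    if num_expr2 is None:
        return s[start - 1:]                   # everything from start onwards
    if num_expr2 < 1:
        return ""
    # Slicing stops at the end of the string if num_expr2 is too large.
    return s[start - 1:start - 1 + num_expr2]
```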
Examples
Usage:
ORD(str_expr$ [, num_expr])
ORD() returns the 8-bit numeric value of a character in str_expr$. If str_expr$ is the null string, ORD() returns zero.
ORD() is not affected by character set translation.
ORD() is the inverse of CHR$().
If num_expr is not present, the first character of str_expr$ is used.
If num_expr is a positive value between one and LEN(str_expr$) inclusive, the character used is at index num_expr counted from the start of the string.
If num_expr is a negative value between -1 and -LEN(str_expr$) inclusive, the character used is at index num_expr counted from the end of the string. More specifically, at index LEN(str_expr$) + num_expr + 1.
For any other value of num_expr the result is zero.
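A Python model of the character selection follows. The name `ord_s` is hypothetical, and the string is assumed to hold only 8-bit character codes.

```python
def ord_s(s, num_expr=1):
    """Model of ORD(): 8-bit value of one character of s, else zero."""
    length = len(s)
    if length == 0:
        return 0
    if 1 <= num_expr <= length:
        position = num_expr                    # counted from the start
    elif -length <= num_expr <= -1:
        position = length + num_expr + 1       # counted from the end
    else:
        return 0
    return ord(s[position - 1]) & 0xFF
```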
Examples
Usage:
SEG--(name$)
Variants
SEGBEG() returns the absolute start address of segment name$. That is, the address that contains the first byte used by name$.
SEGEND() returns the absolute end address plus one of segment name$. That is, the address following the last byte used by name$. Note SEGEND() equals SEGBEG() if nothing has ever caused the segment program counter to increment.
SEGLEN() returns the length of segment name$. That is, the value of SEGEND() - SEGBEG().
SEGOFF() returns the byte offset of segment name$ from the first segment. That is, the sum of the lengths of all non-common segments before segment name$ in the output sequence. It is the same as the zero-based byte offset of segment name$ from the start of a raw binary output file.
It is an error to use SEGOFF() on common segments. They do not appear in output files and thus have no meaningful offset value.
name$ may be a forward reference to a segment which has not yet been encountered.
A non-fatal error occurs if name$ is a string expression which does not evaluate to the form of a global name. The expression containing SEG--() is treated as incompletely evaluated.
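The relationships among the four segment functions can be illustrated with a Python model. Everything here is hypothetical: the segment names, addresses and lengths are invented for the example, and the list order is assumed to be the output sequence.

```python
# Hypothetical segments in output order: (name, start address, length, is_common)
SEGMENTS = [
    ("code", 0x1000, 0x0200, False),
    ("zpage", 0x0000, 0x0010, True),
    ("data", 0x1200, 0x0080, False),
]

def _position(name):
    for i, segment in enumerate(SEGMENTS):
        if segment[0] == name:
            return i
    raise KeyError(name)

def segbeg(name):
    return SEGMENTS[_position(name)][1]

def seglen(name):
    return SEGMENTS[_position(name)][2]

def segend(name):
    return segbeg(name) + seglen(name)     # end address plus one

def segoff(name):
    i = _position(name)
    if SEGMENTS[i][3]:
        raise ValueError("SEGOFF() is not meaningful for common segments")
    # Sum the lengths of all non-common segments output before this one.
    return sum(length for _, _, length, common in SEGMENTS[:i] if not common)
```

In this model the common segment contributes nothing to the offset of the segment that follows it, since it would not appear in an output file.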
Examples
Usage:
STR$(num_expr)
STR$() converts num_expr to a decimal string.
Num_expr must be in the range of a 32-bit signed integer. If it is not, an error is reported and the null string is returned.
Examples
Usage:
UPPER$(str_expr$)
UPPER$() returns a string the same as str_expr$, except that any lower case alphabetic characters have been converted to upper case.
Examples
Usage:
VAL(str_expr$)
VAL() converts str_expr$ to a numeric expression and evaluates it. The converted expression may use forward reference in any context it is normally allowed.
If str_expr$ cannot be converted to a legal numeric expression a non-fatal error is reported and the expression containing VAL() is treated as incompletely evaluated.
Examples
Usage:
VER()
VER() has no arguments and returns the HXA version number.
Examples
Usage:
XLATE( num_expr )
XLATE() returns the numeric value at index num_expr & $FF in the current character set translation table. The bitwise-AND ensures the index is always within the range zero to 255, the size of the translation table.
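The masking step can be modelled in a few lines of Python. This is a sketch of the lookup only; the table contents here (identity except for folding lower case ASCII to upper case) are a made-up example, not a table HXA ships with.

```python
def xlate(table, num_expr):
    """Model of XLATE(): mask the index to 0..255, then look it up."""
    if len(table) != 256:
        raise ValueError("translation table must have 256 entries")
    return table[num_expr & 0xFF]


# Hypothetical table: identity mapping, except 'a'..'z' map to 'A'..'Z'.
table = [c - 32 if ord("a") <= c <= ord("z") else c for c in range(256)]
```

Because of the mask, an out-of-range index such as $161 selects the same entry as $61.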
Examples
HXA provides operators which work with numeric, string and regular expression operands. Results can be numeric or string (logical operator results are either zero (FALSE) or one (TRUE)).
Each operator has an associated precedence which determines the order in which operators are applied to operands. Parentheses may be used to group subexpressions in order to change the normal precedence. In the standard algebraic manner, groups of operands and operators inside pairs of parentheses will be evaluated before those outside, and sets of parentheses nest.
Although some assembly languages use pairs of parentheses to indicate address indirection, this usually does not cause HXA any difficulty. When used as address indirection indicators, parentheses normally surround the entire expression of interest. The value of an entire expression surrounded by parentheses is no different than the value of the same expression not surrounded by parentheses, so no special treatment is required by HXA.
Operators are arranged in order of precedence, from highest to lowest. Those of higher precedence are evaluated before those of lower precedence.
Operators grouped together have equal precedence.
| Operator | Notation | Result Type |
| --- | --- | --- |
| Built-in Functions | F() | Any |
| --- | --- | --- |
| String Concatenation | S1 S2 | String |
| --- | --- | --- |
| String Equal | S1 == S2 | Logical |
| String NOT Equal | S1 != S2 | |
| String Less Than | S1 < S2 | |
| String Greater Than | S1 > S2 | |
| String Less Than or Equal | S1 <= S2 | |
| String Greater Than or Equal | S1 >= S2 | |
| --- | --- | --- |
| String Matches Regular Expression | S ~ RE | Logical |
| String NOT Match Regular Expression | S !~ RE | |
| --- | --- | --- |
| Unary Plus | +N | Numeric |
| Unary Minus | -N | |
| Logical NOT | !N | Logical |
| Bitwise NOT | ~N | Numeric |
| --- | --- | --- |
| Multiply | N1 * N2 | Numeric |
| Divide | N1 / N2 | |
| Remainder (aka Modulus) | N1 % N2 | |
| --- | --- | --- |
| Add | N1 + N2 | Numeric |
| Subtract | N1 - N2 | |
| --- | --- | --- |
| Left Shift | N1 << N2 | Numeric |
| Right Shift | N1 >> N2 | |
| --- | --- | --- |
| Less Than | N1 < N2 | Logical |
| Greater Than | N1 > N2 | |
| Less Than or Equal | N1 <= N2 | |
| Greater Than or Equal | N1 >= N2 | |
| --- | --- | --- |
| Equal | N1 == N2 | Logical |
| NOT Equal | N1 != N2 | |
| --- | --- | --- |
| Bitwise AND | N1 & N2 | Numeric |
| --- | --- | --- |
| Bitwise Exclusive OR | N1 ^ N2 | Numeric |
| --- | --- | --- |
| Bitwise Inclusive OR | N1 \| N2 | Numeric |
| --- | --- | --- |
| Logical AND | N1 && N2 | Logical |
| --- | --- | --- |
| Logical OR | N1 \|\| N2 | Logical |
| --- | --- | --- |
| Extract Bits 0..7 | <N | Numeric |
| Extract Bits 8..15 | >N | |
| Extract Bits 16..31 | ^N | |
| --- | --- | --- |
| Conditional | cond_expr ? true_expr : false_expr | Varies |
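The three bit-extraction operators near the bottom of the table can be modelled with shifts and masks. The Python function names below are hypothetical; only the bit ranges come from the table.

```python
def extract_low(n):
    """<N : bits 0..7"""
    return n & 0xFF


def extract_high(n):
    """>N : bits 8..15"""
    return (n >> 8) & 0xFF


def extract_upper(n):
    """^N : bits 16..31"""
    return (n >> 16) & 0xFFFF
```

Reading the table literally, these operators have lower precedence than the arithmetic operators, so an expression such as `<N+1` would extract bits 0..7 of `N+1` rather than add one to the extracted byte.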
Notes:
Examples
The ternary conditional cond_expr ? true_expr : false_expr accepts multiple expression types in all branches.
Cond_expr may be either numeric (TRUE if non-zero) or string (TRUE if non-null).
True_expr and false_expr must both have the same type. They can be numeric, string, global name or regular expressions, as long as their type is correct in context.
In some cases a true_expr or false_expr or both may be coerced to the expected type. For example, numbers to strings or strings to global names.
In some cases HXA might not detect that an expression contains an error. Undetected errors may be of two general kinds, parse and unevaluable.
A parse error occurs when an expression is malformed. In some cases HXA may not detect this because no parse is attempted. For example, in false "IF" conditional branches HXA does not attempt to parse any expression following a nested "IF" or "ELSEIF" pseudo opcode because no result of any subsequent evaluation can change a false branch to true.
An unevaluable error occurs when an expression parses properly but cannot be fully evaluated (usually because of an unresolvable forward reference). In some cases HXA may not detect this because a sub-expression is never evaluated. The binary operators "&&" and "||", as well as the ternary operator "?:", can cause part of the expression they appear in to be skipped over during evaluation. HXA does not determine whether or not skipped-over parts can be evaluated.
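How a skipped operand can hide an unevaluable error is easy to show with a small Python model of short-circuit evaluation. Everything here (the `Unresolved` exception, the symbol table, the function names) is hypothetical and illustrates the principle only.

```python
class Unresolved(Exception):
    """Raised when an operand cannot be evaluated, for example an
    unresolvable forward reference."""


def lookup(symbols, name):
    if name not in symbols:
        raise Unresolved(name)
    return symbols[name]


def logical_or(left, right):
    # Operands are zero-argument callables, so the right side is
    # evaluated only when the left side is FALSE.
    return 1 if left() != 0 or right() != 0 else 0


def logical_and(left, right):
    # The right side is evaluated only when the left side is TRUE.
    return 1 if left() != 0 and right() != 0 else 0


symbols = {"defined": 1}

# "defined || missing": the right operand is skipped, so the fact that
# "missing" can never be resolved goes unnoticed.
skipped = logical_or(lambda: lookup(symbols, "defined"),
                     lambda: lookup(symbols, "missing"))
```

The same expression written with "&&" would evaluate the right operand and expose the problem.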
HXA handles expressions in two phases. Each expression is first converted to an internal form. If that succeeds without error, the internal form is then evaluated.
Conversion to internal form can take a significant amount of time. If any single expression is heavily used, it can pay to convert it once and then cache the internal form. Evaluation then happens on the quickly-fetched cached internal form.
HXA automatically caches expressions which meet two conditions. First, to increase the chance the expression is heavily used, it must occur inside a macro expansion. Second, to increase the chance conversion is actually time-consuming, it must have at least one operator (i.e., expressions consisting of a lone operand are not cached).
Users may take advantage of this caching strategy when designing macros.
Substituting actual text macro arguments directly into complex expressions can create multiple variants of the same expression if the arguments differ each time. It may pay instead to use variable or local labels in the place of formal text macro arguments in a macro definition, effectively reducing the number of variant expressions HXA encounters.
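The effect on the cache can be illustrated with a Python model. It assumes, purely for illustration, that the cache is keyed on the exact text of the expression after macro expansion; the expressions, the `]count` and `base` names, and the "internal form" (a tuple of tokens) are all hypothetical.

```python
def make_converter():
    """Return a caching converter plus a counter of real conversions."""
    cache = {}
    stats = {"conversions": 0}

    def internal_form(expr_text):
        if expr_text not in cache:
            stats["conversions"] += 1                 # the costly step
            cache[expr_text] = tuple(expr_text.split())
        return cache[expr_text]

    return internal_form, stats


# Text arguments substituted directly: each distinct actual argument
# produces a different expression text, so each is converted separately.
direct, direct_stats = make_converter()
for actual in ("1", "2", "3"):
    direct(f"( {actual} * 8 ) + base")

# A variable label in place of the formal text argument: the text is
# identical on every expansion, so it is converted only once.
labelled, labelled_stats = make_converter()
for _ in range(3):
    labelled("( ]count * 8 ) + base")
```

Three expansions cost three conversions in the first case and one in the second.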
Pseudo opcodes (or pseudo ops) appear in the opcode field of a source code line but are not processor instruction opcodes. Instead they control the assembler itself (and so are sometimes also called assembler directives). They provide facilities such as data storage, conditional assembly, macro assembly and file inclusion.
HXA's pseudo ops may be optionally prefixed with a period ('.') or sharp ('#') character. These are allowed merely for visual distinctiveness, however. HXA disregards these prefix characters when identifying a pseudo op, so conflicts with user names should be avoided.
There are built-in aliases for many HXA pseudo ops. Programmers may use either the base form or the alias with identical results. The "PSALIAS" pseudo op allows further aliasing if desired.
Set CPU
Set Program Counter
Assign Label Value
Data Storage
Pass Termination
Output Files
Macros and Repeat Blocks
Conditional Assembly Blocks
File Inclusion
Segments
User Messages
Limits
Program Listing
Assembler Customization
Miscellaneous
Conventions
Argument Types
Several HXA pseudo ops accept or require equality (or assignment) arguments. The general form of such an argument is an opt_str containing a single un-escaped equals sign ('='):
The meaning is that whatever appears on the right side is assigned to whatever appears on the left side. This is similar to the normal right-to-left assignment of most high-level languages.
The argument may or may not be a string expression. If it begins with a string literal, label or function it is treated as a const_expr$. Note the argument is twice subjected to escape processing, once when it is first evaluated and once again when it is split at the equals sign.
If the argument is not a string expression it is accepted "as-is", and is not subject to escape processing.
Equality arguments are similar in many ways to sub-fields of the expression field, except that the separator character is an equals sign instead of a comma. In particular:
A difference:
ASSUME opt_str [[, opt_str]..]
"ASSUME" primarily provides a method to control processor-specific options without having to add pseudo ops that would have no application other than to that processor.
What effect, if any, such strings have is completely processor-specific. There is no requirement that any string do anything, or that unrecognized strings trigger an error.
The documentation of each HXA variant lists recognized strings and what they do.
"ASSUME" also controls some processor-independent HXA options, mainly substituting a proliferation of "ASSUME" strings for a proliferation of pseudo ops. Strings which control such options are handled separately and never passed to the processor-specific portion of HXA. Currently only hexadecimal output files are affected.
Examples
[label] BIT-- num_expr [[, num_expr]..]
Variants
The "BIT--" pseudo ops store all or part of num_expr into the object code. Num_expr is a 32-bit signed integer value. "BIT08" stores the least significant 8 bits, "BIT16-" the least significant 16 bits, "BIT24-" the least significant 24 bits, and "BIT32-" the entire 32 bits.
Bits not stored are ignored, so num_expr can have a value larger than the storage space allotted will hold.
By default storage of multi-byte values is in native CPU order, either least significant byte (LSB) first or most significant byte (MSB) first. The "R" suffix reverses the native order of stored bytes, so "BIT32R" used on an LSB first CPU will store the MSB first in the object code.
The "BIT--" pseudo ops require at least one numeric expression argument. Each separate expression is treated as a separate 32-bit value to be stored according to the specific "BIT--" rules.
Note that some assemblers allow mixed string and numeric expressions to follow "BYTE" (or equivalent) pseudo ops. HXA does not permit this for "BIT--", but does for "STRING".
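The storage rules can be summarised with a short Python model. It is a sketch of the described behaviour, not HXA's code; the function name and its `lsb_first` and `reverse` parameters are assumptions standing in for the native CPU byte order and the "R" suffix.

```python
def bit_store(value, bits, lsb_first=True, reverse=False):
    """Return the bytes one BIT-- argument would contribute.

    bits      -- 8, 16, 24 or 32
    lsb_first -- native CPU byte order (True = least significant first)
    reverse   -- True models the "R" suffix
    """
    if bits not in (8, 16, 24, 32):
        raise ValueError("bits must be 8, 16, 24 or 32")
    masked = value & ((1 << bits) - 1)       # bits not stored are ignored
    order_lsb = lsb_first != reverse         # "R" flips the native order
    return masked.to_bytes(bits // 8, "little" if order_lsb else "big")
```

For example, on an LSB-first CPU a 16-bit store of $12345678 yields the bytes $78 $56, while a reversed 32-bit store yields $12 $34 $56 $78.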
Examples
COMMON
The "COMMON" pseudo op causes every fragment of a particular segment to start (or end) at the same memory address. That is, all the fragments of one segment overlay each other. The size of the segment is the size of its largest fragment.
"COMMON" cannot be used after any data has been stored in a segment, nor do common segments permit data to be stored in them. A segment may not be both common and padded.
The "COMMON" pseudo op is designed to simplify management of "scratch" memory in segmented source code. A single memory block may be re-divided by the "DS" pseudo op as convenient. That is, multiple temporary variables with names convenient for whatever routine is using them can occupy the same space (though of course not at the same time).
Common segments do not occupy any space in any output file. While they have absolute start and end addresses and lengths, they do not have any offset value.
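The sizing rule for common segments is simply a maximum, as this Python sketch shows. The contrast with ordinary segments, whose fragments are assumed here to be laid end to end, is included only for comparison and is an assumption of the sketch.

```python
def common_segment_size(fragment_sizes):
    """Fragments overlay one another, so the segment is as large as
    its largest fragment."""
    return max(fragment_sizes, default=0)


def ordinary_segment_size(fragment_sizes):
    """Assumed contrast: fragments of a non-common segment are
    placed one after another."""
    return sum(fragment_sizes)
```

Three scratch fragments reserving 4, 16 and 8 bytes with "DS" would therefore need only 16 bytes of memory as a common segment.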
Examples
CPU opt_str
By default HXA does not recognize any particular microprocessor instruction set. "CPU" enables HXA to recognize the instruction set of a specific microprocessor.
A CPU can be specified at any point before the first time the program counter is altered by the source code. In particular, a CPU must be specified before the first "ORG" pseudo op is encountered.
The "CPU" pseudo op can be used any number of times. If any use after the first specifies a different CPU than the first, HXA will issue a warning message and ignore it.
As currently implemented one or more CPU instruction sets are hard-coded into different versions of HXA. The specific name of an HXA executable indicates which CPU or related CPUs it is meant for. For example, HXA_T is meant for the imaginary "T_XX" processors used to test all processor-independent portions of HXA.
Examples
[label] DS const_expr
The "DS" pseudo op adds const_expr to the program counter. Nothing is done to the skipped bytes and no object code is generated.
Negative values of const_expr are allowed in monolithic programs and have the effect of decrementing the program counter, but a warning is issued. In segmented programs negative values of const_expr are not allowed.
"DS" is especially convenient for allocating consecutive variables which are left uninitialized at assembly time.
Examples
ECHO [opt_str]
If opt_str is present, the "ECHO" pseudo op copies it to stdout, the standard output file. If opt_str is not present, a blank line is output to stdout.
If an error file has been specified, the output of "ECHO" is sent there as well.
Examples
Monolithic Source
[label] END [num_expr]
Segmented Source
END [num_expr]
During its first pass HXA normally processes source code lines up through the last line of the first file it encounters (i.e., the root source file named when HXA is invoked). If there are no errors, HXA then begins subsequent processing.
The "END" pseudo op can be used to immediately terminate the processing of any source file. Every following line in that file is ignored.
A label may be used in the default monolithic source organization and is assigned the current value of the program counter. In segmented source organization the program counter is invalid outside any segment, hence no label may be used.
If num_expr follows an "END" in the root file, it is assumed to represent a start address to be included in a hexadecimal output file. Whether or not any such file has actually been specified, num_expr is always evaluated and range-checked.
"END" issues warnings if:
Examples
Variants
The "--END" pseudo ops cause HXA to make segments absolute based on their ending rather than starting addresses.
"ABSEND" makes a segment absolute end, forcing it to end at const_expr. Note the last actual byte of the segment will be placed at const_expr - 1. That is, one byte before the address represented by const_expr. If label is present it is assigned the value of const_expr. The segment program counter is not changed.
"RELEND" makes a segment relative end, forcing it to end at the absolute start address of the following segment. Note the last actual byte of the segment will have an address one byte before the start address of that successor segment.
Any segment may be absolute end, but a relative end segment cannot be the last segment nor may it immediately precede a relative origin segment.
"ABSEND" and "RELEND" may only be used within a segment fragment. They must be used before the segment program counter has changed, but can be used any number of times after that. The const_expr of "ABSEND" must have the same value each time.
Neither "ABSEND" nor "RELEND" can be used in any segment which already has another type. They cannot be padded.
Addresses in "ABSEND" and "RELEND" segments are always relative during the first pass. The absolute addresses of the data they contain cannot be determined until their final sizes are known. Thus references to locations and labels within them are always forward references.
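The address arithmetic, and the reason the start address depends on the final size, can be sketched in Python. The function names are hypothetical and the model covers only the calculation described above.

```python
def absend_range(end_addr, length):
    """Absolute end: the segment ends at end_addr, exclusive.

    Returns (start address, address of last byte). The start cannot be
    known until `length` is final, which is why labels inside such a
    segment behave as forward references.
    """
    start = end_addr - length
    last_byte = end_addr - 1
    return start, last_byte


def relend_range(next_segment_start, length):
    """Relative end: the segment ends where the following segment starts."""
    return absend_range(next_segment_start, length)
```

A 256-byte segment forced to end at $10000 thus occupies $FF00 through $FFFF.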
Examples
ERROR [opt_str]
The "ERROR" pseudo op enables programmer-triggered error messages.
These messages are the same in every respect as their internally triggered equivalents. They are sent to stdout and also to the error file if one has been specified. The "ERROR" pseudo op also contributes to the running total of such messages necessary to halt assembly altogether.
The optional opt_str argument can be used to provide a more detailed description of what triggered the message. In particular, a string expression may be used to display the value(s) of one or more labels at the point of the error.
While there are no hard and fast rules as to which message type is appropriate when, HXA itself generally uses these:
Examples
Numeric Label
label EQU const_expr
String Label
label$ EQU const_expr$
Variants
The "EQU" pseudo op examines the type of the label and evaluates an expression matching that type. The result is assigned to the label.
Global and local labels can be assigned the same value any number of times. Assignment of a different value to an existing label is not allowed unless the label is variable.
Examples
EXIT
EXITIF conditional_expr
The "EXIT" pseudo op unconditionally stops HXA from processing the remainder of the closest enclosing macro, repeat, or while block expansion.
The "EXITIF" pseudo op is a conditional version of "EXIT". It behaves as "EXIT" if conditional_expr is TRUE, otherwise it is ignored and processing of the current block expansion continues.
"EXIT" can also be made conditional by using it within a conditional assembly block which is itself within the body of the currently expanded block.
Both "EXIT" and "EXITIF" are legal only within block expansions.
HXA always recognizes any label attached to the pseudo op marking the end of the block whether or not processing of the block is stopped early.
Examples
FATAL [opt_str]
The "FATAL" pseudo op enables programmer-triggered fatal error messages. Assembly halts immediately.
These messages are the same in every respect as their internally triggered equivalents. They are sent to stdout and also to the error file if one has been specified.
The optional opt_str argument can be used to provide a more detailed description of what triggered the message. In particular, a string expression may be used to display the value(s) of one or more labels at the point of the error.
While there are no hard and fast rules as to which message type is appropriate when, HXA itself generally uses these guidelines.
Examples
--FILE [filename]
Variants
"ERRFILE" sends error messages to a text file in addition to stdout. This is particularly useful if there are so many error messages they cause the screen to scroll.
"HEXFILE" creates a text file containing the object code translation in Intel Hexadecimal Object File format. The file must be further processed by a loader program before it can be executed.
"LISTFILE" creates a text file containing both source code and a text representation of the object code produced. This pseudo op is useful for experimentation as well as documentation purposes.
"OBJFILE" produces a raw binary file containing the object code translation of the source code. This pseudo op must be used in order to obtain directly executable code.
If no filename is provided, the "--FILE" pseudo ops create a default name based on the name of the first source file encountered. They replace the filename extension (if any) with new extensions.
| Pseudo Op | Replacement |
| --- | --- |
| ERRFILE | ERR |
| HEXFILE | HEX |
| LISTFILE | LST |
| OBJFILE | OBJ |
Note that only the filename extension is replaced, so that the rest of the path and filename remain as they are. Thus by default the "--FILE" pseudo ops place their output in the same directory the first source file resides in. By providing these pseudo ops with a path and filename, output files can be placed in any directory with any name.
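The default naming rule amounts to swapping the extension of the first source file's name. The Python sketch below models that rule with POSIX-style paths, which is an assumption made to keep the example deterministic; the function name and sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Replacement extensions taken from the table above.
DEFAULT_EXT = {
    "ERRFILE": "ERR",
    "HEXFILE": "HEX",
    "LISTFILE": "LST",
    "OBJFILE": "OBJ",
}


def default_output_name(first_source, pseudo_op):
    """Replace (or add) only the extension; the directory and base
    name of the first source file are kept unchanged."""
    new_suffix = "." + DEFAULT_EXT[pseudo_op]
    return str(PurePosixPath(first_source).with_suffix(new_suffix))
```

So a root source file "projects/demo/main.a" would give "projects/demo/main.LST" for "LISTFILE", and a file with no extension simply gains one.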
"--FILE" pseudo ops may appear any number of times in the source code. However if any subsequent use specifies a different filename than the first use, HXA issues a warning message and ignores the new filename.
Examples
[label] HEX opt_str [[, opt_str]..]
The "HEX" pseudo op stores byte values specified by opt_str into the object code. Each opt_str must contain an even number of ASCII hexadecimal characters (0-9A-Fa-f). Each pair of characters is interpreted as one byte value to be stored. If an opt_str contains an odd number of characters or any non-hexadecimal characters an error is reported.
Typically each opt_str is an "as-is" value, that is, no delimiters or other string expression markers are used that force expression evaluation. Thus the values to be stored can be determined by inspection. However any string expression may be used, as long as the result contains an even number of ASCII hexadecimal characters.
Although an opt_str may be any length, using commas to separate multiple opt_strs can make inspection easier.
"HEX" processes each opt_str from left-to-right, so byte values are stored into the object code in the same order they appear in each successive opt_str.
"HEX" is intended mainly as a convenience for entering binary data by eliminating the need to specify the radix of each byte value.
Examples
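As an illustration of the rules "HEX" applies to its arguments, here is a small Python model. It is a sketch of the described checks and ordering, not HXA's own code, and the function name is hypothetical.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def hex_bytes(*opt_strs):
    """Convert each argument, left to right, into byte values."""
    out = bytearray()
    for s in opt_strs:
        if len(s) % 2 != 0:
            raise ValueError(f"odd number of characters: {s!r}")
        if not set(s) <= HEX_DIGITS:
            raise ValueError(f"non-hexadecimal character in: {s!r}")
        for i in range(0, len(s), 2):
            out.append(int(s[i:i + 2], 16))   # one byte per character pair
    return bytes(out)
```

Calling `hex_bytes("A9008D", "20d0")` gives the five bytes $A9 $00 $8D $20 $D0 in that order.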
IF conditional_expr
[[source code]..]
[ELSEIF conditional_expr]
[[source code]..]
[ELSE]
[[source code]..]
ENDIF
A conditional assembly block consists of an unnamed group of zero or more source code lines delimited by a matched pair of "IF".."ENDIF" pseudo ops. The grouped source lines are called the body of the conditional assembly block.
Evaluation of the conditional_expr following the "IF" pseudo op controls whether or not HXA processes the body of the conditional assembly block. If it is TRUE, the body is processed. If it is FALSE, the body is skipped.
The body of a conditional assembly block may be subdivided into branches by using the optional "ELSEIF" and "ELSE" pseudo ops. At most only one branch of a conditional assembly block is ever processed.
Any number of "ELSEIF" pseudo ops may be used in a conditional assembly block to evaluate alternate conditions. The conditional_expr following "ELSEIF" is evaluated only if no previous branch has been processed, otherwise it and the entire branch are skipped. If conditional_expr evaluates as true, the branch is processed and any following branches are skipped.
The first "ELSE" pseudo op encountered allows processing of its branch only if no previous branch has been processed. After the first "ELSE" at least one branch has always been processed, so any following branches are always skipped. Thus there is usually at most one "ELSE" controlling only the last branch of a conditional block.
Conditional assembly blocks may be nested to any depth by directly nesting "IF".."ENDIF" pairs.
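The branch selection rules can be modelled in Python. This is a hypothetical sketch: each branch is a pair of a condition (a callable, so that it is evaluated only when reached) and a body, and an "ELSE" branch is written as a condition that is always TRUE.

```python
def select_branch(branches):
    """Return the body of the first branch whose condition is TRUE,
    or None. Conditions after the chosen branch are never evaluated."""
    for condition, body in branches:
        if condition():
            return body
    return None


evaluated = []   # records which conditions were actually evaluated


def cond(name, value):
    def run():
        evaluated.append(name)
        return value
    return run


chosen = select_branch([
    (cond("IF", False), "first body"),
    (cond("ELSEIF-1", True), "second body"),
    (cond("ELSEIF-2", True), "third body"),   # skipped without evaluation
    (lambda: True, "else body"),              # ELSE, also skipped
])
```

Only the "IF" and first "ELSEIF" conditions are evaluated, and only the second body is processed.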
Examples
INCBIN filename [[,const_expr1] , const_expr2]
The "INCBIN" pseudo op causes HXA to begin reading raw data from filename, which is assumed to contain arbitrary binary data to be stored directly into the object code without any processing.
By default HXA reads all of filename into the object code. If const_expr1 is specified and is a positive integer, it represents the maximum number of bytes to read. Reading stops when const_expr1 bytes have been read or the end of file is reached, whichever occurs first. If const_expr1 is less than one, the entire file is read.
By default HXA begins reading at the start of filename, which is at byte offset zero. If const_expr2 is specified and is a positive integer, reading begins at byte offset const_expr2, which is byte const_expr2 + 1 of the file. Reading continues until const_expr1 bytes have been read or the end of the file is reached, whichever occurs first. If const_expr2 is less than one, reading begins at the start of filename.
Note that const_expr2 cannot be specified unless const_expr1 has been specified as well. However, const_expr1 can be a negative value, which effectively forces the entire remainder of the file to be read.
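The interaction of the two optional arguments can be modelled on an in-memory byte string. This Python sketch follows the rules as described above; the function and parameter names are hypothetical, with `count` standing for const_expr1 and `start` for const_expr2.

```python
def incbin(data, count=None, start=None):
    """Model of INCBIN applied to the bytes of a file.

    count -- maximum bytes to read; a value below one means no limit
    start -- zero-based byte offset; a value below one means offset zero
    """
    if start is not None and count is None:
        raise ValueError("const_expr2 requires const_expr1")
    offset = start if start is not None and start >= 1 else 0
    if count is None or count < 1:
        return data[offset:]                  # read to end of file
    return data[offset:offset + count]        # stops early at end of file


sample = bytes(range(10))   # a ten-byte stand-in for the file contents
```

With this sample, `incbin(sample, -1, 6)` returns the last four bytes, showing how a negative const_expr1 reads the remainder of the file from a chosen offset.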
Examples
INCLUDE filename
The "INCLUDE" pseudo op causes HXA to begin reading source code lines from filename, which is assumed to be a text file containing assembly language source code. When all lines of filename have been read, HXA resumes reading input lines from the file that contained the "INCLUDE".
"INCLUDE" may not be used within a macro, repeat or while expansion.
An included file has its own local scope. Local labels in an include file will not conflict with labels of the same name in other source files.
File inclusion may be nested to any depth by using the "INCLUDE" pseudo op within an included file.
Because HXA accepts only one filename on its command line, multiple source files can be assembled together only by using this pseudo op within that file.
Examples
LIST-- [flagname] [[,flagname]..]
Variants
The "LIST--" pseudo ops enable or disable various listing options used by HXA to create a listfile. They also influence which source lines HXA saves during the first pass, which is done in order to create a listfile without re-reading and re-processing the entire source.
"LIST--" pseudo ops can appear any number of times in the source code file(s). Note that when a listing begins, the listing flags are in the state set by their last use in the source code.
A listing file has up to four major sections: object, labels, segments and statistics.
Object Section
The object section lists all or part of the source code together with a text representation of the object code produced. The OBJECT flag controls both whether or not there is an object section at all and, if there is, whether or not particular portions of the source are listed.
The default state of the OBJECT flag is ON. In this state there is an object section, and every source line which results in object code is listed regardless of the state of any lower-priority object listing flag.
If the OBJECT flag is turned OFF at any point in the source code, there will be no object listing from that point until it is turned ON again.
If the OBJECT flag is OFF at the end of assembly there will be no object section listing at all.
| Flag Name | Lists | Default State |
| --- | --- | --- |
| OBJECT | Source lines which produce object code | ON |
| SOURCE | Source lines which do not produce object code | ON |
| INCLUDES | Contents of "INCLUDE" files | ON |
| MACROS | Macro expansions | OFF |
| UNTAKEN | Untaken conditional branches | OFF |
Notes
Labels Section
The labels section lists some or all of the symbols defined during assembly. The state of the LABELS flag controls whether or not a labels section is produced. If the LABELS flag is OFF when assembly ends, then no labels section is produced.
If the LABELS flag is ON when assembly ends (the default), then every global and variable label name is listed together with a count of how often it was referred to and its value at the end of assembly.
| Flag Name | Lists | Default State |
| --- | --- | --- |
| LABELS | Global and variable labels | ON |
| AUTOS | Internal form of local and branch target labels | OFF |
Segments Section
The segments section can appear only if a segmented source program is used. It lists the named segments in the order they appear in the object code, together with their type, start and end addresses, sizes and padding (if any).
The state of the SEGMENTS flag controls whether or not a segment map is produced for a segmented program. If it is ON (the default) the map is produced.
Statistics Section
The statistics section lists miscellaneous data concerning the current assembly. The state of the STATS flag controls whether or not a statistics section is produced. If the STATS flag is OFF when assembly ends (the default), then no statistics section is produced.
If the STATS flag is ON when assembly ends, HXA currently reports:
Global Flags
Global flags affect all flags and/or sections.
The ALL flag sets every other flag to ON or OFF, depending on which pseudo op it follows.
The LINENUMS flag is OFF by default. If it is ON at the end of assembly, every line of any listing is consecutively numbered at its start.
| Flag Name | Affects | Default State |
| --- | --- | --- |
| ALL | every other flag | N/A |
| LINENUMS | line numbers | OFF |
Alphabetical Ordering of Listing Control Flags
| Name | Section | Type | Default State |
| --- | --- | --- | --- |
| ALL | All | Global | N/A |
| AUTOS | Labels | Secondary | OFF |
| INCLUDES | Object | Secondary | ON |
| LABELS | Labels | Master | ON |
| LINENUMS | All | Global | OFF |
| MACROS | Object | Secondary | OFF |
| OBJECT | Object | Master | ON |
| SOURCE | Object | Secondary | ON |
| STATS | Statistics | Master | OFF |
| UNTAKEN | Object | Secondary | OFF |
Examples
Preferred Usage
MACRO name$ [[, ?arg[=default]]..]
[[source code]..]
[label] ENDMACRO
Alternate Usage
name MACRO [?arg[=default] [[, ?arg[=default]]..]]
[[source code]..]
[label] ENDMACRO
Variants
A macro consists of a named group of zero or more source code lines delimited by a matched pair of "MACRO".."ENDMACRO" pseudo ops. The grouped source lines are called the body of the macro.
A "MACRO".."ENDMACRO" pair is used to define a macro. Once a macro has been defined, use of its name in the opcode field of a source code line causes macro expansion. This has the effect of inserting the body of the macro into the source code at that point.
The preferred form of macro definition permits string expressions to denote name$, but the alternate form requires a literal name. Macro names must be unique. No two macros can have the same name at the same time, nor can a macro have the same name as an existing label or opcode.
Note that if "ENDMACRO" is labeled, the label is considered to be within the body of the macro. It is duplicated each time the macro is expanded, hence global labels are discouraged here.
Macros are an advanced assembler function. They are not necessary, but whenever a programmer uses similar sequences of source code in different places, macros can help make that code clear, correct and compact.
Macro Arguments
Macros can optionally be (and usually are) defined to accept arguments. Arguments "customize" a macro by allowing its body to be altered each time it is expanded.
There are two kinds of arguments, formal and actual. Both kinds appear in the expression field following a macro name. If a macro has more than one argument, they are separated by commas.
Formal arguments are used during macro definition. Formal text argument names have the same form as global labels, except their type is indicated by an initial question mark ('?') character. Within the body of the macro, formal arguments may be freely used in the expression field of any source line. To use formal arguments in the label or opcode fields, the "ONEXPAND" pseudo op must be employed.
Actual arguments are used during macro expansion. The first actual argument text will replace the first formal argument name wherever it appears in the macro body (including within quoted string literals), the second actual will replace the second formal, and so on.
Replacement is performed as a straight text substitution. There is no other interpretation or understanding of the arguments. In particular, escape sequences in actual arguments are not processed until the expanded line is assembled.
The number of actual arguments provided to a macro expansion must always match the number of formal arguments declared during macro definition.
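Straight text substitution can be sketched in Python as follows. This is a model, not HXA's algorithm: the function name is hypothetical, default actual arguments are left out, and matching longer formal names first (so that "?arg10" is not read as "?arg1" followed by "0") is an assumption of the sketch.

```python
import re


def expand_line(line, formals, actuals):
    """Replace each formal argument name in `line` with the text of
    the corresponding actual argument, with no other interpretation."""
    if len(actuals) != len(formals):
        raise ValueError("actual and formal argument counts differ")
    if not formals:
        return line
    mapping = dict(zip(formals, actuals))
    # One pass over the line; longer names are tried first.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(lambda m: mapping[m.group(0)], line)
```

For instance, expanding the body line `STRING "?msg = ?val"` with actual arguments `count` and `42` gives `STRING "count = 42"`: the replacement happens inside the quoted literal, and any escape sequences in the actual text are passed through untouched.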
Default Actual Arguments
A default actual argument is declared during macro definition using the equality idiom (except only literal values, not string expressions, are recognized here):
During macro expansions default actual arguments will be substituted for any formal arguments which have them and for which no other actual argument has been provided.
Note that because blank sub-fields are not allowed in the expression field it is not possible to skip over default actual arguments. Thus,
Variable and Local Label Formal Arguments
In addition to text formal arguments, variable and local labels may be used as formal arguments in macro definitions. Like text formal arguments, variable and local label formal arguments may have default actual arguments.
At the start of each macro expansion the actual argument of each variable or local formal argument is treated as an expression. It is evaluated and the result assigned to the corresponding variable or local formal argument.
This is equivalent in every way to an "EQU" pseudo opcode within the body of a macro using a variable or local label on the left-hand side and a formal text argument on the right. It is essentially a short-hand method of accomplishing the same task.
At expansion time variable and local labels used as formal arguments are assigned actual argument evaluation results in the same left-to-right order they appear in the macro definition.
Thus this form of definition:
    MACRO mymacro, ?myarg1, ?myarg2, ?myarg3
    ]myvar1 EQU ?myarg1
    @myvar2 EQU ?myarg2
    ]myvar3$ EQU "?myarg3$"
    ... ; the remainder of the body
    ENDMACRO
and this form:
    MACRO mymacro, ]myvar1, @myvar2, ]myvar3$
    ... ; the remainder of the body
    ENDMACRO
are completely equivalent.
Macros and Local Scopes
Every macro expansion causes a nested local scope to be created without ending the existing one. Local labels with the same name as others outside the new local scope do not conflict with those names. Local scopes can be nested to any depth by nesting macro expansions. As each expansion ends, so does the nested local scope it created.
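Nested local scopes behave like a stack, which the following Python sketch models. The class and method names are hypothetical, and the sketch assumes a lookup consults only the innermost scope; the text above says only that names in different scopes do not conflict.

```python
class ScopeStack:
    """A stack of local scopes: one is pushed when a macro expansion
    begins and popped when that expansion ends."""

    def __init__(self):
        self.scopes = [{}]            # the enclosing local scope

    def enter(self):
        self.scopes.append({})        # expansion starts

    def leave(self):
        self.scopes.pop()             # expansion ends

    def define(self, name, value):
        self.scopes[-1][name] = value

    def lookup(self, name):
        # Assumption: only the innermost scope is searched.
        return self.scopes[-1][name]


stack = ScopeStack()
stack.define("@loop", 0x1000)         # local label outside the macro
stack.enter()
stack.define("@loop", 0x1010)         # same name inside the expansion
inner = stack.lookup("@loop")
stack.leave()
outer = stack.lookup("@loop")
```

The two definitions of "@loop" never collide, and the outer value is intact once the expansion ends.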
Macro Nesting
Macro definitions can be nested in two different ways. First, "MACRO".."ENDMACRO" pairs may be directly nested during definition. Second, a macro body may contain the name of another macro to be expanded.
Note that these two forms have slightly different expansion behavior. When expanding an outer definition of the first form, the delimiters of any inner definition are simply ignored (note this same principle prevents a macro definition from being recognized within repeat and while blocks). Because only one macro definition is expanded, only one new local scope is created.
Expanding the second form causes a new local scope to be created for each macro expansion invoked.
Examples
MAX--
const_expr
Variants
The "MAX--" pseudo ops set limiting values on various internal counters. If a limit is exceeded assembly halts.
Const_expr can be any value, although negative values generate a warning.
| Name | Limits | Default Maximum |
| MAXDEPTH | Local Scope Nesting Depth | 128 |
| MAXERR | Error Messages Reported | 25 |
| MAXWARN | Warning Messages Reported | 50 |
Examples
MESGTEXT
ndx=mesg
[[, ndx=mesg]..]
All HXA-originated messages displayed to the user are created based on an index into a table of assembler message texts.
The "MESGTEXT" pseudo op assigns arbitrary text to an index using the equality idiom. HXA's default messages can thus be replaced by user-chosen messages.
Ndx must match an internal HXA message index exactly (i.e., the match is case-sensitive).
All characters in mesg outside the range $20-$7E are converted to a printable hexadecimal representation before assignment to ndx.
Examples
ONEXPAND
[text]
HXA normally performs a minimal amount of processing on every source code line in order to identify any pseudo ops. The "ONEXPAND" pseudo op can be used within the bodies of macro, repeat and while block definitions to prevent any assembler processing of a source code line until expansion occurs.
"ONEXPAND" allows use of formal macro arguments in the label and opcode fields of a source line. It can also be used to permit forward reference to a macro which is not yet defined (including recursive macro definitions).
Examples
[label]
ORG
const_expr
Variants
The "ORG" pseudo op evaluates const_expr and assigns the result to the program counter and, if present, to label.
HXA uses the program counter to determine address information for assembled code and data. Every source code line that generates code or data automatically updates the value of the program counter by the size of that code or data.
The program counter has no default value. It must be explicitly set by using an "ORG" pseudo op anywhere before the first code- or data-generating source line is encountered.
Every time the value of the program counter changes, it is checked against the legal range of values it may assume for the current CPU. Any attempt to generate code or data outside the legal range causes an error. There is no "wrapping" of program counter values from highest to lowest addresses or vice versa.
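The range rule can be sketched in Python. This is only an illustrative model, not HXA's implementation: the class name `ProgramCounter`, its methods and the 16-bit default width are all invented for the example. It also assumes the counter may rest one past the highest address after the last byte is stored.

```python
class ProgramCounter:
    """Toy model of a program counter that never wraps (all names hypothetical)."""

    def __init__(self, pc_bits=16):
        self.limit = 1 << pc_bits   # one past the highest legal address (assumed width)
        self.value = None           # no default value until ORG is used

    def org(self, address):
        # ORG itself changes the counter, so it is range checked too.
        if not 0 <= address < self.limit:
            raise ValueError("ORG value outside legal range")
        self.value = address

    def advance(self, size):
        # Called for every line that generates code or data.
        if self.value is None:
            raise ValueError("program counter not set; use ORG first")
        if self.value + size > self.limit:
            raise ValueError("code or data outside legal range (no wrapping)")
        self.value += size
```

In this model, storing two bytes at $FFFE succeeds, while storing a third byte afterwards raises an error instead of wrapping to address zero.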
Note that because "ORG" itself changes the value of the program counter, the current CPU must be specified before "ORG" can be used.
In monolithic source code there is no restriction on how often "ORG" may be used, nor is there any on whether the program counter may assume the same value more than once.
In segmented source code "ORG" marks a segment as absolute origin. It may be used any number of times in the same segment. However, every use in the same segment must have the same value, and only the first use actually sets the program counter. The program counter can only increase in value, and never repeats.
"ORG" cannot be used in any segment which already has another type.
Absolute origin segments are the only segment type in which backward references are possible during the first pass. Constant expressions may refer to locations and labels which are already known at the time the expression is evaluated.
Examples
PADTO
const_expr1 [, const_expr2 ]
The "PADTO" pseudo op inserts bytes into the object code until the program counter is a multiple of const_expr1. That is, if the remainder of dividing the value of the program counter by const_expr1 is non-zero, pad bytes are inserted until it is zero.
For example:
PADTO 2
inserts a zero byte into the output if the value of the program counter is odd, otherwise it does nothing.
Const_expr1 must have a value in the range one to the maximum value of the program counter plus one. If the remainder will not become zero until an even larger value, a fatal error will occur when the program counter goes out of range.
Note that if const_expr1 is larger than the value of the program counter at the time padding starts, it essentially represents an absolute address to pad to, since that is where the first zero remainder will occur.
By default "PADTO" inserts zero bytes into the object code. If the optional const_expr2 is supplied, that value is used instead. While const_expr2 is a four byte (32-bit) value, "PADTO" actually uses only the least significant non-zero bytes for padding. This allows the pad value to be one, two, three or four bytes long as desired.
It is often convenient to specify const_expr2 as a hexadecimal literal, as padding bytes are stored in the same left-to-right order as they appear in such literals (regardless of CPU orientation).
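The arithmetic behind the padding rules can be sketched as follows. The function names are hypothetical and this is not HXA's code. It assumes that "least significant non-zero bytes" means leading zero bytes of the 32-bit value are dropped, and that a value of zero pads with a single zero byte. How a multi-byte pattern is cut off when the gap is not a whole number of patterns is not stated here, so the sketch does not model it.

```python
def pad_count(program_counter, boundary):
    """Number of bytes needed to reach the next multiple of boundary."""
    if boundary < 1:
        raise ValueError("boundary must be at least one")
    return -program_counter % boundary   # zero when already a multiple


def pad_pattern(value=0):
    """Pad bytes taken from a 32-bit value, in hexadecimal literal order.

    Assumption: leading zero bytes are dropped; zero yields one zero byte.
    """
    value &= 0xFFFFFFFF
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")  # left-to-right, regardless of CPU
```

For instance, `pad_count(0x0100, 0x1000)` gives the distance to address $1000, which is the "absolute address" behaviour noted earlier, and `pad_pattern(0x1234)` yields the two bytes $12 then $34.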
In monolithic source "PADTO" cannot be used before the first "ORG". After that it may be used any number of times with any argument values. Each time it is executed immediately. Any padding appears in the object code starting at the current value of the program counter.
In segmented source "PADTO" can be used only within fragments of absolute origin and relative origin segments. Any particular segment may use it any number of times, but it must have the same value(s) each time. Different segments may use different values for "PADTO". Padding is delayed until the end of the first pass. Any padding appears at the end of each padded segment in the object code.
Segments cannot be both padded and common.
Examples
PSALIAS
psop=alias
[[, psop=alias]..]
The "PSALIAS" pseudo op assigns user-chosen names to existing pseudo op names using the equality idiom. A user-chosen name can then be used in place of the original pseudo op wherever it is legal to do so.
Alias names have the same form as global labels. Aliases must be unique. No two aliases can have the same name, nor can an alias have the same name as an existing label or opcode.
Examples
[label]
PSNULL
[text1 [, text1]..]
The "PSNULL" pseudo op simply ignores each and every label and expression field value it may be associated with.
"PSNULL" is meant to ease porting source code to HXA. Such code may contain one or more pseudo ops for which HXA has no built-in equivalent. If any can be safely ignored, they may be aliased to "PSNULL", which will allow HXA to effectively skip them whenever encountered.
Examples
READONCE
The "READONCE" pseudo op causes each file it is used in to be silently ignored by any future "INCLUDE" pseudo op. It is meant to ensure that a particular file is included only once in an assembly no matter how many times that file appears as the operand of an "INCLUDE".
"READONCE" can be used anywhere in a file, except that it must appear before any "INCLUDE" or "INCBIN" pseudo op used in that file.
"READONCE" prevents future inclusion based on the name (including any path) of the file currently being read. It cannot prevent at least one extra inclusion for each different path to the same file supplied to "INCLUDE".
Examples
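As a rough illustration of why the check is name-based, the following Python sketch keeps a set of file names exactly as they were supplied. It is a model under stated assumptions, not HXA's code: the names `read_once`, `mark_read_once` and `should_include`, and the sample file names, are invented.

```python
read_once = set()   # file names exactly as given to INCLUDE (hypothetical model)


def mark_read_once(name):
    """Model of READONCE appearing in the file currently being read."""
    read_once.add(name)


def should_include(name):
    """Model of INCLUDE deciding whether to read the named file."""
    return name not in read_once


mark_read_once("lib/defs.a")
same_path = should_include("lib/defs.a")     # False: this name was already marked
other_path = should_include("./lib/defs.a")  # True: different text, same file
```

Because only the text of the name is compared, the second spelling of the path is read once more before it too is marked.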
[label]
RBIT--
num_expr [[,num_expr]..]
Variants
The "RBIT--" pseudo ops store part of an offset_value relative to the program counter into the object code. The offset value is calculated as:
offset_value = num_expr - ( program_counter + data_size )
Num_expr is a 32-bit signed integer value. Program_counter is the value of the program counter at the first byte of the stored data. Data_size is the number of bytes of data stored (the same as the number of bits divided by eight).
In other words, the value of the program counter at the location immediately following the stored data is subtracted from num_expr to create the actual value stored.
"RBIT08-" stores the least significant 8 bits, "RBIT16-" the least significant 16 bits, and "RBIT24-" the least significant 24 bits.
The "RBIT--" pseudo ops perform two range checks before storing any values.
First, num_expr must be within the program counter range. That is, it must be greater than or equal to zero and less than 2^(pc_bits). For example, with a 16-bit program counter num_expr must be less than 2^16 (= 65536, or $10000).
Second, offset_value, considered as signed, must fit into the storage space allowed. Note offset_value is positive if num_expr is greater than program_counter + data_size, otherwise it is zero or negative.
| Pseudo Op | Min Dec | Max Dec | Min Hex | Max Hex |
| RBIT08- | -128 | 127 | $000080 | $00007F |
| RBIT16- | -32768 | 32767 | $008000 | $007FFF |
| RBIT24- | -8388608 | 8388607 | $800000 | $7FFFFF |
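The offset formula and the two range checks can be put together in a short Python sketch. The function name `rbit_value` and its parameters are made up for illustration, and the 16-bit program counter is an assumed default, not an HXA fact.

```python
def rbit_value(num_expr, program_counter, bits, pc_bits=16):
    """Return the bit pattern an RBIT-- style store would keep (illustrative only)."""
    data_size = bits // 8
    # First check: the target must lie within the program counter range.
    if not 0 <= num_expr < (1 << pc_bits):
        raise ValueError("target outside program counter range")
    offset_value = num_expr - (program_counter + data_size)
    # Second check: the signed offset must fit into the storage space.
    half = 1 << (bits - 1)
    if not -half <= offset_value < half:
        raise ValueError("offset does not fit in %d bits" % bits)
    # Keep only the least significant bits, as they would be stored.
    return offset_value & ((1 << bits) - 1)
```

With the data at $1000 and an 8-bit store, a target of $1000 gives an offset of -1 (stored as $FF), and the farthest reachable targets are $0F81 (offset -128) and $1080 (offset 127).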
By default storage of multi-byte values is in native CPU order, either least significant byte (LSB) first or most significant byte (MSB) first. The "R" suffix reverses the native order of stored bytes, so "RBIT24R" used on an LSB first CPU will store the MSB first in the object code.
The "RBIT--" pseudo ops require at least one numeric expression argument. Each separate expression is treated as a separate value to be stored according to the specific "RBIT--" rules.
Examples
RELORG
The "RELORG" pseudo op makes a segment relative origin. When it is made absolute it will start at the ending address of the preceding segment. That is, one past the last address actually used by that segment.
Use of "RELORG" is optional because it is the default segment type. If a segment program counter changes before any type is explicitly declared, that segment automatically becomes relative origin. However "RELORG" can make the intention explicit, and may also help prevent accidentally setting another type later.
A relative origin segment cannot be the first segment, nor may it immediately follow a relative end segment.
"RELORG" may only be used within a segment fragment. It is the only type that does not have to be used before the segment program counter changes. It may be used any number of times in the same segment.
"RELORG" cannot be used in any segment which already has another type.
Examples
[label]
REPEAT
const_expr
[[source code]..]
[label]
ENDREPEAT
Variants
A repeat block consists of an unnamed group of zero or more source code lines delimited by a matched pair of "REPEAT".."ENDREPEAT" pseudo ops. The grouped source lines are called the body of the repeat block.
A "REPEAT".."ENDREPEAT" pair is used to define a repeat block. Because they are unnamed (and thus cannot be referred to later), repeat blocks are expanded as soon as their definition is complete. The expansion has the effect of duplicating the body of the repeat block const_expr number of times into the source code at that point. If const_expr is less than one, the body is skipped and there is no effect.
Repeat blocks are an advanced assembler function. They are not necessary, but are particularly helpful for building tables of data.
Repeat Blocks and Local Scopes
Repeat blocks may be nested to any depth by directly nesting "REPEAT".."ENDREPEAT" pairs.
Every individual repeat expansion block causes a nested local scope to be created without ending the existing one. Each repeat expansion block creates only one local scope no matter how many times the repeat body is duplicated. As each expansion ends, so does the nested local scope it created.
Note that the const_expr controlling any repeat block is not evaluated until that block is expanded. This allows the control expression of a nested repeat block to depend on values which are set only during the expansion of a nesting block.
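The effect of delayed evaluation can be modelled in Python. This is a sketch only: `expand_repeat`, the `env` dictionary standing in for label values, and the two body functions are all hypothetical, and callables stand in for source lines.

```python
def expand_repeat(count_expr, body, env, out):
    """Evaluate the count when expansion begins, then run the body that many times."""
    count = count_expr(env)            # not evaluated until the block is expanded
    for _ in range(max(count, 0)):     # a count below one skips the body
        body(env, out)


def inner_body(env, out):
    out.append(env["row"])             # stands in for a data-storing line


def outer_body(env, out):
    env["row"] += 1                    # value set during the outer expansion
    # The nested count depends on that value, so it differs on each pass.
    expand_repeat(lambda e: e["row"], inner_body, env, out)


table = []
expand_repeat(lambda e: 3, outer_body, {"row": 0}, table)
# table is now [1, 2, 2, 3, 3, 3]
```

If the inner count were fixed when the outer block was defined, every row would have the same length; evaluating it at expansion time is what makes the triangular table possible.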
Repeat blocks are similar to while blocks, except with a fixed rather than conditional control expression.
Examples
[label]
SBIT--
num_expr [[,num_expr]..]
Variants
The "SBIT--" pseudo ops store part of num_expr into the object code. Num_expr is a 32-bit signed integer value. "SBIT08-" stores the least significant 8 bits, "SBIT16-" the least significant 16 bits, and "SBIT24-" the least significant 24 bits.
The "SBIT--" pseudo ops generate an error if num_expr, considered as a signed value, will not fit into the storage space allowed.
| Pseudo Op | Min Dec | Max Dec | Min Hex | Max Hex |
| SBIT08- | -128 | 127 | $000080 | $00007F |
| SBIT16- | -32768 | 32767 | $008000 | $007FFF |
| SBIT24- | -8388608 | 8388607 | $800000 | $7FFFFF |
By default storage of multi-byte values is in native CPU order, either least significant byte (LSB) first or most significant byte (MSB) first. The "R" suffix reverses the native order of stored bytes, so "SBIT24R" used on an LSB first CPU will store the MSB first in the object code.
The "SBIT--" pseudo ops require at least one numeric expression argument. Each separate expression is treated as a separate signed value to be stored according to the specific "SBIT--" rules.
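The signed range check and the byte-order rule can be sketched together. The function `sbit_bytes` and its keyword arguments are invented for illustration; `lsb_first` stands in for the native order of the current CPU and `reverse` for the "R" suffix.

```python
def sbit_bytes(num_expr, bits, lsb_first=True, reverse=False):
    """Encode num_expr as a signed value of the given width (illustrative only)."""
    half = 1 << (bits - 1)
    if not -half <= num_expr < half:
        raise ValueError("value does not fit in %d signed bits" % bits)
    order = "little" if lsb_first else "big"
    if reverse:                        # models the "R" suffix: swap the native order
        order = "big" if order == "little" else "little"
    return num_expr.to_bytes(bits // 8, order, signed=True)
```

On an LSB-first CPU, `sbit_bytes(0x123456, 24)` produces $56 $34 $12, while adding `reverse=True` produces $12 $34 $56.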
Examples
SEGMENT
name$
[[source code]..]
ENDSEGMENT
[name$]
Variants
A segment fragment consists of a named group of zero or more source code lines delimited by a matched pair of "SEGMENT".."ENDSEGMENT" pseudo ops. The grouped source lines are called the body of the segment fragment.
A "SEGMENT".."ENDSEGMENT" pair identifies a segment fragment as belonging to a particular named segment. Any number of fragments can have the same name, and all belong to the same segment.
If an object file is created, the named segments will be examined one after another in the same order they were first encountered in the source code. All the fragments of a named segment which contain data will be output one after another in the same order they were first encountered in the source code. This brings all the fragments of the same segment together as a single block in the object file. Thus the physical order of the object code may be different from the physical order of the source code which created that object.
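The gathering of fragments can be pictured with a small Python model. It is a sketch under assumptions, not HXA's output routine: fragments are represented as hypothetical (name, data) pairs in source order, and the function name `object_order` is invented.

```python
def object_order(fragments):
    """Group (segment_name, data) fragments by segment, in first-seen order."""
    segments = {}                      # dicts keep insertion order (Python 3.7+)
    for name, data in fragments:
        parts = segments.setdefault(name, [])
        if data:                       # only fragments containing data are output
            parts.append(data)
    return [(name, b"".join(parts)) for name, parts in segments.items()]


source_order = [
    ("code", b"\x01"),
    ("data", b"\xaa"),
    ("code", b"\x02"),
    ("data", b""),
    ("code", b"\x03"),
]
object_layout = object_order(source_order)
# object_layout is [("code", b"\x01\x02\x03"), ("data", b"\xaa")]
```

The interleaved source fragments end up as one contiguous block per segment, which is why the object order can differ from the source order.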
Organizing source code by use of segments is optional. By default HXA treats source code as monolithic (one block). Object code appears in the same order as the source code which produces it.
Although in principle there is nothing to stop an assembly language program from freely mixing together executable code, constant data and variable locations in the object code, in practice these are usually kept in separate memory blocks for both safety and consistency. To accomplish this with monolithic source code often means placing constant data and variable locations far away from the executable code that will use them.
Segments offer the advantage that, in the source, executable code can be placed near the constant data and variable locations it will use, while in the object code these will all be cleanly separated from each other. The main disadvantage is that segments require explicit management.
Selecting Monolithic or Segmented Source Code
The decision to use a monolithic or segmented source code organization is an all-or-nothing choice. HXA does not permit the two approaches to be mixed. Users should choose whichever form makes their task easiest.
If the "ORG" pseudo op is used before and outside of any segment fragment, the source is considered monolithic. After this, no segment fragments will be allowed.
If a segment fragment appears before the first "ORG", the source is considered segmented. The program counter is then invalid outside of any segment fragment, and every "ORG" used must appear within a segment fragment.
Segment Types
HXA recognizes four types of segments: absolute origin, relative origin, absolute end and relative end. Any of these types may also be flagged as common segments, which do not hold any data and are used only for variable storage.
Every segment has its own program counter. Each segment program counter is the same size as the "CPU" program counter. Unlike a monolithic program counter, segment program counters can only increase in value. They can never be reset to go down in value or repeat themselves.
The default segment type is relative origin. All other types must be explicitly declared. Although it is usually convenient that this declaration appears in the first fragment of a particular segment, it is not necessary. The declaration may appear in any fragment as long as the segment program counter has not yet changed. That is, before any code or data has been stored or the "DS" pseudo op has been used.
There must be at least one absolute segment, which can be either absolute origin or absolute end. If there is only one it can be preceded only by relative end segments, and can be followed only by relative origin segments.
Segments and Forward Reference
If a segment is not absolute origin, the only labels within it which have a value known during the first pass are those defined by "EQU". The value of all other labels in such segments depends directly or indirectly on the segment program counter, which during the first pass represents an offset rather than an absolute address. Hence in expressions all uses of these labels, or of the segment program counter itself, are forward references.
In general this does not cause any difficulty, as HXA resolves (or forbids) these references as it does all forward references. However because all references to labels defined in non-absolute origin segments are forward, some expressions which work perfectly well in monolithic programs will fail in segmented programs.
For example, in a monolithic program the value of any label which has appeared in the label column of a source line is available for use in any expression context. But in non-absolute segments, similar labels can be used in constant expressions only if they were part of an "EQU" assignment.
Segments and Local Scopes
Segment fragments may be nested to any depth by directly nesting "SEGMENT".."ENDSEGMENT" pairs. It is legal for a fragment to nest inside another fragment of the same segment.
Every segment fragment causes a nested local scope to be created without ending the existing one. As each fragment ends, so does the nested local scope it created.
Monolithic vs. Segmented Source Code
| - | Monolithic | Segmented |
| Max# Segments | 1023* | 1023 |
| Max# Segment Fragments | n/a | unlimited |
| Listing File | no segment map | default segment map |
| "DS" psop | value can be negative | value cannot be negative |
| "END" psop | can be labelled | cannot be labelled |
| "ORG" psop | any value any time | one value per absolute segment |
| "PADTO" psop | any value any time; immediate execution; pads from current pc value | one value per segment; delayed execution; pads at end of segment |
Notes:
Examples
[label]
STRING--
const_expr$|const_expr [[,const_expr$|const_expr]..]
Variants
The "STRING--" pseudo ops store characters into the object code. Each character is mapped through the current character set translation before being stored.
At least one argument is required. It must be a constant expression, but can be either string or numeric. Numeric arguments are coerced to one-character strings by the equivalent of CHR$(const_expr & $FF).
If more than one argument is provided, all are concatenated together as one string before storage.
Object bytes are stored consecutively starting at the current program counter location. "STRING" proceeds from left to right through its (concatenated) argument(s). That is, in the order they appear as arguments. "STRINGR" processes its (concatenated) argument(s) from right to left. That is, in reverse order.
If the result of concatenation is null HXA issues a warning and no object code is created.
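The coercion, concatenation and ordering rules can be sketched in Python. The function `string_bytes` and its parameters are hypothetical; `table` stands in for the current character set translation, and characters are assumed to be 8-bit codes.

```python
def string_bytes(args, table=None, reverse=False):
    """Model of STRING / STRINGR storage (names and signature are hypothetical)."""
    if table is None:
        table = list(range(256))       # default: each code maps to itself
    # Numeric arguments become one-character strings, as CHR$(n & $FF) would.
    text = "".join(a if isinstance(a, str) else chr(a & 0xFF) for a in args)
    if reverse:                        # STRINGR: right to left
        text = text[::-1]
    # Each character is translated before being stored.
    return bytes(table[ord(ch) & 0xFF] for ch in text)
```

For example, `string_bytes(["AB", 13])` yields the codes for "A", "B" and carriage return, and the same call with `reverse=True` yields them in the opposite order. An empty result corresponds to the null case, where HXA warns and stores nothing.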
Note that some assemblers allow mixed string and numeric expressions to follow "BYTE" (or equivalent) pseudo ops. HXA does not permit this for "BIT--", but does for "STRING".
Examples
--TIMER
name$
Variants
The "--TIMER" pseudo ops provide programmer access to the same start, stop and elapsed time functions HXA uses to measure its own pass times. User-named timers allow comparisons of the assembly times of alternative program constructs to be made.
Names must be unique within all timers. No two timers can have the same name.
Timers are accurate only to the nearest second.
"STARTTIMER" creates a new timer and records the current time. This pseudo op may be used only once for any given timer.
"STOPTIMER" stops the named timer and records the current time. This pseudo op may be used only once for any given timer.
"SHOWTIMER" echoes the value of the named timer in "HH:MM:SS" format. If the timer is stopped, the value is the difference between its start and stop times. If the timer is not stopped, the value is the difference between its start time and the current time. This pseudo op may be used any number of times with any given timer.
Examples
[label]
UBIT--
num_expr [[,num_expr]..]
Variants
The "UBIT--" pseudo ops store part of num_expr into the object code. Num_expr is a 32-bit signed integer value. "UBIT08-" stores the least significant 8 bits, "UBIT16-" the least significant 16 bits, and "UBIT24-" the least significant 24 bits.
The "UBIT--" pseudo ops generate an error if num_expr, considered as an unsigned value, will not fit into the storage space allowed.
| Pseudo Op | Min Dec | Max Dec | Max Hex |
| UBIT08- | 0 | 255 | $0000FF |
| UBIT16- | 0 | 65535 | $00FFFF |
| UBIT24- | 0 | 16777215 | $FFFFFF |
By default storage of multi-byte values is in native CPU order, either least significant byte (LSB) first or most significant byte (MSB) first. The "R" suffix reverses the native order of stored bytes, so "UBIT24R" used on an LSB first CPU will store the MSB first in the object code.
The "UBIT--" pseudo ops require at least one numeric expression argument. Each separate expression is treated as a separate unsigned value to be stored according to the specific "UBIT--" rules.
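The difference between the unsigned and signed checks comes down to which interval is tested, as this brief Python sketch shows. The helper names are invented for illustration.

```python
def fits_unsigned(num_expr, bits):
    """True if num_expr lies in 0 .. 2^bits - 1 (the UBIT-- style check)."""
    return 0 <= num_expr < (1 << bits)


def fits_signed(num_expr, bits):
    """True if num_expr lies in -2^(bits-1) .. 2^(bits-1) - 1 (the SBIT-- style check)."""
    half = 1 << (bits - 1)
    return -half <= num_expr < half
```

A value of 200 passes the 8-bit unsigned check but fails the signed one, while -1 does the reverse.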
Examples
UNDEF
name$
The "UNDEF" pseudo op deletes name$ from the current set of macros known to HXA.
It is not an error to delete a macro which is not currently defined.
Examples
WARN
[opt_str]
The "WARN" pseudo op enables programmer-triggered warning messages.
These messages are the same in every respect as their internally triggered equivalents. They are sent to stdout and also to the error file if one has been specified. The "WARN" pseudo op also contributes to the running total of such messages necessary to halt assembly altogether.
The optional opt_str argument can be used to provide a more detailed description of what triggered the message. In particular, a string expression may be used to display the value(s) of one or more labels at the point of the warning.
While there are no hard and fast rules as to which message type is appropriate when, HXA itself generally uses these guidelines.
Examples
[label]
WHILE
conditional_expr
[[source code]..]
[label]
ENDWHILE
Variants
A while block consists of an unnamed group of zero or more source code lines delimited by a matched pair of "WHILE".."ENDWHILE" pseudo ops. The grouped source lines are called the body of the while block.
A "WHILE".."ENDWHILE" pair is used to define a while block. Because they are unnamed (and thus cannot be referred to later), while blocks are expanded as soon as their definition is complete. Conditional_expr is evaluated, and if it is TRUE the body of the while block is duplicated into the source code at that point. If FALSE, the body is skipped and there is no effect.
Conditional_expr is re-evaluated each time the body has been completely processed by the assembler. As long as the result is TRUE the body is duplicated once more.
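The expansion cycle can be modelled in Python as a loop that tests the condition before each duplication of the body. The names `expand_while`, `env` and `square_body` are hypothetical, and callables stand in for source lines.

```python
def expand_while(condition, body, env, out):
    """Test the condition, duplicate the body once, then test again."""
    while condition(env):   # re-evaluated after each complete pass over the body
        body(env, out)


def square_body(env, out):
    out.append(env["n"] * env["n"])   # stands in for a data-storing line
    env["n"] += 1                     # the body must change what the condition tests


squares = []
expand_while(lambda e: e["n"] < 4, square_body, {"n": 0}, squares)
# squares is now [0, 1, 4, 9]
```

As in the model, a body that never changes the values the condition depends on would be duplicated without end.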
While blocks are an advanced assembler function. They are not necessary, but are particularly helpful for building tables of data.
While Blocks and Local Scopes
While blocks may be nested to any depth by directly nesting "WHILE".."ENDWHILE" pairs.
Every individual while expansion block causes a nested local scope to be created without ending the existing one. Each while expansion block creates only one local scope no matter how many times the while body is duplicated. As each expansion ends, so does the nested local scope it created.
Note that the conditional_expr controlling any while block is not evaluated for the first time until that block is expanded. This allows the control expression of a nested while block to depend on values which are set only during the expansion of a nesting block.
While blocks are similar to repeat blocks, except with a conditional rather than fixed control expression.
Examples
XLATE
code=value
[[, code=value]..]
All character codes that appear as arguments of a STRING-- pseudo op or XLATE() function are subject to translation. Character code values in the range zero to 255 are re-mapped to arbitrary values in the range zero to 255 by means of a translation table. Translation permits use of the ASCII character set in a program destined for a non-ASCII environment.
The default translation table simply maps each code value to itself. The "XLATE" pseudo op alters the translation table.
"XLATE" uses the equality idiom to specify how character codes are to be re-mapped. There are several different forms of "XLATE" arguments depending on how many character codes are to be re-mapped at one time.
Map One New Value onto One Character Code
Both code and value are either single printable characters or escape sequences. If expressed as a printable character, the ASCII value of the character is used.
Map One New Value onto a Range of Character Codes
A range is expressed as a character code, followed by a dash ('-'), followed by a character code.
HXA will issue a warning if the end of any range is less than its start, and nothing will happen.
Map a Range of New Values onto a Range of Character Codes
HXA does not complain if the two ranges are not equal in extent. If the code range is smaller, re-mapping stops when the end of the code range is reached. If the value range is smaller, its last value is repeated until the code range is complete.
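The handling of unequal ranges can be sketched in Python. This models the rule as described and is not HXA's code: the function `xlate_ranges` and its parameters are invented, and a return value of False stands in for the warning issued when a range ends before it starts.

```python
def xlate_ranges(table, code_start, code_end, value_start, value_end):
    """Map a range of new values onto a range of character codes, in place."""
    if code_end < code_start or value_end < value_start:
        return False                   # warning case: nothing happens
    value = value_start
    for code in range(code_start, code_end + 1):
        table[code] = value            # stops when the code range is complete
        if value < value_end:          # a short value range repeats its last value
            value += 1
    return True


table = list(range(256))               # default table maps each code to itself
xlate_ranges(table, ord("a"), ord("z"), ord("A"), ord("Z"))
```

After the call, the lower-case letter codes map to their upper-case equivalents and every other entry is unchanged.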
Map A Range of New Values onto One Character Code
This form is legal but merely maps begvalue onto code and stops. It is equivalent to the first form listed.
Examples