Written by: David J. Ruck
Date: 03-Oct-2021
Copyright © 2000-2021, DEEJ Technology PLC.
All Rights Reserved.
ARMalyser is a tool designed to analyse RISC OS executables, providing identification of code and data areas with detailed comments, and facilities for turning executables back into ObjAsm compatible assembler source. It identifies instructions that may have side effects in 32-bit processor modes, to aid in porting to 32-bit RISC OS variants.
It has built-in knowledge of ARM architectures up to ARMv5TE, ARM procedure call standards, executable, object and library file formats, and RISC OS SWI calls. Output can be in the form of disassembly, ObjAsm style assembly and statistical summary. Options are provided to format the output as text, HTML, XML, or fully customisable variants of those and most other textual tagged document formats.
ARMalyser is available for RISC OS™, Win32™, x86 and ARM Linux, and other UNIX™ variants on request.
ARMalyser handles the following file types:-
Squeezed absolute and raw code may be handled directly if the xpand utility is present on the run path. Similarly compressed modules can be handled if the unmodsqz utility is available.
Usage: ARMalyser [options] infile
Where options are:-
-h | Output command syntax |
-v | Verbose output, progress reports are sent to stdout, code analysis warnings are given |
-d | Output disassembly to stdout |
-a | Produce ObjAsm format assembly |
-r[a|r] | Set register naming used :- |
-s | Print statistics on code construction and 26-bit only instructions to stdout |
-t <target> | Target processor ARM7 | ARM9 | StrongARM | XScale |
-p<t|h|x> | Print format in Text (default), HTML or XML |
-p <filename> | If no option letter is supplied, the format is taken from a messagetrans format file in the next argument |
-xc <addr> | Display analysis backtrace when code marked |
-xd <addr> | Display analysis backtrace when data marked |
-xr | Display register contents during analysis |
-ls | List embedded symbols |
-o [filename] | Send output to a file (uses stdout if not specified) |
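As an illustration only, the options above might be combined as follows when driving the tool from a script. The helper name `armalyser_args` and the file names are hypothetical; only the option letters and the `ARMalyser [options] infile` ordering come from the usage text above.

```python
def armalyser_args(infile, fmt="t", outfile=None, disassemble=True, stats=False):
    """Build an argument list following 'ARMalyser [options] infile'.

    fmt is the print format letter: 't' (text), 'h' (HTML) or 'x' (XML).
    """
    if fmt not in ("t", "h", "x"):
        raise ValueError("fmt must be 't', 'h' or 'x'")
    args = ["ARMalyser"]
    if disassemble:
        args.append("-d")          # output disassembly
    if stats:
        args.append("-s")          # print statistics
    args.append("-p" + fmt)        # print format
    if outfile is not None:
        args += ["-o", outfile]    # send output to a file instead of stdout
    args.append(infile)            # the input file comes last
    return args
```

The resulting list could be handed to whatever process-launching facility the host platform offers.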
If print formatting is specified the output is encoded with tags that can be used to provide syntax colouring and hyper-linking for display in web browsers or printing in word processors. Plain text, HTML and XML formats are provided as standard; additional formats can be used by specifying a messagetrans file containing the tokens shown below.
The standard HTML file is provided as a template, as well as an inverted variant and files for Impression DDF (also suitable for EasiWriter and TechWriter) and Ovation Pro DDL. Almost any additional textual document format may be produced, as long as formatting codes are contained in tags with defined start and end characters, and illegal characters can be escaped either with defined start and end characters or a fixed length sequence. Note however that characters apart from " ' < > { | } & may appear unescaped in comments in the current version.
Messagetrans token | HTML default | XML default | Description |
---|---|---|---|
TagStart | < | < | Tag start character |
TagEnd | > | > | Tag end character |
tag_DOC1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n <HTML>\n<HEAD>\n <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n <META NAME="Generator CONTENT="%s">\n <TITLE>%s</TITLE>\n <style type="text/css">\n<!--\n body { background-color: #FFFFFF; color: #000000; }\n DIV.code { font-family: monospace; white-space: pre; }\n A:link { color: #BF3F00; }\n A:vistited { color: #BF3F00; }\n A.active { color: #BF0000; }\n A.a { color: #3F3F3F; }\n A.l { color: #BF0000; }\n TABLE { border-style: outset; border-width: 1px; }\n TH { border-style: inset; border-width: 1px; padding: 5px; text-align: center; font-size: large; font-weight: bold; }\n TD { border-style: inset; border-width: 1px; padding: 5px; text-align: right; }\n TD.rh { text-align: left; font-weight: bold; }\n DIV.w { background-color: #FFFF00; color: #FF0000; }\n SPAN.b { color: #00007F; }\n SPAN.x { color: #FF0000; }\n SPAN.t { color: #FF7F00; }\n SPAN.m { color: #5F5F5F; }\n SPAN.d { color: #3F00BF; }\n SPAN.n { color: #00007F; }\n SPAN.f { color: #7F007F; }\n SPAN.r { color: #003FBF; }\n SPAN.h { color: #003F00; }\n SPAN.i { color: #7F7F00; }\n SPAN.u { color: #007F7F; }\n SPAN.s { color: #007F00; }\n SPAN.ca { color: #00BF00; }\n SPAN.cb { color: #7FBF00; }\n SPAN.cc { color: #BFBF00; }\n SPAN.cd { color: #BF7F00; }\n SPAN.ce { color: #FF0000; }\n -->\n</style>\n </HEAD>\n | <?xml version="1.0" encoding="iso-8859-1"?>\n <!DOCTYPE ARMalyser SYSTEM "ARMalyser.dtd">\n <%s filename="%s">\n | Used at the start of document. The first %s code is replaced with the name of the tool, and the second with the filename it has been run on.
Note that for XML the name of the tool is used as the top-level tag; this must be ARMalyser for the DTD to function. The \n codes are used to put newlines into the output. |
tag_DOC2 | \n</BODY>\n</HTML>\n | </%s> | Used at the end of the document. The %s is replaced with the name of the tool. |
tag_DISS1 | <DIV class="code"> | <Disassembly> | Marks the start of the disassembly section |
tag_DISS2 | </DIV><HR> | </Disassembly> | Marks the end of the disassembly section |
tag_DISSLINE1 | | <DissLine> | Marks the start of a line of disassembly |
tag_DISSLINE2 | | </DissLine> | Marks the end of a line of disassembly |
tag_ASM1 | <DIV class="code"> | <Assembly> | Marks the start of the assembly section |
tag_ASM2 | </DIV><HR> | </Assembly> | Marks the end of the assembly section |
tag_ASMLINE1 | | <AsmLine> | Marks the start of a line of assembly |
tag_ASMLINE2 | | </AsmLine> | Marks the end of a line of assembly |
tag_STATS1 | <TABLE>\n | <Stats> | Start of the statistics table |
tag_STATS2 | </TABLE> | </Stats> | End of the statistics table |
tag_STATSTITLE1 | <TR><TH COLSPAN=3> | <StatsTitle> | Start of the statistics table title row |
tag_STATSTITLE2 | </TH></TR> | </StatsTitle> | End of the statistics table title row |
tag_STATSLINE1 | <TR> | <StatsLine> | Start of a statistics table row |
tag_STATSLINE2 | </TR> | </StatsLine> | End of a statistics table row |
tag_STATSCOLUMNA1 | <TD class="rh"> | <StatsColumn column="1"> | Start of a statistics table column 1 |
tag_STATSCOLUMNA2 | </TD> | </StatsColumn> | End of a statistics table column 1 |
tag_STATSCOLUMNB1 | <TD> | <StatsColumn column="2"> | Start of a statistics table column 2 |
tag_STATSCOLUMNB2 | </TD> | </StatsColumn> | End of a statistics table column 2 |
tag_STATSCOLUMNC1 | <TD> | <StatsColumn column="3"> | Start of a statistics table column 3 |
tag_STATSCOLUMNC2 | </TD> | </StatsColumn> | End of a statistics table column 3 |
tag_WARNINGLINE1 | <DIV class="w"> | <Warning> | Start of a warning line from analysis engine. Will appear before disassembly. |
tag_WARNINGLINE2 | </DIV> | </Warning> | End of a warning line. |
tag_ADDRESS1 | <A NAME="%s%08X" class="a"> | <Address address="%s%08X"> | Start of a disassembly address field. The %s is replaced with 'L' in assembly (empty in disassembly), the %08X is replaced with the hex value of the address. |
tag_ADDRESS2 | </A> | </Address> | End of a disassembly address field. |
tag_LABEL1 | <A NAME="%s%08X" class="l"> | <Label address="%s%08X"> | Start of an assembly label field. The %s is replaced with 'L' in assembly (empty in disassembly). |
tag_LABEL2 | </A> | </Label> | End of an assembly label field. |
tag_ADDRLINK1 | <A HREF="#%s%08X"> | <AddrLink address="%s%08X"> | Start of an address hyperlink. Occurs around addresses in disassembly op codes or directives, labels in assembly op codes or directives, and in pointer indicators in comments. The %s is replaced with 'L' in assembly, empty in disassembly, so the hyperlink refers to the correct section. The %08X is replaced with the hex value of the address. |
tag_ADDRLINK2 | </A> | </AddrLink> | End of an address hyperlink. |
tag_CHARS1 | <SPAN class="b"> | <Chars> | Start of the disassembly character display. |
tag_CHARS2 | </SPAN> | </Chars> | End of the disassembly character display. |
tag_CTRLCHAR1 | <SPAN class="x"> | <CtrlChar> | Start of a control character tag. Used around a control character in the character display field in disassembly. Note this is nested within the tag_CHARS section. |
tag_CTRLCHAR2 | </SPAN> | </CtrlChar> | End of a control character tag. |
tag_MEMORY1 | <SPAN class="m"> | <Memory> | Start of the disassembly memory field. |
tag_MEMORY2 | </SPAN> | </Memory> | End of the disassembly memory field. |
tag_INSTRUCTION1 | | <Instruction> | Start of the instruction field in both disassembly and assembly. |
tag_INSTRUCTION2 | | </Instruction> | End of the instruction field. |
tag_OPCODE1 | | <OpCode> | Start of the op code mnemonic tag. |
tag_OPCODE2 | | </OpCode> | End of the op code mnemonic tag. |
tag_DIRECTIVE1 | <SPAN class="d"> | <Directive> | Start of the assembler directive tag. |
tag_DIRECTIVE2 | </SPAN> | </Directive> | End of the assembler directive tag. |
tag_CONDITION1 | <SPAN class="n"> | <Condition> | Start of the op code condition tag. |
tag_CONDITION2 | </SPAN> | </Condition> | End of the op code condition tag. |
tag_MODIFIER1 | <SPAN class="f"> | <Modifier> | Start of the modifier tag, used for multiple register transfer flags, write back indication, and floating point precision and rounding. |
tag_MODIFIER2 | </SPAN> | </Modifier> | End of the modifier tag. |
tag_REGISTER1 | <SPAN class="r"> | <Register> | Start of the register tag, used for main and floating point registers. |
tag_REGISTER2 | </SPAN> | </Register> | End of the register tag. |
tag_REGLIST1 | | <RegisterList> | Start of the register list tag, used around the parenthesis in the multiple data transfer instruction. |
tag_REGLIST2 | | </RegisterList> | End of the register list tag. |
tag_SHIFT1 | <SPAN class="h"> | <Shift> | Start of the op code shift tag, used around the shift type. |
tag_SHIFT2 | </SPAN> | </Shift> | End of the op code shift tag. |
tag_SWI1 | <SPAN class="i"> | <SWI> | Start of the SWI tag, used around the SWI name (or number if unrecognised). |
tag_SWI2 | </SPAN> | </SWI> | End of the SWI tag. |
tag_NUMBER1 | <SPAN class="u"> | <Number> | Start of the number tag, used around any decimal or hex value in an instruction. |
tag_NUMBER2 | </SPAN> | </Number> | End of the number tag. |
tag_STRING1 | <SPAN class="s"> | <String> | Start of the string tag, used around string values in assembler directives or comments. |
tag_STRING2 | </SPAN> | </String> | End of the string tag. |
tag_COMMENTA1 | <SPAN class="ca">; | <Comment> | Start of the Comment A tag. Used when the location has been identified with certainty. Note there is a trailing space in the HTML tag. |
tag_COMMENTA2 | </SPAN> | </Comment> | End of the Comment A tag. |
tag_COMMENTB1 | <SPAN class="cb">;~ | <Comment surmised="1"> | Start of the Comment B tag. Used when the location has been identified with high confidence. Note there is a trailing space in the HTML tag. |
tag_COMMENTB2 | </SPAN> | </Comment> | End of the Comment B tag. |
tag_COMMENTC1 | <SPAN class="cc">;~~ | <Comment surmised="2"> | Start of the Comment C tag. Used when the location has been identified with medium confidence. Note there is a trailing space in the HTML tag. |
tag_COMMENTC2 | </SPAN> | </Comment> | End of the Comment C tag. |
tag_COMMENTD1 | <SPAN class="cd">;~~~ | <Comment surmised="3"> | Start of the Comment D tag. Used when the location has been identified with low confidence. Note there is a trailing space in the HTML tag. |
tag_COMMENTD2 | </SPAN> | </Comment> | End of the Comment D tag. |
tag_COMMENTE1 | <SPAN class="ce">;? | <Comment unidentified="1"> | |
tag_COMMENTE2 | </SPAN> | </Comment> | End of the Comment E tag. |
EntityStart | & | & | Start of entity character, used before characters that are invalid in the format. |
EntityEnd | ; | ; | End of entity character, or if 1 to 9, the number of characters after EntityStart to be skipped. |
FileType | &FAF | &F80 | RISC OS file type to use for output file |
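To make the role of the tokens more concrete, here is a minimal sketch of how a formatter might apply them. It uses a small subset of the HTML defaults from the table above. The function names, and in particular the entity names in `HTML_ENTITIES`, are assumptions made for illustration; the table does not say how ARMalyser itself maps characters to entities.

```python
# A few of the HTML defaults from the token table.
HTML_TOKENS = {
    "tag_REGISTER1": '<SPAN class="r">',
    "tag_REGISTER2": "</SPAN>",
    "tag_ADDRLINK1": '<A HREF="#%s%08X">',
    "tag_ADDRLINK2": "</A>",
    "EntityStart": "&",
    "EntityEnd": ";",
}

# Assumed entity names for characters that are illegal in HTML text.
HTML_ENTITIES = {"&": "amp", "<": "lt", ">": "gt", '"': "quot"}

def escape(text, tokens, entities=HTML_ENTITIES):
    """Replace illegal characters with EntityStart + name + EntityEnd."""
    out = []
    for ch in text:
        if ch in entities:
            out.append(tokens["EntityStart"] + entities[ch] + tokens["EntityEnd"])
        else:
            out.append(ch)
    return "".join(out)

def wrap(name, text, tokens):
    """Surround escaped text with the <name>1 and <name>2 token pair."""
    return tokens[name + "1"] + escape(text, tokens) + tokens[name + "2"]

def addr_link(address, text, tokens, in_assembly):
    """Build an address hyperlink; %s becomes 'L' in assembly, empty otherwise."""
    prefix = "L" if in_assembly else ""
    start = tokens["tag_ADDRLINK1"] % (prefix, address)
    return start + escape(text, tokens) + tokens["tag_ADDRLINK2"]
```

For example, `wrap("tag_REGISTER", "R0", HTML_TOKENS)` surrounds a register name with the `r` class span, and `addr_link(0x8000, "&8000", HTML_TOKENS, True)` links to the anchor `L00008000` in the assembly section.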
The HTML generated is compliant with the HTML 4.01 final and CSS 1 specifications, and passes W3C validation. Additional format files for HTML 3.2 final without CSS are provided. These fail validation due to the use of font colour tags within PRE blocks, but no problems have been found with this in any browser.
The HTML is compatible with all current RISC OS browsers, but the very large files produced may be slow to render and require a large amount of memory to be available to the browser. It should be noted that the various versions of Oregano have monospaced fonts which aren't quite monospaced, leading to uneven column alignment. NetSurf is recommended for adherence to HTML and CSS standards, and its superior performance.
A DTD is provided to enable validation and parsing of the output.
Whilst the DDF produced is thought to be correct, Impression Publisher and Publisher+ can have difficulties with certain lines due to the number of style changes. This is not alleviated by using effects instead of styles, and has been put down to a bug in these programs. The problem can be circumvented by removing some of the tags, so that lines contain fewer style changes.
The output produced will load and display perfectly in EasiWriter and TechWriter.
As DDL is not a true tagged format (instead requiring the document text to be enclosed in quotes), the formatting width calculations do not work correctly and produce varying length instruction strings, but this is masked by using a tab in the closing instruction tag to ensure that following comments are aligned. Note a tab at the start of comments is not suitable, as they may start after the memory display in disassembly mode, where no instruction is present.
The entire ARMv4 instruction set is recognised and some elements of ARMv5.
A large amount of the potential instruction space is considered invalid for legitimate use by RISC OS applications; the following are rejected.
The following instructions may have unpredictable results on different processor variants.
The following instructions are not considered valid for use in 32-bit mode when found in 26-bit executables as the meaning of the instruction or its side effects are considerably different.
Use of the PSR manipulation instructions MSR and MRS is invalid on the ARM2 and ARM3 processors.
If the processor target is specified on the command line the analysis will gather information relating to performance issues. These include :-
ARM7
If the instruction contains an immediate address, or a register previously loaded with a PC relative address can be located, the following information is extracted from the instructions.
Entry points into the code are determined by analysis of the AIF or module header, or the execution address for raw code. The code is then followed, with additional program flow changes and registered entry points held on a stack. The order followed is:-
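As a rough illustration of stack-driven code following, the sketch below walks a toy program from its entry points. Everything in it is an assumption made for the example: the `program` mapping, the instruction kinds, and in particular the visiting order (fall-through first, stacked targets later), which is not necessarily the order ARMalyser uses.

```python
def follow_code(program, entry_points):
    """Return the set of addresses reached from the entry points.

    program maps address -> (kind, target), where kind is one of
    'op', 'b' (branch), 'bcc' (conditional branch), 'bl' or 'ret'.
    """
    code = set()
    pending = list(entry_points)        # stack of addresses still to follow
    while pending:
        addr = pending.pop()
        while addr in program and addr not in code:
            code.add(addr)
            kind, target = program[addr]
            if kind == "ret":
                break                   # flow of code ends at this point
            if kind == "b":
                addr = target           # unconditional: continue at the target
                continue
            if kind in ("bcc", "bl"):
                pending.append(target)  # follow the other path later
            addr += 4                   # ARM instructions are one word long
    return code
```

Words never reached this way remain unidentified and are left for later analysis steps.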
Partial emulation of code is used to track registers to enable code and data areas to be detected, either directly or by knowledge of values passed to SWI calls and Shared C Library function arguments. Values enter registers via one of:-
The following arithmetic operations are then emulated where all registers used by the instruction are known:-
Currently the PSR flag bits are not emulated, so instructions that rely on the C flag, such as ADC, SBC or those using an RRX shift, cannot be emulated and invalidate the destination register.
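The idea of emulating an instruction only when all of its inputs are known can be sketched as follows. This is a simplified model, not ARMalyser's implementation: the function name, the dictionary of known registers, and the choice of MOV, ADD and SUB as the emulated operations are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def emulate(regs, op, rd, operands):
    """Update regs (register name -> known 32-bit value) for one instruction.

    operands are register names or integer immediates. Returns True when
    the destination register holds a known value afterwards.
    """
    values = []
    for operand in operands:
        if isinstance(operand, int):
            values.append(operand & MASK32)
        elif operand in regs:
            values.append(regs[operand])
        else:
            regs.pop(rd, None)      # an input is unknown: invalidate destination
            return False
    if op == "MOV":
        result = values[0]
    elif op == "ADD":
        result = values[0] + values[1]
    elif op == "SUB":
        result = values[0] - values[1]
    else:
        regs.pop(rd, None)          # e.g. ADC/SBC need the C flag, not tracked
        return False
    regs[rd] = result & MASK32      # registers are 32 bits wide
    return True
```

An instruction such as ADC therefore leaves its destination unknown, even when both source registers were known beforehand.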
If the instruction is conditional, the register(s) set by the instruction will only be valid in subsequent instructions which also bear this condition, until a flag-altering instruction is encountered.
Register values are tracked through code sequences and are passed through forward or reverse branches, subroutines and SWIs. However, if any of the following are encountered all known registers are invalidated.
Code only suitable for use in 26-bit modes, or instructions only present on the ARM 2/3, may appear in 32-bit programs if suitably guarded by a test of whether the processor is running in a 26-bit mode. Similarly, instructions only available on later ARM processors may be used if suitably guarded by a test of whether the processor is running in a 32-bit mode. ARMalyser recognises the following code constructions as guarding the use of 26-bit only or 32-bit only instructions:-
TEQ R0,R0 ; Ensure some flag bits set (only needed in USR mode code)
TEQ PC,PC ; Check if PC contains PSR, NE in 26bit mode, EQ in 32bit mode
; 26 bit only instructions safe if used with the NE condition
; 32 bit only instructions safe if used with the EQ condition
TEQ PC,#0 ; Ensure some flag bits set (only needed in USR mode code)
TEQ PC,PC ; Check if PC contains PSR, NE in 26bit mode, EQ in 32bit mode
BEQ in32
|in26|
; only executed in 26 bit mode, all 26 bit only instructions safe
B inEither
|in32|
; only executed in 32 bit mode, 32 bit only instructions safe
|inEither|
; executed in either mode
TEQ PC,#0 ; Ensure some flag bits set (only needed in USR mode code)
TEQ PC,PC ; Check if PC contains PSR, NE in 26bit mode, EQ in 32bit mode
BNE in26
|in32|
; only executed in 32 bit mode, 32 bit only instructions safe
B inEither
|in26|
; only executed in 26 bit mode, all 26 bit only instructions safe
|inEither|
; executed in either mode
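Why `TEQ PC,PC` distinguishes the two modes can be modelled in a few lines. The sketch assumes the 26-bit behaviour in which R15 read as the first operand supplies only the PC bits, while R15 read as the second operand also supplies the PSR bits; the pipeline offset is ignored because it is the same for both operands. The function and constant names are made up for this example.

```python
PSR_BITS_26 = 0xFC000003    # N Z C V I F in bits 31-26, mode in bits 1-0
PC_BITS_26 = 0x03FFFFFC     # word-aligned 26-bit program counter

def teq_pc_pc_sets_eq(pc, psr, mode26):
    """Model TEQ PC,PC: return True for EQ, False for NE."""
    if mode26:
        rn = pc & PC_BITS_26                          # first operand: PC only
        rm = (pc & PC_BITS_26) | (psr & PSR_BITS_26)  # second operand: PC + PSR
    else:
        rn = rm = pc & 0xFFFFFFFF                     # 32-bit: no PSR in the PC
    return (rn ^ rm) == 0

Z_FLAG = 0x40000000
```

With the Z flag set, `teq_pc_pc_sets_eq(0x8000, Z_FLAG, True)` gives NE and `teq_pc_pc_sets_eq(0x8000, Z_FLAG, False)` gives EQ. In 26-bit USR mode with no flags set, the PSR bits are all zero and the test would wrongly give EQ, which is why the sequences above first ensure some flag bits are set.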
Following the analysis of executable structure, code following and data recognition directly from code usage, additional steps must be taken to surmise the unidentified areas. There are 4 levels of recognition confidence:-
The following code structures are understood.
The following SWIs are recognised, and values or addresses set up by preceding instructions are used to enable data structures and handler functions to be identified.
The following data structures are identified and annotated in comments:-
C++ symbol names are unmangled for display in comments, courtesy of code provided by Robin Watts.
The disassembly output is similar to that produced by the RISC OS debugger module, but with a greater knowledge of the ARMv4 instruction set, and more comprehensive comments as a result of the code analysis.
The comment sequence describes the level of surmisation, or confidence of the analysis.
; | Positively identified |
;~ | Surmisation level 1 |
;~~ | Surmisation level 2 |
;~~~ | Surmisation level 3 |
;? | Failed to Identify |
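A script post-processing text output could recover the confidence level from the comment marker, along these lines. This is a sketch under a simplifying assumption: the first `;` on a line is taken to start the comment, which would be wrong for a line containing `;` inside a string. The function name is hypothetical.

```python
# Longest markers first, so ';~~~' is not mistaken for ';~'.
MARKERS = [
    (";~~~", "Surmisation level 3"),
    (";~~", "Surmisation level 2"),
    (";~", "Surmisation level 1"),
    (";?", "Failed to identify"),
    (";", "Positively identified"),
]

def comment_confidence(line):
    """Return the meaning of the comment marker on a line, or None."""
    pos = line.find(";")
    if pos < 0:
        return None                 # no comment on this line
    rest = line[pos:]
    for marker, meaning in MARKERS:
        if rest.startswith(marker):
            return meaning
    return None
```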
Prefixed with CAUTION:
Bad Address - assumes 26bit wrapping | Instruction may assume branches will wrap around the 26 bit address boundary, in 32 bit mode will jump outside the bottom 64MB |
Bad Address - Thumb mode | Bit 2 of branch address set, which will result in Thumb mode being selected on later ARMs |
Bad Address - unaligned | Bit 1 of branch address set |
Not 32bit safe | Instruction does not have the same behaviour in 32-bit modes as in 26-bit modes |
Not 32bit safe (uses PSR) | PC used in Rd or Rm or LDM with {PC}^ may cause problems in 32-bit modes |
Not 32bit safe (uses NV) | Instruction uses the former NV conditional encoding, which is used to provide instruction set extensions on later ARMs |
Should be a NOP | The current instruction should be a NOP to prevent side effects from the previous instruction on some ARM variants |
Uses a banked register | The current instruction should not access a banked register, due to side effects from the previous instruction. |
SWI after CDP | SWI follows a coprocessor data instruction, which may cause problems on some ARM variants |
Conditional after BL/SWI | Conditional instructions used after subroutine or SWI (except conditions based on flags altered by the SWI and the V flag from flag altering subroutines), suggesting flag preservation has been assumed. |
Manipulation of PSR in address? | Instruction may be assuming a combined PC+PSR and is trying to manipulate the PSR bits in what is otherwise an address |
Unpredictable - negative unindexed | Coprocessor data transfer in unindexed mode with a negative offset |
Unpredictable - write back to PC | LDR/STR, LDM/STM or coprocessor data transfer instruction writing back to the PC |
Unpredictable - base register in list and write back | Base register in list with writeback set for LDM/STM |
Unpredictable - ! and ^ | Use of writeback and load user registers with LDM/STM |
Unpredictable - PC with byte/half word | Program counter loaded or stored with a non-word LDR/STR variant |
Unpredictable - write back with Rd=Rn | Write back to register also used as destination in LDR/STR |
Unpredictable - write back with Rn=Rm | Write back to register also used as index in LDR/STR |
Unpredictable - write back used | Write back to register illegal in PLD |
Unpredictable - use of PC | PC used in a multiply or SDS instruction |
Unpredictable - Rd odd or R14 | Odd number register or R14 used in LDRD or STRD |
Unpredictable - non unique registers | Invalid use of same register more than once in a multiply or SDS instruction |
Unpredictable - StrongARM bug - next op exec'd twice | On a StrongARM a conditional MSR setting the control field causes the next instruction to be executed twice, so should be a NOP to prevent side effects. |
Unpredictable - immediate with non flag fields | An MSR instruction with an immediate value setting the non-flag fields may cause side effects when altering values of currently reserved bits |
Unpredictable - SBZ non zero | Bits that should be zero in an instruction are set |
Unpredictable - SBO not ones | Bits that should be ones in an instruction are clear |
Unpredictable - Rm=PC | PC used as the Rm register, which results in architecture-specific values |
Unpredictable - Cannot be conditionl | Some ARMv5 instructions such as BKPT cannot be executed conditionally |
Invalid Instruction | Invalid instruction found in code area |
Self Modifing | Write detected to area of code |
Enabled when the target processor is specified on the command line. Prefixed with PERF:
Conditional LDM/STM maybe slow | Conditional LDMs and STMs on StrongARM and XScale are unrolled and take more than one cycle even if not taken, reducing code performance |
Single register LDM/STM slower than LDR/STR | Single register LDMs and STMs are slower than LDRs and STRs on StrongARM and XScale. LDR Rd,[Rn],#4 or STR Rd,[Rn,#-4]! should be used in preference. |
n cycle latency on register | A register used in the current instruction will not be available for a number of cycles. For maximum performance code should be reordered so that other instructions which do not use this register are inserted between the points where the register is written and used. |
ARMvN | Instruction is only available from the specified architecture number onwards. |
Guarded non ARMvN instruction | Instruction is not available on the ARM 2 or ARM 3 but is only executed if the processor is running in 32bit mode due to a previous code sequence |
Guarded not 32bit safe instruction | Instruction is not 32bit safe but is only executed if the processor is running in 26bit mode due to a previous code sequence |
Entry Point | Code entered directly from OS |
Function entry | Target of BL |
Label | Target of branch instruction or dynamic branch |
Ends | Flow of code ends at this point |
Dynamic branch | Manipulates PC to alter code flow |
(Referenced) | ADR or memory pointer references this location |
(Read as Data) | Code area is read by a data instruction |
=value | Decimal or ASCII representation of immediate value used in instruction |
Data comments consist of the following
[construct]<data type> <read write specifier> <array specifier> [pointer]
Where:-
Construct | If data has been identified as belonging to a construct |
Data type | One of the following:- |
Read Write specifier | -/- Not directly accessed |
| r/- Read from |
| -/w Written to |
| r/w Read and written |
Array specifier | If accessed as an array via an index register |
Pointer | If the data is a valid offset or address, it is dereferenced, indicated by -> |
Assembly produces an ObjAsm style output, but is not guaranteed to be immediately usable in ObjAsm. It contains the following elements:-
Gives the original executable file name, and the version of the tool used to produce the file, and compilation instructions.
Any named SWI calls used in code are declared at the start of the assembly.
Labels are constructed using the following fields where identified
Code: | L<Address>.[construct].[codeinfo | funcname] |
Data: | L<Address>.[construct].[datatype] |
Where:-
Address | 8-digit hex address |
Construct | Code or data construct |
Codeinfo | Information on entry point |
Funcname | Function name identified from C symbol (not unmangled for C++) |
Datatype | Type of data |
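The label layout can be illustrated with a small helper. This is a sketch of the documented pattern only: the function name is hypothetical, the example field values are invented, and dropping the separator for an absent field is an assumption about how missing fields are handled.

```python
def make_label(address, construct=None, detail=None):
    """Join fields as L<Address>.[construct].[detail], skipping absent ones.

    detail stands for the codeinfo, funcname or datatype field.
    """
    parts = ["L%08X" % (address & 0xFFFFFFFF)]   # 8-digit hex address
    if construct:
        parts.append(construct)
    if detail:
        parts.append(detail)
    return ".".join(parts)
```

For instance, `make_label(0x8000)` gives `L00008000`, and `make_label(0x8000, "Module", "Init")` gives `L00008000.Module.Init`.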
The assembly output should be compiled with ObjAsm using the -ABSolute flag, and linked by Link with the -bin flag. This will produce output files of type Absolute (&FF8), so for modules or other types the file type should be set appropriately.
The following statistical information is gathered on the executable, and displayed both as the number of words, and percentage of the file.
Size in words | Total size of the executable. |
Code | Words identified as valid code. |
Surmised | Amount of code which was surmised as opposed to directly identified. |
Uses PSR | Uses the processor status register, which may cause problems in 32-bit modes. |
Not ARM2/3 | Instruction is not available on ARM2 or ARM3 processors. |
Not 32 bit | Instruction does not have the same behaviour in 32-bit modes as in 26-bit modes. |
Unpredictable | Instruction does not produce predictable results on all ARM variants. |
Data | Words identified as data. |
Surmised | Amount of data which was surmised as opposed to directly identified. |
Warnings | Number of warnings produced by code analysis. |
Unidentified | Words that could not be identified as code or data. |
If a target processor is specified, the following additional statistics are displayed.
Total Cycles | Sum of all the instruction cycle counts in the executable. |
Latencies | Sum of all the instruction and register latencies identified. |
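The word counts and percentages could be derived along the following lines. This is a sketch with assumed names: sub-statistics are expressed as a percentage of their main statistic, following the 0.31 history note, and unidentified words are assumed to be whatever remains after code and data are counted.

```python
def percent(part, whole):
    """Percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

def summarise(total_words, code, code_surmised, data, data_surmised):
    """Return {name: (words, percentage)} for the main statistics."""
    unidentified = total_words - code - data    # assumed: the remainder
    return {
        "Code": (code, percent(code, total_words)),
        "Code surmised": (code_surmised, percent(code_surmised, code)),
        "Data": (data, percent(data, total_words)),
        "Data surmised": (data_surmised, percent(data_surmised, data)),
        "Unidentified": (unidentified, percent(unidentified, total_words)),
    }
```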
0.01 08-Apr-2000 | Initial revision |
0.10 19-May-2000 | Alpha release |
0.11 23-May-2000 | 2nd Alpha release |
0.12 23-May-2000 | 3rd Alpha release |
0.13 29-Nov-2000 | 32 bit CLib support added |
LDR PC,[R13],#4 recognised as function exit | |
0.14 04-Dec-2000 | PullCall returns 0xFFFFFFFF if empty not 0 |
0.15 21-Apr-2001 | xpand used for unsqueezing absolutes |
unmodsqz used for squeezed modules | |
0.16 25-Apr-2001 | AOF and ALF file checking added |
0.17 23-Jul-2001 | SWI parameter checking greatly expanded |
0.18 10-Aug-2001 | HTML and XML formatting added |
0.19 14-Aug-2001 | Custom format loading added |
0.20 30-Aug-2001 | Debug data and C++ name unmangling added |
0.21 01-Sep-2001 | First Beta release |
Formatting fixes for DDF and DDL | |
0.22 27-Sep-2001 | Fixed length string emble display |
Wimp menu display fix | |
MessageTrans menu structure added | |
MarkData() and MarkString() will not override higher priority constructs | |
0.23 30-Oct-2001 | XML tags formalised with introduction of a DTD |
0.24 01-Jan-2002 | Code expected warning addresses fixed |
0.25 20-Jan-2002 | OBJ_AREA relocation directive aware of instruction type |
0.26 13-Feb-2002 | Command line arguments rearranged |
Output to file added, correct RISC OS filetype set | |
0.27 22-Feb-2002 | Fixed warnings not being output to file |
0.28 01-May-2002 | Last word of code storage cleared before loading |
SDT2 type (LDR SB/SH/H) instructions added | |
MUL and SDS invalid combinations rejected | |
Removed 26bit wrapping from PC relative address calcs | |
0.29 02-Jun-2002 | 32bit SCL jump table unknown entries analysed correctly |
0.30 06-Jul-2002 | WIMP structure priorities altered |
Prefix added to EmbleWimpIconData | |
0.31 26-Jul-2002 | Patch candidate statistics added |
Sub stats now percentage of main stat not total words | |
0.32 01-Sep-2002 | Asm warning links generated if not disassembling |
Mutex flag sequence end correctly detected with R14 DP's | |
SDT target address calculation fixed by adding offset | |
FindAddr and FindValue terminate on BL or LDM Rx | |
Code start sequence finder recognises if PC is stacked | |
Code and data stack tracing added | |
Trailing .0 removed from FP constants for ObjAsm | |
FP precision added to FIX for ObjAsm | |
Title and warning lines prefixed with ';' in text output | |
Branch table endpoint label removal problem fixed | |
Mid string assembler label offsets generated | |
References generated from MarkData address | |
MakeString single and double quote bugs fixed | |
$$ quoted in assembler strings | |
Greater precision used for DCFS and DCFD values | |
Character following escape ignored in string | |
Only BASIC or unknown string types terminate on \r | |
Address word identification prevented in SCL modules | |
Relocation table targets and end points marked as ref'd | |
Detection of sequential strings improved | |
0.33 05-Sep-2002 | FindAddr FindValue conditional instruction checking |
0.34 21-Sep-2002 | Explicit Immediate value & rotate used in ASM instructions |
Non 32bit compliant SWI warnings added | |
Label prescan to ensure only valid asm labels are generated | |
0.35 04-Nov-2002 | CPSR and SPSR variants made compatible with ObjAsm |
mnemoaic_opts added for control of immediate and ADRs | |
Spurious OS_WriteI SWI variants removed from OSLib port | |
Invalid FP instructs with prec=3 and FIX const rejected | |
All code using NV condition marked as not 32 bit | |
Settype comment added to assembler output for modules | |
Library directory chunk date stamp displayed correctly | |
String detection prevented from running off end of data | |
Error block detection fixed | |
Sound SWI groups &40140-&40180-&401C0 dispatch corrected | |
Sound_Install voice installation header identified | |
Wimp Message block size validated by analysis | |
32bit module header analysis and display added | |
C symbols searched in reverse order for code detection | |
Non XScale instruction warning added for LDR Rd,[Rd],#0 | |
Warning added for branching outside 64MB or with thumb bit | |
Non RISC OS: Bogus names for SWI &100 to &1FF removed | |
Non RISC OS: Sound SWI names added &40140-&40180-&401C0 | |
Non RISC OS: osfile funcs don't convert extn to lower case | |
Non RISC OS: Territory_ConvertDateAndTime fixed | |
!ARMalyser auto runs 26 or 32 bit RISC OS executable | |
0.37 28-Nov-2002 | Code cautionary descriptions improved |
0.38 29-Nov-2002 | Display of new cautions from state and decoding |
0.39 05-Dec-2002 | Analysis state stacking added |
0.40 06-Dec-2002 | 26/32bit guarding and conditional exit detection enhanced |
0.41 08-Dec-2002 | AOF and ALF with filetype text as well as data allowed |
Relocated addresses handled in register memory loads | |
Fix to label generation during analysis warnings | |
CRT on control processor decode type corrected | |
0.42 22-Dec-2002 | 16 bit half word data type handling added |
ARMv5 load/store double word instructions added | |
Wimp Icon Data buffer and validation marked as address | |
Long ADR recognised and real label with offset used | |
Label+offset links fixed in HTML output | |
OBJAREA_START marker placement validity check | |
0.43 31-Jan-2003 | SDT Rm=Rn with write back is unpredictable |
MarkData exits if length is > 16MB (or signed negative) | |
Caution if comparing PC against value of 1<<31 or more | |
ARMv5 Misc and DSP instructions added | |
ARMv5 instruction display conditional on 32bit executable | |
MessageTrans menu structure display corrected | |
RO4 fast service entry shown as address in service table | |
Instruction architecture version displayed | |
Shared C Library APCS-A recognised and marked as non 32bit | |
Co-pro data transfer instruction data marking corrected | |
ResourceFS file length displayed | |
Performance (XScale register latency) information added | |
LDR/STR register emulation offset sign calc corrected | |
Target processor command line option added | |
0.44 31-Jan-2003 | Executable extended to encompass SCL static data areas |
Shared C Library _swi and _swix vectors correctly labelled | |
Instruction issue latency added | |
Register result latencies adjusted | |
Instruction shift and register shift added to decode info | |
AIF header detection tightened | |
Register emu ignores loads from memory modified at runtime | |
Data not marked on unaligned loads and stores | |
Control terminated strings allowed in error blocks and various SWI parameters | |
ALIGN in assembler output replaced by DCB's of spare data | |
SpriteArea display fixed and enhanced | |
R14 address checked on SDT dynamic branch | |
NOP/No Banked caution not triggered on exclusive condition | |
End of file label added to assembler output | |
Register condition validity sep'd from value known flag | |
Instruction unpredictable info calculated in main decode | |
MSR immediate decoding and display fixed | |
AOF code detection and file structure display enhanced | |
Relative offset data type and display added | |
0.45 05-Apr-2003 | Dynamic branches with R14=PC+4 treated as subroutine |
Flag modifications tracked over dynamic branch subroutines | |
Instruction latency cycles added to register trace output | |
Total instruction latency cycles display option | |
Branch throughput latency corrected | |
OS_ReadLine32, OS_SubstituteArgs32, OS_HeapSort32 added | |
Mnemonic field length altered in assem and disassem modes | |
Brkpt marked as unpredictable if conditional | |
Error block marked on MSR CPSR_f,#Vflag | |
Width of SWI names in assembler EQU table mode increased | |
Detection of branch to thumb mode fixed | |
Removed banked register warning for PC | |
0.46 17-Aug-2003 | PSR state invalidated after flag setting DP instruction |
Z & C flag returning SWI knowledge added | |
Conditional after BL/SWI improved to use SWI BL flag info | |
Base field removed from discontiguous enum debug display | |
0.47 21-Mar-2004 | Last word of module service table not marked as array |
Object file code attribute explicitly flagged as 26bit | |
C flag set for OS_GBPB and OS_BGet | |
0.48 05-Jul-2004 | Check for sensible SCL chunk id added |
0.49 02-Mar-2005 | Module header end detection improved |
0.50 30-May-2005 | Test for cond not compatible with flags set by instr |
0.51 14-Nov-2005 | PLD not treated as dynamic branch even if not ARMv5 |
BL/BLX & CLZ decode subtype numbers corrected | |
0.52 02-Dec-2005 | Construct start markers added |
Debug data bitfield type omission corrected | |
Labels used for debug data pointers in assembler output | |
Assembler & Disassembler output string generation improved | |
ARMv5 decoding suppressed unless target XScale specified | |
Fixed Rm state calculation for DP with register shift | |
Data backtrace location moved to pickup byte modifications | |
Dots substituted for characters 128 to 159 in HTML output | |
0.53 22-Dec-2005 | Debug FileInfo structure not skipped if length field zero |
Debug FileInfo formatting for 3 digit increments | |
Undefined rd in COPRO_CTRL CDT fixed | |
Help text updated | |
0.54 02-Feb-2006 | Fixed reg latency reporting only when reg values known |
0.55 13-Apr-2006 | Performance analysis corrected and enhanced, ARM9 added |
SWI number displayed for kernel_SWI and swi functions | |
Checking of command line arguments added | |
0.56 06-Dec-2006 | C assumed if apcs reg requested so symbols are identified |
Symbol detection allows short strings | |
Fixed Rm state calculation for SDT2 instructions | |
Binary chop algorithm used to find preceding label | |
0.57 03-Jan-2007 | ELF file format support added |
Terminating function name list extended | |
Fixed BDT instruction base address for data marking | |
Label+Offset bracketed when used in relative expressions | |
Value comments improved for assembler mode | |
Array extension moved to surmise level 3 | |
Fast find nearest label routine used for pointer comments | |
Code and data warning for address zero suppressed | |
Backtrace support for RO5 ROM location added (untested) | |
Memory access macros renamed | |
0.58 07-Jan-2007 | SDT value calculation fixed |
Jump table detection added for LDRxx PC,Rm | |
Jump and Branch table range detection improved | |
CalcRm returns invalid if Rm and Rs unknown | |
0.59 12-Jan-2007 | Separate addr & state stacks used for 64bit compatibility |
Size data type added, used in AOF/ALF/AIF & debug info | |
32bit flag always set when target is ARM9 or XScale | |
Additional end of file word not counted in statistics | |
HTML output changed to 4.01 and CSS 1 | |
HTML chars 127 to 159 given orange colour tag | |
Fix to unspecified user formatting tag free failure | |
Program data array validation checks added | |
0.60 23-Mar-2007 | ELF Relative object file support added |
ELF Symbol table handling improved and REL support added | |
ELF REL unlinked functions looked up from relocation table | |
AR format ELF static libraries supported | |
AOF/ALF annotations improved | |
AOF function names picked up from the symbol table | |
AOF unlinked functions recognised from relocation table | |
Local func at start of AOF AREA not classed as unlinked | |
First word of ELF string table marked correctly | |
Text output chars 128 to 159 replaced with dots | |
MarkFunction respects construct and code info precedence | |
BDT with PC on stack/frame not treated as dynamic branch | |
Fixed length string disassembler output corrected | |
0.61 27-Apr-2007 | C++ name unmangle code re-written |
-ls list symbols option added | |
Utility file header support added | |
0.62 16-Oct-2007 | G++ name unmangle code added |
CFront templates names supported | |
uncompressAIF used on non RISC OS platforms | |
0.63 24-Nov-2007 | Fix for 1 letter member funcs and qualified names in CF |
0.64 24-Jun-2012 | Minor fixes from C++/C# port |
0.65 03-May-2020 | Complete re-write of CFront and GCC name unmangling |
Analysis BL check start address fixed | |
0.66 08-Sep-2020 | Correctly handle untyped file load exec addresses |
Correct 5-byte time assembler and disassembler output | |
Remove duplicate semicolon for Norcroft in unmangle | |
CF invalid 't' template error suppressed | |
Handle untyped files in Load and Unsqueeze | |
Handle @@ postfix in GCC | |
Support for untyped files on non-native filing systems using ",<loadaddr>-<execaddr>" filename postfix | |
Support for C and C++ 64 bit builds added | |
Changes from cppcheck and gendarme linters | |
Fixed temporary file being left by unsqueeze on Linux | |
0.67 03-Oct-2021 | Fix marking SCL static data over code at end of C Module |
Runtime check for FPA double word ordering conversion |
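The 0.66 entry above mentions a ",<loadaddr>-<execaddr>" filename postfix for untyped files on non-native filing systems. The following sketch shows one way such a name could be split into its parts. It is not taken from ARMalyser: the function name is hypothetical, and it assumes both addresses are written as up to eight hexadecimal digits with no prefix, which the entry itself does not state.

```python
import re

# Assumed form: "<name>,<loadaddr>-<execaddr>", both addresses being
# 1 to 8 hexadecimal digits without a prefix (an assumption, not a
# documented ARMalyser rule).
_POSTFIX = re.compile(
    r"^(?P<name>.+),(?P<load>[0-9A-Fa-f]{1,8})-(?P<exec>[0-9A-Fa-f]{1,8})$"
)


def split_untyped_name(filename):
    """Return (name, load_addr, exec_addr) for a postfixed filename.

    If no load/exec postfix is present, the filename is returned
    unchanged and both addresses are None.
    """
    match = _POSTFIX.match(filename)
    if match is None:
        return filename, None, None
    return (
        match.group("name"),
        int(match.group("load"), 16),
        int(match.group("exec"), 16),
    )


if __name__ == "__main__":
    print(split_untyped_name("prog,8000-8040"))
    print(split_untyped_name("prog"))
```

A name with no postfix, or with only a three digit file type postfix such as ",ff8", is passed through unchanged with no addresses.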