# WTL 3132/WTL 3332 XL-3132 32-BIT FLOATING POINT DATA PATH

## PRELIMINARY DATA October 1987

The WEITEK WTL 3132, WTL 3332, and XL-3132 single-chip floating point data paths each offer a full instruction set, including multiply, multiply/accumulate, add, subtract, type conversion, and divide look-up operations. Efficient design and architecture, combined with CMOS technology, provide up to 25 MFLOPS of performance at very low power.

Related products: XL-8136 32-bit Sequencer, XL-8137 32-bit Integer Processor

| Contents | Page |
|----------------------------------|------------|
| Features | 1 |
| Description | 1 |
| Architecture | 2 |
| Signal Description | 4 |
| Block Diagram | 5 |
| Register File | 7 |
| Multiplier/Accumulator | 8 |
| Temporary Registers | 13 |
| Internal Data Routing | 16 |
| Input/Output | 20 |
| System Interfacing | 25 |
| Instruction Set | 30 |
| Initialization | 35 |
| Division | 38 |
| Data Format | 39 |
| IEEE Considerations | 41 |
| DC Specifications | 42 |
| Timing Diagrams | 43 |
| AC Specifications | 45 |
| Pin Configuration | 46 |
| Physical Dimensions | 48 |
| Appendix A: The XL-3132 in the XL Environment | 49 |
| Appendix B: Programming Examples | 55 |
| Ordering Information | 58 |
| Revision Summary | 59 |
| Documentation Request Form | 61 |
| Sales Offices | back cover |

## Features

### 32-BIT FLOATING POINT PROCESSOR

- Single-precision floating point multiplier/ALU
- Four-port 32×32 register file
- IEEE floating point format
- Low power, high integration CMOS

### FULL FUNCTION

- Add, subtract, multiply, multiply/accumulate
- Divide look-up table
- Type conversion to and from two's complement integer
- Three-address (rc := ra + rb) architecture
- Flexible I/O options

### HIGH PERFORMANCE

- 80, 100 and 120 ns cycle times
- Up to 25 MFLOPS throughput (1 MAC/cycle)
- Low latency (3-cycle register-to-register operation)
- High I/O bandwidth (up to 200 Mbytes/sec)

### MULTI-PURPOSE

- For maximum throughput, use the three-port WTL 3332
- For maximum design flexibility, use the WTL 3132 or WTL 3332 as microprogrammable building blocks
- For high level language support, use the XL-3132 as the XL-8032 floating point coprocessor

## Description

The WTL 3132 and WTL 3332 are single-precision floating point data paths. Each includes a pipelined multiplier/accumulator and a four-port register file with thirty-two 32-bit registers.

The WTL 3132/3332 are suited to a wide range of systems that need high numeric processing performance. They may be adopted as the floating point unit for a general-purpose processor, used as building blocks for application-specific data paths, or even connected together to create vector or array processors.

The WTL 3132 has a single bi-directional 32-bit input/output port. It is designed to be used as a floating point coprocessor or accelerator. The WTL 3332 has three 32-bit ports: one bi-directional input/output port, one input port and one output port. It should be used in applications which require multiple high-bandwidth buses.

The XL-3132 may also be used with the WEITEK XL-8136 program sequencing unit (PSU) and XL-8137 integer processing unit (IPU) to create a fast, general-purpose numeric processor, the XL-8032. Full development system support, including FORTRAN and C compilers, is available for the XL-Series of processors. The XL-3132 is functionally identical to the WTL 3132.

Both devices are manufactured in low power CMOS and are available in standard pin grid array (PGA) packages. The WTL 3132 is supplied in a 144-pin PGA and the WTL 3332 in a 168-pin PGA.

Figure 1. WTL 3132/3332 core functions

## Architecture

### MULTIPLIER/ACCUMULATOR

The core of both the WTL 3132 and WTL 3332 is the multiplier/accumulator pipeline. Its first stage can multiply two operands together. The next stage can add or subtract another operand.
Finally, the result is rounded and returned to a register and/or output port.

Multiply, add, subtract, and multiply/accumulate operations are performed in the multiplier/accumulator. They all operate on data that conforms to the IEEE single-precision floating point format. Each operation takes three cycles, but, because the multiplier/accumulator is pipelined, a new operation can be started on every cycle. At any time, three independent operations may be at different stages in their execution.

Rounding, conversion between floating point and two's complement integer formats, and other miscellaneous functions are supported in the accumulator.

### REGISTER FILE

Operands and results of the multiplier/accumulator may be stored in the four-port register file. This file contains thirty-two registers, each of which may store a 32-bit value. The four ports allow the register file to supply two operands to the multiplier/accumulator, store its result back to a register, and perform an input/output transfer, all in the same cycle.

### INPUT/OUTPUT PORTS

The external I/O ports are all 32 bits wide. They can each transfer a data value on every cycle.

The WTL 3132 has one bi-directional external port: the X port. It can load and store data to and from the register file, and it can transfer data directly to and from the multiplier/accumulator.

The WTL 3332 has three external ports: the X port, the Y port, and the Z port. The X port is the same as the WTL 3132's X port. The Y port feeds input operands directly to the multiplier/accumulator. The Z port outputs results directly from the multiplier/accumulator. These additional ports help to avoid the bottlenecks usually associated with I/O-intensive algorithms.

Figure 2. WTL 3132/3332 I/O options

### TEMPORARY REGISTERS

Three 32-bit temporary registers are provided to store intermediate results.
They make it possible to perform operations of the form $x = x \pm (y \times z)$ in a single cycle.

### DIVIDE LOOK-UP TABLE

Support for divide operations is provided by an on-chip look-up table. It returns an approximation for the inverse of a value, which may then be refined by iterative multiply/accumulate operations. Division is accomplished by multiplying the dividend by the inverse of the divisor. This complete divide operation takes eighteen cycles; other operations may be interleaved without a performance penalty.

### INSTRUCTIONS

An instruction is latched into the code port on every cycle. It specifies operand sources, a result destination and all of the steps that will create this result during the next three cycles. Condition codes and exceptions may be generated by each operation as the result is written back to the register file.

Four five-bit fields provide addresses for the register file. They each select a source or destination for one of the register ports. The three-bit function field specifies the type of multiplier/accumulator operation. The two-bit I/O control field directs data transfer at the external X port. Other fields select the route taken by the data during the operation.

A Mode Register controls data routing options that rarely change, selecting between a number of I/O timing options, and supporting upward-compatibility with previous versions of the WTL 3132/3332.

### XL-SERIES COMPATIBILITY

The XL-3132 may be used with the WEITEK XL-8136 program sequencing unit (PSU) and XL-8137 integer processing unit (IPU) to create the XL-8032 processor. The XL-3132 floating point unit (FPU) shares a 64-bit instruction word with the IPU and PSU. The IPU and FPU also share the 32-bit-wide data bus.

The XL-3132 responds to the NEUT-, STALL- and ABORT- signals used to control branching and "wait states" within the XL-8032. It communicates its status to the PSU with the floating point condition (FPCN) and exception (FPEX) lines.
See Appendix A for more about the XL-Series.

Figure 3. XL-8032 block diagram

## Signal Description

### X PORT

The 32-bit X31..0 port is a bi-directional data bus. Input data is sampled on the rising edge of CLK (or, if Double-Pump Mode is enabled, on both the rising and falling edges of CLK). Data transfers are controlled by the IOCt1..0 field in the instruction word. The X port may be set to a high impedance state by the OEX- signal. Active high.

### Y PORT

The 32-bit Y31..0 port is a data input bus. Input data is sampled on the rising edge of CLK (or, if the Y port Late Input Mode is enabled, on the falling edge of CLK). The Y port is only available on the WTL 3332. Active high.

### Z PORT

The 32-bit Z31..0 port is a data output bus. The output data is modified on every cycle. The Z port is only available on the WTL 3332. It may be set to a high impedance state by the OEZ- signal. Active high.

### C PORT

The 35-bit C34..0 port is used as a code input bus. Instructions are latched on the rising edge of CLK. Active high. C34 is only available on the WTL 3332 (see figure 40).

### OEX-

X port output enable input. OEX- asynchronously disables the X port when high. Active low.

### OEZ-

Z port output enable input. OEZ- asynchronously disables the Z port when high. Active low.

### FPEX(-)

Floating point exception output. FPEX signals the occurrence of an enabled exception (overflow). Polarity is selectable by mode bit Ms.

### FPCN

Floating point condition output. FPCN signals the occurrence of a condition as specified in the Encn1..0 field of an instruction. Active high.

### ZERO

Zero condition output. Indicates that the result of an operation is exactly equal to zero. Controlled by the Encn1..0 field of an instruction. Active high.

### NEUT-

Neutralize input. Cancels the effect of the current instruction. Typically used during delayed branches and interrupt response routines (see page 27).
Latched on the cycle following the instruction to be cancelled. Active low.

### STALL-

Stall input. Cancels the effect of the next instruction. Typically used as a "not ready" line from the code store (see page 28). Latched on the same cycle as the potentially invalid instruction. Active low.

### ABORT-

Abort input. Cancels the effect of both the current and next instructions. Typically used as a "not ready" line from the data store (see page 29). Latched on the same cycle as the next instruction. Active low.

### CLK

Clock input. TTL compatible.

### VDD

All VDD pins must be connected to 5.0V.

### GND

All GND pins must be connected to system ground.

Note: Signals denoted by "-" are active low.

## Block Diagram

Figure 4. WTL 3132 block diagram

Figure 5. WTL 3332 block diagram

## Register File

The WTL 3132 and WTL 3332 each have thirty-two 32-bit general-purpose registers. Each register can store either a single-precision IEEE value or a two's complement integer value.

### PORTS

The register file has four ports, A, B, C and D. The A and B ports are read-only, the C port is write-only, and the D port is bi-directional. Each port can transfer a 32-bit data word on every clock cycle.

The A and B ports may be used to supply operands to the multiplier/accumulator and the divide look-up table. The C port receives the result of a previous operation. The D port communicates data between the register file and the external X port. This organization allows I/O transfers to proceed in parallel with calculation, maximizing system performance.

### REGISTER SELECTION

The registers that are to take part in each transfer are selected by the instruction word. The instruction format allows a register address to be supplied for each port. They are provided in the Aadd, Badd, Cadd and Dadd fields of the instruction.
These fields are five bits in length, allowing each address to specify any of the thirty-two registers.

An instruction supplies the Aadd, Badd, and Dadd addresses to the register file during its first cycle and the Cadd address during its fourth cycle. This way a single instruction specifies all of the stages of an operation from initial source to ultimate destination.

It is possible for a register to be selected by more than one field in the same cycle, in which case the following rules apply:

- 1. If only read operations are to be performed on the register in question, then its value is copied to all of the necessary ports.
- 2. If two ports (C and D) attempt to write into the same register on the same cycle, the contents of the register will be left in an undefined state. Such contention should be avoided.
- 3. If a register is to be both read and written on the same cycle, its old value will be read before it is updated to the new value, unless one of the Bypass Modes is activated (see page 16).

Figure 6. The four-port register file

### READ/WRITE CONTROL

Two other fields in the instruction word affect the operation of the register file.

- 1. The Cwen- bit controls writing of results into the C port. When it is active (low), the result data is written on the fourth cycle of the operation. When writes are disabled, the contents of the register specified by Cadd remain unchanged. Register writes may be disabled either to direct a result to a Temporary Register or to allow arithmetic comparisons to modify the Status and Condition Registers without overwriting the contents of a general-purpose register.
- 2. The IOCt1..0 bits control the direction of D port transfers (see page 20 for details).

If the C port and the D port attempt to write to the same register file location on the same cycle, the register contents are left undefined.

## Multiplier/Accumulator

The WTL 3132 and WTL 3332 each have a pipelined multiplier/accumulator.
These consist of a floating point multiplier whose output is fed into a floating point ALU (Arithmetic and Logic Unit). All multiplier/accumulator input and output ports can transfer 32-bit data values. Figures 7 and 8 show how operations are pipelined through the multiplier/accumulator.

Figure 7. Simple example of multiplier/accumulator timing

Figure 8. Timing of MAC operations

### MULTIPLIER

The multiplier has two input ports, MAin and MBin. It has one output which can only be connected to the AAin port of the ALU. In the first cycle of an operation, operands are transferred from the register file and fed into MAin and MBin. The multiplication is completed on the second cycle. The intermediate result may be negated before it is passed to the ALU.

### ALU

The ALU has two input ports, AAin and ABin. It has one output which is normally connected to the C bus. AAin may be connected to the multiplier's output, so that its result is fed into the ALU. Another operand is fed into the ABin port simultaneously. The ALU completes the function specified by an instruction during its third cycle. The final result is rounded and output to the C bus to be returned to the register file on the fourth cycle.

### LATENCY

Because the multiplier/accumulator is pipelined, an operation can be initiated every cycle. The result of an operation is generated three cycles after it is initiated. On the fourth cycle, the result can be returned to the register file or fed straight back into the multiplier/accumulator using the temporary registers or a bypass mode (see pages 13 or 17).

### FUNCTION SELECTION

The multiplier/accumulator function is specified by the 3-bit field F2..0 in the instruction word as outlined in the function select table (figure 9).
A single instruction specifies all of the actions associated with one operation as it passes through the multiplier/accumulator.

| F2 F1 F0 | MNEMONIC | OPERATION | DESCRIPTION |
|----------|----------|-------------------------------|-----------------------|
| 0 0 0 | - | Miscellaneous | See figure 10 |
| 0 0 1 | fsubr | Negate and add | -AAin + ABin |
| 0 1 0 | fsub | Subtract | AAin - ABin |
| 0 1 1 | fadd | Add | AAin + ABin |
| 1 0 0 | - | Reserved | |
| 1 0 1 | fmna | Multiply, negate and add | -(MAin × MBin) + ABin |
| 1 1 0 | fmns | Multiply, negate and subtract | -(MAin × MBin) - ABin |
| 1 1 1 | fmac | Multiply and accumulate | (MAin × MBin) + ABin |

Figure 9. Function select field encoding

When the F2..0 field is (0, 0, 0), the operation to be performed is specified by the Badd field according to figure 10.

| Badd4-0 | MNEMONIC | OPERATION | DESCRIPTION |
|-------------|----------|-----------------------|----------------|
| 00000 | fclsr | Clear Status Register | |
| 00001 | fstsr\* | Read Status Register | |
| 00010 | - | Reserved | |
| 00011 | fmode | Load Mode Register | |
| 00100 | fabs | Absolute Value | \|AAin\| |
| 00101 | float | Fixed-to-Float | integer → IEEE |
| 00110 | fix | Float-to-Fixed | IEEE → integer |
| 00111 | flut | Look-up Operation | |
| 01000-11111 | - | Reserved | |

\* fstsr instructions must have their IOCt1..0 field set to select a store.

Figure 10. Miscellaneous function select encoding

### MULTIPLY/ACCUMULATE FUNCTIONS

The WTL 3132/3332 provide three multiply and accumulate functions:

- fmac multiplies MAin and MBin and then adds ABin.
- fmns multiplies MAin and MBin, negates the result and then subtracts ABin.
- fmna multiplies MAin and MBin, negates the result and then adds ABin.

These functions are triadic (they have three input operands). If an IEEE multiply operation is required, the constant 0.0 should be selected as the ABin input.
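The function select encoding of figure 9 can be illustrated with a small software model. The following is a minimal sketch, not a description of the device's internal implementation: the names `to_single`, `FUNCTIONS` and `execute` are hypothetical, the arithmetic is done in Python double precision and rounded once to single precision at the end, and device behavior such as clamping of denormalized operands, NaN handling, overflow exceptions and the miscellaneous functions of figure 10 is not modeled.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# F2..0 encodings from figure 9: (mnemonic, uses multiplier, ALU operation).
# Each ALU operation receives the AAin value (an operand or the product)
# and the ABin value.
FUNCTIONS = {
    0b001: ("fsubr", False, lambda aain, abin: -aain + abin),
    0b010: ("fsub", False, lambda aain, abin: aain - abin),
    0b011: ("fadd", False, lambda aain, abin: aain + abin),
    0b101: ("fmna", True, lambda aain, abin: -aain + abin),
    0b110: ("fmns", True, lambda aain, abin: -aain - abin),
    0b111: ("fmac", True, lambda aain, abin: aain + abin),
}


def execute(f: int, a: float, b: float, abin: float = 0.0):
    """Hypothetical model of one multiplier/accumulator operation.

    f    -- 3-bit F2..0 function code
    a    -- value on MAin (multiply functions) or AAin ("ALU only" functions)
    b    -- value on MBin; ignored by "ALU only" functions
    abin -- value on ABin (defaults to the constant 0.0)
    """
    if f not in FUNCTIONS:
        # 000 selects a miscellaneous function via Badd; 100 is reserved.
        raise ValueError(f"F2..0 = {f:03b} is not modeled in this sketch")
    mnemonic, uses_multiplier, alu = FUNCTIONS[f]
    # Multiply functions feed the product into AAin; "ALU only" functions
    # bypass the multiplier and stage the operand directly into the ALU.
    aain = a * b if uses_multiplier else a
    return mnemonic, to_single(alu(aain, abin))


if __name__ == "__main__":
    # A plain multiply is fmac with the constant 0.0 selected on ABin.
    print(execute(0b111, 1.5, 2.5, 0.0))  # ('fmac', 3.75)
    print(execute(0b101, 2.0, 3.0, 1.0))  # ('fmna', -5.0)
```

Keeping the table as data rather than as a chain of conditionals mirrors the way a single three-bit field selects the whole operation; the pipeline timing (three cycles per operation) is deliberately left out of this sketch.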
### ALU FUNCTIONS

The WTL 3132/3332 provide three diadic "ALU only" functions:

- fadd adds AAin to ABin.
- fsub subtracts ABin from AAin.
- fsubr subtracts AAin from ABin.

The F2..0 field determines whether the multiplier is bypassed and the ALU's input staged directly into the ALU (see figure 11). These functions operate in the same number of cycles as the multiply and accumulate functions. This simplifies the programmer's model; every operation has the same latency.

Figure 11. "ALU only" operations

### MISCELLANEOUS FUNCTIONS

If the function field is equal to zero, then a miscellaneous ALU function will be selected according to the contents of the instruction's Badd field (see figure 10).

- 1. flut is monadic (that is, it has a single input operand). It takes the value on the A bus as its operand and it returns an approximation to the inverse of this value onto the C bus on its fourth cycle. flut does not attempt to modify the Status, Condition or Zero Registers. It is recommended that the Abin2..0 field be set to select the constant 0.0. (See page 38.)
- 2. fix is a monadic "ALU only" function. It takes a single-precision IEEE format floating point value on the A bus as its operand and returns a 24-bit, sign extended, two's complement integer onto the C bus on its fourth cycle. fix does not attempt to modify the Status or Zero Registers. It is the only instruction that produces an integer result. It is recommended that the Abin2..0 field be set to select the constant 0.0. (See page 40.)
- 3. float is a monadic "ALU only" function. It takes a 24-bit, sign extended, two's complement value on the A bus as its operand and returns a single-precision IEEE format floating point number onto the C bus on its fourth cycle. float does not attempt to modify the Status or Zero Registers. It is the only instruction that requires an integer operand. It is recommended that the Abin2..0 field be set to select the constant 0.0. (See page 40.)
- 4. fabs is a monadic "ALU only" function. It takes the value on the A bus as its operand and returns its absolute value onto the C bus on its fourth cycle. fabs does not attempt to modify the Status Register. The Zero and Condition Registers are modified according to the result unless Encn1..0 = (0,0). As with the diadic "ALU only" functions, it will clamp denormalized operands to zero and NaNs to infinity. The Abin2..0 field must be set to select the constant 0.0.
- 5. fmode loads the desired operating modes into the Mode Register (see page 35). Because this operation changes the timing of many operations, the results of the next three operations should be discarded.
- 6. fstsr copies the contents of the Status Register to the X port. It has the same timing as the other fstore operations. (See page 26.) It is recommended that the Cwen- bit be set to prevent register writes, the Encn1..0 field to disable updates of the FPCN pin, the IOCt1..0 field to an fstore, and that the result sent to the register file on its fourth cycle be discarded. fstsr ignores the register address fields.
- 7. fclsr clears the contents of the Status Register to zero. (See page 26.) It is recommended that the Cwen- bit be set to prevent register writes, the Encn1..0 field to disable updates of the FPCN pin, the IOCt1..0 field to an I/O nop, and that the result sent to the register file on its fourth cycle be discarded. fclsr ignores the register address fields.

### MULTIPLIER/ACCUMULATOR NOP

The WTL 3132/3332 do not have a dedicated nop instruction. WEITEK software tools use fsub .f0, .f0, .f0 with the Cwen- bit set to disable register writes and the Encn1..0 field cleared to disable FPCN updates. This choice of nop causes no state changes.

## Temporary Registers

The WTL 3132 and WTL 3332 each include three 32-bit temporary registers (Tregs). They allow values to be recirculated to the ALU without passing through the general-purpose register file.
The Tregs are often used as accumulators during successive multiply/accumulate operations. They make it possible to perform a calculation of the form $x = x \pm (y \times z)$ every cycle.

Figures 12 and 13 show how the Tregs are used to feed operands back to the multiplier/accumulator; figure 15 gives an example of a code sequence that does this.

Figure 12. Use of temporary registers

Figure 13. Temporary register timing

### WRITING TEMPORARY REGISTERS

The instruction word contains a two-bit field, Adst1..0, that determines the destination of the ALU output (see figure 14). The output of the ALU is always sent to the C bus. If no Treg is selected by the Adst1..0 field, then the result is returned only to the register selected by this instruction's Cadd field on its fourth cycle.

| Adst1-0 | RESULT DESTINATION |
|---------|--------------------|
| 00 | Treg3, C bus |
| 01 | Treg2, C bus |
| 10 | Treg1, C bus |
| 11 | C bus |

Figure 14. ALU destination select field encoding

If the Adst1..0 field selects a Treg in addition to the C bus, it is loaded with the result on the fourth cycle of an operation, just as the Cadd register write occurs. On the next cycle, the contents of the Treg may be input directly to the ABin port. This is illustrated by the example shown in figure 15.

The Cwen- bit of the instruction that writes to the Treg may be held high to prevent the write to the Cadd register. This increases the number of available general-purpose registers.

### READING TEMPORARY REGISTERS

The instruction word contains a three-bit field, Abin2..0, that determines the source of the MAC's ABin input. Three encodings select one of the Tregs (see figure 17). If a Treg is selected, it is copied to the ABin port on the second cycle of the operation. The Treg may be read on the cycle after it was written.
In figure 15, for example, .t1 gets read during the second cycle of op #4. This is the fifth cycle of op #1.

```
op #1  fmac  .f0,  .f1,   0,    .t1
op #2  fmac  .f3,  .f4,   .t1,  .f5    -- old .t1
op #3  fmac  .f6,  .f7,   .t1,  .f8    -- Illegal
op #4  fmac  .f9,  .f10,  .t1,  .f11   -- new .t1
```

Notes:

- A Treg cannot be both written and read in the same cycle. Op #3 must never attempt to read the Treg written by op #1.
- To make this code interruptable, op #2 and op #3 should not specify .t1 as an operand.
- Full details of this syntax are given on page 32.

Figure 15. Use of temporary registers

## Internal Data Routing

### INTERNAL BUSES

The three main internal buses, A, B and C, move data between the major functional units of the WTL 3132 and WTL 3332. Each of these buses is 32 bits wide and can carry one word per cycle.

The A bus usually carries operands from the register file to the multiplier/accumulator or the divide look-up table. It may also be fed with operands directly from the X port if the Input Bypass Mode is enabled (see page 21).

The B bus usually carries operands from the register file to the multiplier/accumulator. It may also be fed directly with operands from the Y port on the WTL 3332 if selected by the Mbs- bit (see page 23).

The C bus usually carries multiplier/accumulator or flut results back to the general-purpose register file. If the Output Bypass Mode is enabled, then it may be used to feed the results directly to the X port. If the C bus is not needed to carry the results back to the registers (e.g., when they are output directly to the Z port), then its usual direction of transfer may be reversed. It is then used to carry inputs from the X port directly to the multiplier/accumulator. This is only possible by the use of the floadrc operation or the Double-Pump Mode (see page 21).

| Mbin- | INPUT |
|-------|-------|
| 0 | B bus |
| 1 | C bus |

Figure 16. Multiplier input port select field encoding

| Abin2-0 | INPUT |
|---------|----------|
| 000 | C bus |
| 001 | B bus |
| 010 | Treg2 |
| 011 | Treg1 |
| 100 | Treg3 |
| 101 | Reserved |
| 110 | 2.0 |
| 111 | 0.0 |

Figure 17. ALU input select field encoding

### MULTIPLIER/ACCUMULATOR INPUT PORTS

The multiplier/accumulator has four input ports, MAin, MBin, AAin and ABin. These ports can each receive a 32-bit word per cycle. They all have multiplexers which may be connected to various input sources. The possible selections for each port are described below:

- 1. MAin usually obtains its input from the A bus. Results may be fed from the C bus to the A bus using the Internal Bypass Mode and then into MAin.
- 2. MBin usually obtains its input from the B bus. Results may be fed from the C bus to the B bus using the Internal Bypass Mode and then into MBin. Y port inputs may be enabled onto the B bus and then into MBin. Alternatively, the C bus can be reversed so that it is carrying inputs from the X port to the MBin port. This prevents the C bus from being used to return results to the register file and is done in conjunction with the floadrc operation or Double-Pump Mode. MBin must select the C bus (see figure 16).
- 3. AAin usually obtains its input from the A bus. Results may be fed from the C bus to the A bus using the Internal Bypass Mode and then into AAin.
- 4. ABin usually obtains its input from the B bus. Results may be fed from the C bus to the B bus using the Internal Bypass Mode and then into ABin. Y port inputs may be enabled onto the B bus and then into ABin. The 3-bit Abin2..0 field in the instruction word selects between input from the B bus (as above), input of the constants 0.0 or 2.0, input from one of the Tregs, or input from the C bus (as below) according to figure 17. Alternatively, the C bus can be reversed so that it is carrying inputs from the X port to the ABin port.
This prevents the C bus from being used to return results to the register file and is done in conjunction with either the floadrc operation or Double-Pump Mode. ABin must select the C bus. When this pathway is used, the ABin data is not delayed: an external register may be needed to synchronize the X and Y inputs.

If the function code specifies an operation that uses the multiplier, it directs data to the MAin or MBin ports. If the multiplier is not used (that is, in "ALU only" operations), then the data is sent to the AAin or ABin ports automatically. The WTL 3132/3332 are designed to maintain a consistent latency regardless of the type of operation.

### INTERNAL BYPASS MODE

Figure 18. Internal bypass routes

A code sequence may often specify the result of one operation to be the operand of a subsequent operation. The WTL 3132/3332 provide several methods of achieving this:

#### 1. Internal Bypass Mode disabled

The default mode of operation (M0 = 0 and M11 = 0) is to send the result back to one of the general-purpose registers and then read the new value of this register as the operand. Figure 19 shows this code sequence: op #1 specifies .f2 as its destination and op #2-5 all specify .f2 as one of their operands. Because the result of op #1 gets written back to .f2 on its fourth cycle, op #5 is the first of the succeeding operations to read the new contents of .f2 as desired. This represents a register-to-register latency of four cycles.

```
op #1  fadd  .f0,  .f1,  .f2
op #2  fadd  .f2,  .f3,  .f4    -- old .f2
op #3  fadd  .f2,  .f5,  .f6    -- old .f2
op #4  fadd  .f2,  .f7,  .f8    -- old .f2
op #5  fadd  .f2,  .f9,  .f10   -- new .f2
```

Notes:

- To make this code interruptable, op #2, op #3 and op #4 should not specify .f2 as an operand.
- Full details of this syntax are given on page 32.

Figure 19. No internal bypassing

#### 2. Internal Bypass Mode enabled

If the Internal Bypass Mode is enabled (M0 = 1 and M11 = 1), then the register-to-register latency is reduced to just three cycles. Figure 18 shows the two internal bypass multiplexers that allow results on the C bus to be copied over to the A or B buses without first being returned to the register file.

Figure 20 shows a code sequence that uses the bypass mode: op #1 specifies .f2 as its destination and op #2-4 all specify .f2 as one of their operands. In contrast to the previous example, op #4 is fed the result of op #1 on the same cycle that the result is written to the register file.

The multiplexers operate by comparing the Aadd and Badd address fields to the Cadd address field as they are presented to the register file on each cycle. If Aadd = Cadd, then the value on the C bus is copied to the A bus; and if Badd = Cadd, then the value on the C bus is copied to the B bus. Enough time remains for that value to be latched into a multiplier/accumulator input port before the end of the cycle.

In the example, the Cadd field of op #1 matches the Aadd field of op #4 as they are compared on the fourth cycle of op #1; the bypass from the C bus to the A bus is opened and op #4 can proceed immediately with the new data. During the same cycle, the result is copied into the Cadd register as usual so that the register file remains consistent.

Setting mode bit M0 = 1 enables the C-to-A bus bypass and setting mode bit M11 = 1 enables the C-to-B bus bypass.

If the Cwen- bit of an instruction is set to prevent register writes, then the Internal Bypass Mode is temporarily suspended on the fourth cycle of that operation; this ensures that the register file contents are kept in step with the operands used by each instruction. Similarly, the NEUT-, STALL-, and ABORT- signals cause the Internal Bypass Mode to be suspended as they cancel the register write of an instruction.

```
op #1  fadd  .f0,  .f1,  .f2
op #2  fadd  .f2,  .f3,  .f4    -- old .f2
op #3  fadd  .f2,  .f5,  .f6    -- old .f2
op #4  fadd  .f2,  .f7,  .f8    -- new .f2
```

Notes:

- To make this code interruptable, op #2 and op #3 should not specify .f2 as an operand.
- Full details of this syntax are given on page 32.

Figure 20. Use of internal bypassing

Figure 21. The timing of Internal Bypass Mode (M0 = 1 and M11 = 1)

#### 3. Temporary Registers

To complete the range of alternative routes available for feeding results back to the multiplier/accumulator, the Treg code example first given in figure 15 is repeated here. The temporary register option has the same timing as the Internal Bypass Mode, but only feeds back to the ABin port. The Tregs bring an extra source of operands to the multiplier/accumulator, allowing operations of the form $x = x \pm (y \times z)$ to be executed in a single cycle.

Many of the code examples given read a register after the instruction that modifies it has been initiated. While such code sequences are valid, they are uninterruptable. More detailed coverage of interruptable code may be found on page 33.

```
op #1  fmac  .f0,  .f1,  0,    .t1
op #2  fmac  .f2,  .f3,  .t1,  .f4    -- old .t1
op #3  fmac  .f5,  .f6,  .t1,  .f7    -- Illegal
op #4  fmac  .f8,  .f9,  .t1,  .f10   -- new .t1
```

Notes:

- A Treg cannot be both written and read in the same cycle. Op #3 must never attempt to read the Treg written by op #1.
- To make this code interruptable, op #2 and op #3 should not specify .t1 as an operand.
- Full details of this syntax are given on page 32.

Figure 22. Use of temporary registers

## Input/Output

The WTL 3132 and WTL 3332 provide different input/output facilities. The WTL 3332 has additional Y and Z ports to increase the bandwidth between the multiplier/accumulator and external components. Sections specific to the WTL 3332 will state this in their headings.

All data I/O ports are 32 bits wide. They can all transfer at least one word on each cycle.
The memory-to-memory latency can be as low as five cycles (two more than the register-to-register latency). All output buses may be disabled by de-asserting their asynchronous output enable signals.

### THE X PORT: NORMAL USAGE (WTL 3132 AND WTL 3332)

Figure 23. Normal X port I/O timing

The X port normally transfers data to and from the register file D port. The Dadd field selects the register in question and the IOCt1..0 field in the instruction word controls the transaction. These I/O transfers always begin during the first cycle of an operation. See figure 24 for the IOCt1..0 encoding scheme and figure 23 for normal I/O timing.

1. The fload operation loads the value at the X port pins into the register selected by Dadd. It is completed by the end of the first cycle. If the register is read on the same cycle, its previous contents will be output.
2. The fstore operation stores the contents of the register selected by Dadd to the X port output register (XDoutR) during the first cycle. This value is driven onto the X port pins on the second cycle. The fstore operation drives the X pads during most of its second cycle and at the start of its third cycle. Input data may not be applied to the pins until partway through its third cycle. The OEX- pin can asynchronously disable the output at any time.
3. The floadrc operation is described on page 22.
4. The I/O nop operation simply disables the X port and ignores any input. It does not prevent the multiplier/accumulator from writing to registers or modifying the state of the condition and exception outputs.

Note: An fload should not follow an fstore immediately. At least two I/O nop cycles must be inserted between them if the Coprocessor Load Mode is disabled, and at least one I/O nop if it is enabled (see page 27).

| IOCt1..0 | OPERATION |
|----------|-----------|
| 00 | I/O nop |
| 01 | floadrc |
| 10 | fstore |
| 11 | fload |

Figure 24.
X port I/O control field encoding

Figure 25. Input/Output Bypass diagram (X port)

### INPUT BYPASS MODE (WTL 3132 AND WTL 3332)

During a normal fload operation the X port data must be written to a register one cycle before it can be used as the operand of a subsequent operation. If, however, the Input Bypass Mode is enabled and Aadd = Dadd, the value at the X port is loaded both into the register selected by Dadd and onto the A bus during the first cycle.

The Input Bypass Mode is enabled when bit M3 of the Mode Register is set to 1 (see page 35).

Figure 26. X Port I/O timing (Input and Output Bypass Modes enabled; M3 = 1 and M4 = 1)

The MAin port always receives the input data by the end of the first cycle. If the function is "ALU only", the data is staged into the AAin port during the second cycle.

The Input Bypass Mode should not be enabled when the Coprocessor Load Mode is enabled.

### FLOADRC OPERATION (WTL 3132 AND WTL 3332)

During a normal fload operation the X port data is copied to the D port and, if the Input Bypass Mode is activated, to the A bus. If the floadrc operation is used instead, the value at the X port is loaded both into the register selected by Dadd and onto the C bus during the first cycle, where it can be selected by one of the multiplier/accumulator input ports, MBin or ABin.

floadrc prevents the multiplier/accumulator from writing the result of a previous instruction to the Cadd register, and the resulting contents of this register are undefined. Results from the divide look-up table preempt the floadrc operation and are written to the Cadd register as usual; floadrc still copies the X port value to the Dadd register. The Cwen- control can be set to prevent unwanted writes to the register specified by the Cadd field of an instruction.

If the MBin multiplexer selects the C bus it will receive the X port data by the end of the first cycle.
If the ABin multiplexer selects the C bus it will receive the X port data by the end of the first cycle. Note that the two ports are skewed by one cycle from the programmer's view because the ABin input is not delayed.

floadrc should not be used in an interruptable environment.

### OUTPUT BYPASS MODE (WTL 3132 AND WTL 3332)

During a normal fstore operation the multiplier/accumulator result must be written to a register one cycle before it can be output to the X port. If, however, the Output Bypass Mode is enabled and Cadd = Dadd, the multiplier/accumulator result is sent to both the register selected by Cadd and the X port output register (XCoutR) on the same cycle.

The Output Bypass Mode is enabled when bit M4 of the Mode Register is set (see page 35).

The fstore instruction that specifies the Dadd must start execution on the fourth cycle of the arithmetic instruction that specified the Cadd. The output appears at the X port pins during the second cycle of the fstore instruction. (See figure 26.)

If the Cwen- bit of an instruction is set to prevent register writes, the Output Bypass Mode is temporarily suspended on the fourth cycle of that operation. This ensures that the register file contents are kept in step with the output data.

### THE Y PORT: NORMAL USAGE (WTL 3332 ONLY)

The Y port is an input-only port. The Mbs- bit in the instruction word controls whether the B register file port or the Y input port drives the B bus (see figure 28). If the B port is selected, the data input at the Y port is discarded.

When the Y port drives the B bus and the MBin port selects the B bus as input, the multiplier will receive the Y port data by the end of the first cycle. If the ABin port selects the B bus as input, the ALU will receive the Y port data at the end of the next cycle.

Two external data streams may be fed into the multiplier/accumulator through the X and Y ports without being written to the register file.
The Y port data can be fed into the MBin or ABin inputs via the B bus. Meanwhile, the X port data can be fed to the MAin or AAin inputs via the A bus using the Input Bypass Mode. Alternatively, the X port data may be fed into the C bus with the floadrc operation.

Figure 27. Normal Y port I/O timing

| Mbs- | B BUS INPUT SOURCE |
|------|--------------------|
| 0 | Register file (B Port) |
| 1 | External input (Y Port) |

Figure 28. Y port input select field encoding

### Y LATE INPUT MODE (WTL 3332 ONLY)

The Y port data may be sampled on the falling edge of CLK instead of its rising edge. This mode is selected by setting bit M12 of the Mode Register to 1 (see page 35). It reduces the delay through the YinR register so that the data still arrives at the multiplier/accumulator ports before the end of the first cycle.

The Y port Late Input Mode can be used to demultiplex a high-bandwidth input bus onto the X and Y ports.

Figure 29. Y port Late Input Mode timing

### THE Z PORT: NORMAL USAGE (WTL 3332 ONLY)

The Z port is an output-only port. It outputs the result of a multiply/accumulate operation directly (see figure 30). The result of a flut operation cannot be output to the Z port.

The Z port output register (ZoutR) is loaded on the fourth cycle of an operation, just when the Cadd register should be written. This transfer occurs without regard to any other activity on the C bus. The Z port output register then drives this value onto the Z port pins. The output occurs during the fifth cycle of the instruction.

The Z port output register is updated on every cycle, even if the multiplier/accumulator result is meaningless. The OEZ- pin can asynchronously float the output at any time.

Figure 30. Normal Z port output timing

### DOUBLE-PUMP MODE (WTL 3332 ONLY)

The WTL 3332 multiplier/accumulator has four ports.
It can be fed an operand through each of the MAin, MBin and ABin inputs and returns a result via the ALU output on every cycle. Only three external ports are provided: X, Y and Z. Double-Pump Mode allows two inputs to be made via the X port on each cycle; all four multiplier/accumulator ports may then be serviced by the external ports.

Double-Pump Mode is enabled by setting bit M9 of the Mode Register to 1 (see page 35).

If Double-Pump Mode is enabled, the IOCt1..0 field in the instruction word must constantly be set to fload.

The first input is latched into the X port on the rising edge of CLK at the beginning of the cycle, as usual. It is written into the Dadd register by the end of the cycle. If the Input Bypass Mode (see above) is enabled and Aadd = Dadd, this value is also driven onto the A bus and can be latched by the MAin port by the end of the first cycle or the AAin port by the end of the second cycle.

The second input is latched into the X port on the next falling edge of CLK. It is driven onto the C bus and may be latched by the MBin port or the ABin port by the end of the first cycle.

The MBin or ABin port may also receive input data from the Y port via the B bus, allowing all three multiplier/accumulator inputs to be fed simultaneously. The multiplier/accumulator result should be directed to the Z port.

Figure 31. Double-Pump mode timing

## System Interfacing

Certain signals on the WTL 3132 and WTL 3332 are provided to communicate control information to and from the other parts of a system.

Two outputs, FPCN and ZERO, indicate the condition of an operation. They can be sent to a sequencer to control instruction branching.

One output, FPEX, signals the occurrence of arithmetic overflow. It can be used to interrupt a host processor to request corrective action.

Three inputs, NEUT-, STALL- and ABORT-, allow the effects of instructions fed into the C port to be canceled.
They can be used to make the WTL 3132/3332 respond correctly to page faults, interrupts or other system requests.

### CONDITION AND ZERO

The WTL 3132/3332 have a Condition Register and a Zero Register. The multiplier/accumulator attempts to modify the contents of these registers on every cycle. The instruction word includes a two-bit condition select field, Encn1..0, which selectively allows the multiplier/accumulator to succeed in updating the contents of these registers. If both bits are cleared, the previous state of the registers remains unchanged.

Most functions update the Condition Register according to the sign and magnitude of their result. Miscellaneous functions may set the register for other reasons (see figure 32). Encn1..0 determines the exact condition that will set the Condition Register for each instruction. This allows any of the common comparisons (>, $\geq$, =, $\leq$, <) to be made in one operation. Figure 33 gives the bit encoding.

If the result of an operation is exactly equal to zero and the Encn1..0 field is not (0,0), the Zero Register is set to 1. If the result is not zero and Encn1..0 is not (0,0), the Zero Register is cleared to 0. If Encn1..0 is (0,0), the contents of the Zero Register remain the same.

The contents of the Zero and Condition Registers are copied to the ZERO and FPCN outputs respectively on the fourth cycle of the operation, just as the general-purpose register file write occurs. Bypassing does not affect the timing of these signals. These outputs always drive a logic 0 or 1.
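The update rule for the Condition and Zero Registers can be sketched in a few lines of Python. This is an illustrative model only, not vendor code: the names `Encn`, `update_flags`, `COMPARISONS` and `compare` are hypothetical, the encoding follows figure 33, and forming N with an ordinary Python subtraction (a - b, or b - a with the operands swapped) is an assumption standing in for the chip's subtract operations.

```python
from enum import IntEnum


class Encn(IntEnum):
    """Condition select field Encn1..0, encoded as in figure 33."""
    HOLD = 0b00      # both registers keep their previous contents
    LE_ZERO = 0b01   # Condition Register := (N <= 0)
    LT_ZERO = 0b10   # Condition Register := (N < 0)
    EQ_ZERO = 0b11   # Condition Register := (N == 0)


def update_flags(n, encn, condition, zero):
    """Return the (condition, zero) pair after an operation with result n."""
    encn = Encn(encn)
    if encn == Encn.HOLD:
        return condition, zero
    tests = {
        Encn.LE_ZERO: n <= 0,
        Encn.LT_ZERO: n < 0,
        Encn.EQ_ZERO: n == 0,
    }
    # The Zero Register follows N == 0 whenever Encn1..0 is not (0,0).
    return tests[encn], n == 0


# Hypothetical mapping: comparison -> (swap operands?, Encn setting).
COMPARISONS = {
    "<=": (False, Encn.LE_ZERO),
    "<": (False, Encn.LT_ZERO),
    "==": (False, Encn.EQ_ZERO),
    ">=": (True, Encn.LE_ZERO),
    ">": (True, Encn.LT_ZERO),
}


def compare(a, b, op):
    """Evaluate `a op b` with one subtraction and one condition test."""
    swap, encn = COMPARISONS[op]
    n = (b - a) if swap else (a - b)
    condition, _zero = update_flags(n, encn, False, False)
    return condition
```

Under these assumptions each of the five comparisons costs a single subtraction with a suitable Encn1..0 setting; for example, `compare(1.0, 2.0, "<")` evaluates the sign test on 1.0 - 2.0.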
| FUNCTION | SET CONDITION REGISTER | SET ZERO REGISTER | SET STATUS REGISTER |
|----------|------------------------|-------------------|---------------------|
| fmac | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| fmns | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| fmna | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| fadd | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| fsub | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| fsubr | $N \le 0$, $N < 0$, $N = 0$ | $N = 0$ | $\exp(N) \ge 255$ |
| flut | - | - | - |
| fabs | - | - | - |
| fix | $\lvert M \rvert > 2^{22}$ | - | - |
| float | $M < -2^{23}$ or $M > 2^{23} - 1$ | - | - |
| fnop | - | - | - |

N is the result of an operation; M is the operand.

Note: fix and float only test for operand range excess when M1 = 1.

Figure 32. Effect of functions on Condition, Zero, and Status Register

| Encn1 | Encn0 | SET ZERO REGISTER | SET CONDITION REGISTER |
|-------|-------|-------------------|------------------------|
| 0 | 0 | - | - |
| 0 | 1 | $N = 0$ | $N \le 0$ |
| 1 | 0 | $N = 0$ | $N < 0$ |
| 1 | 1 | $N = 0$ | $N = 0$ |

Note: Encn1 and Encn0 must be zero for fnop. The Condition Register is set by fix or float when Encn1..0 is (0,1) and the operand exceeds the permitted range.

Figure 33. Encn1..0 encoding

### STATUS AND EXCEPTIONS

The WTL 3132/3332 have a Status Register. If an operation produces a result that is too large to be represented in the IEEE single-precision floating point format, the multiplier/accumulator attempts to set the Status Register to 1.

The Mode Register includes an exception control bit, M5 (see page 35). If M5 is set to 1 and an overflow occurs, the Status Register is set to 1. If M5 is cleared to 0, the Status Register is cleared to 0. If M5 is subsequently set to re-enable overflows, the Status Register will contain 0.
The contents of the Status Register are copied to the FPEX output on the fourth cycle of the operation, just as the register file write occurs.

One bit in the Mode Register, M8, selects the polarity of FPEX. If it is set to 1, FPEX is active high. If it is cleared to 0, FPEX is active low. If M8 is cleared, the Status Register is "sticky"; once set it will remain so until an fclsr operation is performed.

Two miscellaneous functions, fstsr and fclsr, allow the Status Register to be read at the X port or cleared. They take effect during their first cycle.

If the fstsr operation is performed, the IOCt1..0 field must specify a 'store' and its timing is the same as a normal store operation (see figure 34). Only the least-significant bit of the Status Register is guaranteed; the other 31 bits should be masked off when it is read.

If the fclsr operation is performed, it is complete by the end of its first cycle.

Figure 34. fstsr timing

Figure 34a. Condition/Status timing

### COPROCESSOR LOAD MODE

The Coprocessor Load Mode is provided to support systems which generate a data address at the beginning of a cycle and need to latch the data word into the WTL 3132/3332 later in the same cycle.

If bit M6 of the Mode Register is set, the data applied to the X port is not sampled until late in the first cycle. Time still remains to write the Dadd register before the end of this cycle. As usual, the next instruction can use this data value as one of its operands.

If this mode is used, neither the Double-Pump Mode nor the X port Input Bypass Mode may be enabled. The number of I/O nops that must be inserted between an fstore and a subsequent fload is reduced from two to one.

The STALL- signal timing is modified internally if the Coprocessor Load Mode is enabled so that it still cancels the fload and fstore operations.
This mode must be selected if the WTL 3132 is to be used in the XL environment (see Appendix A).

Figure 35. Coprocessor Load Mode timing

### NEUT-, STALL- AND ABORT-

These three inputs allow a system to modify the effect of certain instructions dynamically.

#### 1. NEUT-

Neutralize is used to prevent the execution of instructions in the shadow of a delayed branch operation or during an interrupt service cycle.

In a system where the sequencer supports delayed branching, it will present the next instruction to the C port as it decides whether to take a branch. If the branch is taken, this instruction must be cancelled before it has any effect on the state of the system.

Similarly, if an interrupt occurs, the instruction due to be executed can be cancelled in order to branch to an interrupt service routine. The cancelled instruction is resubmitted for execution on return from interrupt.

Figure 36. NEUT- timing

The neutralize signal cancels the effect of the current instruction. It prevents the result of this instruction from being written into the register file or temporary registers. It has no effect on fload or fstore operations. This signal is sampled on the rising edge of CLK after the current instruction was fed into the C port.

#### 2. STALL-

STALL- is used to hold off execution until a valid code word is present when the code word is delayed (as in a code memory refresh cycle) or absent (as in a page fault). The next operation can be continually stalled until the correct instruction word is presented to the C port.

The STALL- signal cancels the effect of the next instruction. It prevents the result of this instruction from being written into the register file or temporary registers. It also cancels fload and fstore operations. This signal is sampled at the same time as the next instruction is fed into the C port.

If Coprocessor Load Mode is enabled, the timing of STALL- is modified internally to maintain its usual effect.
The fmode and fclsr instructions are also properly stalled.

Figure 37. STALL- timing (including fload)

Figure 38. STALL- timing (including fstore)

#### 3. ABORT-

ABORT- is used to cancel the current and next instructions when the data word is delayed (as in a cache miss) or absent (as in a page fault). If the two cancelled instructions are subsequently resubmitted for execution, the processor will behave as if no interrupt occurred.

The ABORT- signal cancels the effect of both the current and next instructions. It prevents the results of these instructions from being written into the register file or temporary registers. It also cancels the I/O operation specified in the next instruction, but not in the current instruction. This signal is sampled at the same time as the next instruction is fed into the C port.

All three of these signals are delayed internally when necessary: for example, the Cadd register write is only disabled on the instruction's fourth cycle. They prevent the cancelled instructions from modifying the Condition or Status Registers, thus preventing the state of the FPCN, FPEX and ZERO outputs from changing.

Figure 39.
ABORT- timing

## Instruction Set

| C BIT | FIELD | OPERATION |
|-------|-------|-------------------------|
| 0 | Encn0 | Condition Output Select |
| 1 | Encn1 | |
| 2 | Mbin- | MBin Input Select |
| 3 | Adst0 | ALU Destination Select |
| 4 | Adst1 | |
| 5 | Abin0 | ABin Input Select |
| 6 | Abin1 | |
| 7 | Abin2 | |
| 8 | Dadd0 | D Port Register Address |
| 9 | Dadd1 | |
| 10 | Dadd2 | |
| 11 | Dadd3 | |
| 12 | Dadd4 | |
| 13 | IOCt0 | I/O Control |
| 14 | IOCt1 | |
| 15 | Cwen- | C Port Write Enable |
| 16 | Cadd0 | C Port Register Address |
| 17 | Cadd1 | |
| 18 | Cadd2 | |
| 19 | Cadd3 | |
| 20 | Cadd4 | |
| 21 | Badd0 | B Port Register Address |
| 22 | Badd1 | |
| 23 | Badd2 | |
| 24 | Badd3 | |
| 25 | Badd4 | |
| 26 | Aadd0 | A Port Register Address |
| 27 | Aadd1 | |
| 28 | Aadd2 | |
| 29 | Aadd3 | |
| 30 | Aadd4 | |
| 31 | F0 | Function Code |
| 32 | F1 | |
| 33 | F2 | |
| 34 | Mbs- | Y Port Input Select* |

\*Only on WTL 3332

Figure 40. Instruction format

### FORMAT

The WTL 3132 has a 34-bit instruction word. The WTL 3332 has a 35-bit instruction word because it requires the extra Mbs- bit to control the Y port input. Refer to figure 40 for the location of each field in the instruction word.

#### 1. Mbs- bit. (WTL 3332 only)

B bus input control bit. May be register file port B or external input port Y.

#### 2. F field.

Function control (F2..0) field. Selects function to be performed on this instruction's operands.

#### 3. Aadd field.

A register address (Aadd4..0) field. Selects location in general-purpose register file to be read out via the A port.

#### 4. Badd field.

B register address (Badd4..0) field. Selects location in general-purpose register file to be read out via the B port. Also encodes miscellaneous functions that require only one operand.

#### 5. Cadd field.

C register address (Cadd4..0) field.
Selects location in general-purpose register file to be written into via the C port.

#### 6. Cwen- bit.

C port write enable bit. Active low.

#### 7. IOCt field.

I/O control (IOCt1..0) field. Selects type of I/O transfer performed via D port.

#### 8. Dadd field.

D register address (Dadd4..0) field. Selects location in general-purpose register file to be read out or written into via the D port.

#### 9. Abin field.

ALU input multiplexer input control (Abin2..0) field. Selects the input source for ALU.

#### 10. Adst field.

ALU output destination control (Adst1..0) field. Selects output destination for ALU. May be the C bus or the C bus and a Temporary Register.

#### 11. Mbin- bit.

MBin port input control bit. Selects multiplier input source. May be the B bus or C bus.

#### 12. Encn field.

Condition select (Encn1..0) field. Enables a selectable combination of the condition and zero flags onto the FPCN output.

All of the actions specified by these fields are defined in the same instruction word. In this way, all of the stages of an operation, from supplying its operands to storing its results back into a register, are specified together.

### MNEMONICS

The mnemonics shown in figures 41 through 43 are those used to control the XL-3132 in the XL-Series programming environment. They are given here to simplify understanding of the programming model and to provide a syntax in which to present the programming examples.

These mnemonics represent a subset of the functions available on the WTL 3132/3332. In particular, the Input Bypass Mode is disabled and the Output Bypass, Internal Bypass, and Coprocessor Load Modes are enabled unless otherwise noted. The user is free to enhance or disregard this suggested programming model according to system requirements.
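The field positions listed in figure 40 can be sketched as a small bit-packing routine. This is a minimal illustration rather than a vendor tool: the Python names `FIELDS`, `pack` and `unpack` are hypothetical, and the `_n` suffix is simply a local convention for the active-low fields (Mbin-, Cwen-, Mbs-).

```python
# (least-significant bit position, width) of each field, following figure 40.
# Names ending in "_n" are illustrative spellings of the active-low fields.
FIELDS = {
    "Encn": (0, 2),
    "Mbin_n": (2, 1),
    "Adst": (3, 2),
    "Abin": (5, 3),
    "Dadd": (8, 5),
    "IOCt": (13, 2),
    "Cwen_n": (15, 1),
    "Cadd": (16, 5),
    "Badd": (21, 5),
    "Aadd": (26, 5),
    "F": (31, 3),
    "Mbs_n": (34, 1),  # present on the WTL 3332 only
}


def pack(**values):
    """Assemble an instruction word; unspecified fields default to 0."""
    word = 0
    for name, value in values.items():
        if name not in FIELDS:
            raise ValueError(f"unknown field {name!r}")
        lsb, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack(word):
    """Split an instruction word back into a dict of field values."""
    return {
        name: (word >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in FIELDS.items()
    }
```

For example, `pack(Aadd=1, Badd=2, Cadd=3, IOCt=0b11, Dadd=4)` places the three register addresses and an fload request (IOCt1..0 = 11) into one word, and `unpack` recovers the same field values.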
| I/O SELECTION | OPERAND SOURCE | OPERAND DESTINATION |
|---------------|----------------|---------------------|
| fload | X port | .f0-31 |
| fstore | .f0-31 | X port |

Note: If no I/O operation is specified, an I/O nop will be selected in the IOCt1..0 instruction field.

Figure 41. Recommended mnemonics

| OPERAND NOTATION | DESCRIPTION |
|------------------|-------------|
| .f0-31 | Thirty-two 32-bit general-purpose registers |
| 0 | Constant "0.0" |
| 2 | Constant "2.0" |
| .t1-3 | Three temporary registers |

Figure 42. Recommended mnemonics

| FUNCTION | SOURCE (Aadd) | SOURCE (Badd) | SOURCE (Tregs) | DESTINATION (Cadd) |
|----------|---------------|---------------|----------------|--------------------|
| fmac | .f0-31 | .f0-31 | 0, 2, or .t1-3 | .f0-31 and/or .t1-3 |
| fmns | .f0-31 | .f0-31 | 0, 2, or .t1-3 | .f0-31 and/or .t1-3 |
| fmna | .f0-31 | .f0-31 | 0, 2, or .t1-3 | .f0-31 and/or .t1-3 |
| fadd | .f0-31 | .f0-31, 0, 2, or .t1-3 | | .f0-31 and/or .t1-3 |
| fsub | .f0-31 | .f0-31, 0, 2, or .t1-3 | | .f0-31 and/or .t1-3 |
| fsubr | .f0-31 | .f0-31, 0, 2, or .t1-3 | | .f0-31 and/or .t1-3 |
| flut | .f0-31 | | | .f0-31 |
| fabs | .f0-31 | | | .f0-31 and/or .t1-3 |
| fix | .f0-31 | | | .f0-31 and/or .t1-3 |
| float | .f0-31 | | | .f0-31 and/or .t1-3 |
| fnop | - | | | - |

Figure 43. Recommended mnemonics

### CODE CONSTRAINTS

The following set of rules prevents illegal code sequences:

1. All instructions must avoid writing to the Cadd register and Dadd register simultaneously. Thus no fload operation with Dadd = .fx may start on the fourth cycle of an operation with Cadd = .fx.

```
op #1  fadd  .f?, .f?, .fx
op #2  fadd  .f?, .f?, .f?;  fload .fx
op #3  fadd  .f?, .f?, .f?;  fload .fx
op #4  fadd  .f?, .f?, .f?;  fload .fx    --Illegal
op #5  fadd  .f?, .f?, .f?;  fload .fx
```

Figure 44.

2. Because the X port output is driven on the cycle after an fstore operation is specified, an fload cannot follow an fstore immediately. At least one I/O nop must intervene if the Coprocessor Load Mode is enabled (M6 = 1), or two I/O nops if it is disabled (M6 = 0) (see page 27).

```
op #1  fadd  .f?, .f?, .f?;  fstore .f?
op #2  fadd  .f?, .f?, .f?;  fload .f?    --Illegal
op #3  fadd  .f?, .f?, .f?;  fload .f?
```

Figure 45. Coprocessor Load Mode enabled, M6=1

```
op #1  fadd  .f?, .f?, .f?;  fstore .f?
op #2  fadd  .f?, .f?, .f?;  fload .f?    --Illegal
op #3  fadd  .f?, .f?, .f?;  fload .f?    --Illegal
op #4  fadd  .f?, .f?, .f?;  fload .f?
```

Figure 46. Coprocessor Load Mode disabled, M6=0

3. No temporary register can be written and read on the same cycle. Thus no operation that selects .tx as an operand register may start on the third cycle of an operation with Cadd = .tx.

```
op #1  fmac  .f?, .f?, 0,   .tx
op #2  fmac  .f?, .f?, .tx, .f?
op #3  fmac  .f?, .f?, .tx, .f?    --Illegal
op #4  fmac  .f?, .f?, .tx, .f?
```

Figure 47. Coprocessor Load Mode disabled, M6=0

If code is to be interruptable and respond correctly to the NEUT-, STALL- and ABORT- signals, then these additional rules must also be followed. They all prevent delayed register writes from modifying operand values in a time-dependent fashion.

4. No operation with Aadd or Badd = .fx may start after the first cycle and before the fourth cycle of an operation with Cadd = .fx. (If the Internal Bypass Mode is disabled (M0 = 0 and M11 = 0), this becomes the fifth cycle.)

```
op #1  fadd  .f?, .f?, .fx
op #2  fadd  .fx, .f?, .f?    --Illegal
op #3  fadd  .fx, .f?, .f?    --Illegal
op #4  fadd  .fx, .f?, .f?
```

Figure 48. Internal Bypass Mode enabled, M0 = 1 and M11 = 1

```
op #1  fadd  .f?, .f?, .fx
op #2  fadd  .fx, .f?, .f?    --Illegal
op #3  fadd  .fx, .f?, .f?    --Illegal
op #4  fadd  .fx, .f?, .f?    --Illegal
op #5  fadd  .fx, .f?, .f?
```

Figure 49. Internal Bypass Mode disabled, M0 = 0 and M11 = 0

5. No operation that selects .tx as an operand register may start after the first cycle and before the fourth cycle of an operation with Cadd = .tx.

```
op #1  fmac  .f?, .f?, 0,   .tx
op #2  fmac  .f?, .f?, .tx, .f?    --Illegal
op #3  fmac  .f?, .f?, .tx, .f?    --Illegal
op #4  fmac  .f?, .f?, .tx, .f?
```

Figure 50.

6. No operation with Aadd or Badd = .fx may start on the same cycle as an fload where Dadd = .fx.

```
op #1  fadd  .fx, .f?, .f?;  fload .fx    --Illegal
op #2  fadd  .f?, .f?, .f?
```

Figure 51.

7. The NEUT- line does not cancel fload and fstore, so when it is used to cancel the effect of an instruction in the shadow of a delayed branch operation (as in the XL-Series), this instruction should not perform I/O transfers. (This is not necessary when NEUT- is asserted during an interrupt response cycle because the cancelled instruction is resubmitted for execution.)

In the examples, the notation .f? is used to indicate any register except .fx.

## Initialization

### MODE REGISTER

The Mode Register controls which of the special modes are enabled. Normally, it is initialized to the desired state and is not subsequently altered. Some mode bits are provided to maintain backward compatibility with previous versions of the WTL 3132/3332. Other bits are reserved and should be set to the value specified in figure 52.
| MODE BIT | LOGIC VALUE | DESCRIPTION |
|----------|-------------|---------------------------------------------|
| M0 | 0 | Internal Bypass Mode (Aadd = Cadd) disabled |
| | 1 | Internal Bypass Mode (Aadd = Cadd) enabled |
| M1 | 0 | fix rounds to negative infinity |
| | 1 | fix rounds to nearest (enable range test) |
| M2 | - | Reserved: should be cleared to 0 |
| M3 | 0 | Input Bypass Mode disabled |
| | 1 | Input Bypass Mode enabled |
| M4 | 0 | Output Bypass Mode disabled |
| | 1 | Output Bypass Mode enabled |
| M5 | 0 | FPEX Output disabled |
| | 1 | FPEX Output enabled |
| M6 | 0 | Coprocessor Load Mode disabled |
| | 1 | Coprocessor Load Mode enabled |
| M7 | - | Reserved: should be set to 1 |
| M8 | 0 | FPEX active low and "sticky" |
| | 1 | FPEX active high |
| M9 | 0 | Double-Pump Mode disabled |
| | 1 | Double-Pump Mode enabled |
| M10 | - | Reserved: should be set to 1 |
| M11 | 0 | Internal Bypass Mode (Badd = Cadd) disabled |
| | 1 | Internal Bypass Mode (Badd = Cadd) enabled |
| M12 | 0 | Y Late Input Mode disabled |
| | 1 | Y Late Input Mode enabled |

Figure 52. Mode selection table

The Mode Register is loaded by the fmode operation. This causes the Aadd, Cadd and Abin2..0 fields in the instruction word to be loaded into the Mode Register as shown by figure 53. fmode completes by the end of its first cycle.
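As a hedged sketch of how an fmode instruction word might be assembled, the Python below combines the reserved-bit values of figure 52 with the bit assignments of figure 53. The function names `mode_value` and `fmode_word` are hypothetical, and leaving every unnamed mode bit at 0 in the example is an assumption rather than a recommended configuration.

```python
RESERVED_SET = (1 << 7) | (1 << 10)   # M7 and M10 should be set to 1
RESERVED_CLEAR = 1 << 2               # M2 should be cleared to 0


def mode_value(*enabled_bits):
    """Build a 13-bit Mode Register value (M12..M0) from mode bit numbers."""
    mode = 0
    for bit in enabled_bits:
        if not 0 <= bit <= 12:
            raise ValueError(f"no such mode bit: M{bit}")
        mode |= 1 << bit
    # Force the reserved bits to the values given in figure 52.
    return (mode | RESERVED_SET) & ~RESERVED_CLEAR


def fmode_word(mode):
    """Place a Mode Register value into an fmode word as laid out in figure 53."""
    if not 0 <= mode < (1 << 13):
        raise ValueError("mode value must fit in 13 bits")
    word = (mode & 0x1F) << 26           # M4..M0   -> Aadd4..0 (C bits 30..26)
    word |= ((mode >> 5) & 0x1F) << 16   # M9..M5   -> Cadd4..0 (C bits 20..16)
    word |= ((mode >> 10) & 0x7) << 5    # M12..M10 -> Abin2..0 (C bits 7..5)
    word |= 0b11 << 3                    # Adst1..0 = 11
    word |= 1 << 15                      # Cwen- = 1
    word |= 0b00011 << 21                # Badd4..0 = 00011, fmode function code
    return word                          # every other field stays 0


# Internal Bypass (M0, M11), Output Bypass (M4) and Coprocessor Load (M6).
xl_mode = mode_value(0, 11, 4, 6)
xl_word = fmode_word(xl_mode)
```

The example selects the modes that the MNEMONICS section describes as enabled in the XL-Series model; `xl_mode` evaluates to 0xCD1 once the reserved bits M7 and M10 are forced to 1.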
| C BIT | NORMAL USE | MODE BIT | COMMENT |
|-------|------------|----------|---------------------------------|
| 0 | Encn0 | 0 | FPCN is disabled during fmode |
| 1 | Encn1 | 0 | |
| 2 | Mbin- | 0 | |
| 3 | Adst0 | 1 | ALU destination is C bus only |
| 4 | Adst1 | 1 | |
| 5 | Abin0 | M10 | |
| 6 | Abin1 | M11 | |
| 7 | Abin2 | M12 | |
| 8 | Dadd0 | 0 | |
| 9 | Dadd1 | 0 | |
| 10 | Dadd2 | 0 | |
| 11 | Dadd3 | 0 | |
| 12 | Dadd4 | 0 | |
| 13 | IOCt0 | 0 | I/O nop specified |
| 14 | IOCt1 | 0 | |
| 15 | Cwen- | 1 | C port register writes disabled |
| 16 | Cadd0 | M5 | |
| 17 | Cadd1 | M6 | |
| 18 | Cadd2 | M7 | |
| 19 | Cadd3 | M8 | |
| 20 | Cadd4 | M9 | |
| 21 | Badd0 | 1 | fmode function code |
| 22 | Badd1 | 1 | |
| 23 | Badd2 | 0 | |
| 24 | Badd3 | 0 | |
| 25 | Badd4 | 0 | |
| 26 | Aadd0 | M0 | |
| 27 | Aadd1 | M1 | |
| 28 | Aadd2 | M2 | |
| 29 | Aadd3 | M3 | |
| 30 | Aadd4 | M4 | |
| 31 | F0 | 0 | Miscellaneous function selector |
| 32 | F1 | 0 | |
| 33 | F2 | 0 | |
| 34 | Mbs-* | 0 | |

Figure 53. Load Mode Register instruction format

Changing the contents of any bits in the Mode Register will have undefined effects on any currently executing instructions, including I/O operations. The state of all registers should be initialized after execution of the fmode instruction.

Some combinations of modes are not allowed. These are detailed in figure 54.
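A configuration can be checked against these exclusions with a short lookup. The sketch below is illustrative only: the feature names and the `conflicts` helper are hypothetical, and the eight excluded pairs are transcribed from figure 54 as it is read here, so they should be confirmed against the printed table before being relied on.

```python
from itertools import combinations

# Pairs marked "should not be enabled together" in figure 54 (as read here).
EXCLUSIVE_PAIRS = {
    frozenset(pair)
    for pair in [
        ("internal_bypass", "double_pump"),
        ("internal_bypass", "floadrc"),
        ("input_bypass", "coprocessor_load"),
        ("output_bypass", "double_pump"),
        ("double_pump", "coprocessor_load"),
        ("double_pump", "neut_stall_abort"),
        ("coprocessor_load", "floadrc"),
        ("floadrc", "neut_stall_abort"),
    ]
}


def conflicts(features):
    """Return the excluded pairs present in `features`, in sorted order."""
    found = []
    for a, b in combinations(sorted(set(features)), 2):
        if frozenset((a, b)) in EXCLUSIVE_PAIRS:
            found.append((a, b))
    return found
```

Calling `conflicts(["internal_bypass", "output_bypass", "coprocessor_load"])` returns an empty list, which agrees with the modes the MNEMONICS section describes as enabled together in the XL-Series model.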
| MODE | # | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|------|---|-----|-----|-----|-----|-----|-----|-----|-----|
| Internal Bypass Mode (M0 = 1 and M11 = 1) | (1) | - | ✓ | ✓ | ✓ | × | ✓ | × | ✓ |
| Input Bypass Mode (M3 = 1) | (2) | ✓ | - | ✓ | ✓ | ✓ | × | ✓ | ✓ |
| Output Bypass Mode (M4 = 1) | (3) | ✓ | ✓ | - | ✓ | × | ✓ | ✓ | ✓ |
| Y Late Input Mode (M12 = 1) | (4) | ✓ | ✓ | ✓ | - | ✓ | ✓ | ✓ | ✓ |
| Double-Pump Mode (M9 = 1) | (5) | × | ✓ | × | ✓ | - | × | ✓ | × |
| Coprocessor Load Mode (M6 = 1) | (6) | ✓ | × | ✓ | ✓ | × | - | × | ✓ |
| Use of floadrc Operation | (7) | × | ✓ | ✓ | ✓ | ✓ | × | - | × |
| Use of NEUT-, STALL-, or ABORT- Inputs | (8) | ✓ | ✓ | ✓ | ✓ | × | ✓ | × | - |

× = Should not be enabled together, ✓ = Can be enabled together

Note: Although (7) and (8) are not selected by mode bits, the user is able to avoid the use of such functions or control lines. Double-Pump Mode (5) requires floadrc (7) to be used every cycle.

Figure 54. Mode exclusion table

### RESET SEQUENCE

Before initializing the contents of the Mode Register, the WTL 3132/3332 must be set to a stable state after power up. Repeating nop and I/O nop instructions for at least four cycles will flush the multiplier/accumulator pipeline and allow the internal states to become well-defined. Until this sequence terminates, the rest of the system should ignore the contents of the data buses and the state of the ZERO, FPCN and FPEX pins.

The registers should then all be initialized to known values (including the Mode, Condition, Status and Tregs) while nops continue to be input. The WTL 3132/3332 is then able to begin normal operation.

## Division

### DIVIDE LOGIC UNIT

The WTL 3132/3332 have a divide logic unit. This unit consists of a look-up table ROM and three delay stages.
The first cycle of the flut operation transfers the Aadd operand (a) to the divide logic unit. During the next two cycles this operand selects a seed value for the reciprocal of the operand (1/a) from the look-up table. This result is written to the Cadd register on the fourth cycle. If the Internal Bypass Mode is enabled, the result can be copied to a multiplier/accumulator input port at the same time that the Cadd register is being written.

The look-up result is an IEEE single-precision number whose fraction is accurate to seven bits of precision. If the input is positive or negative infinity (greater than #7F800000 or less than #FF800000), the result is zero. If the input is zero, the result is #7FFFFFFF (which gets clamped to #7F800000 during refinement). flut does not update the Zero, Condition, or Status Registers.

#### NOTATION:

- $a$ = divisor (.f1)
- $R_0$ = seed for $1/a$ (.f31)
- $R_1$ = first approximation (.f31)
- $R_2$ = second approximation (.f30)
- $b$ = dividend (.f0)
- $b/a$ = result (.f0)

#### ALGORITHM:

$R_1 = R_0 \times (2 - a \times R_0)$

$R_2 = R_1 \times (2 - a \times R_1)$

| Cycle # | Opcode | | | | | I/O | Comment |
|---------|--------|-------|-------|----|------|--------------|---------------------------------------|
| 1 | flut | .f1, | | | .f31 | | $R_0$ ($\approx 1/a$) |
| 2 | fnop | | | | | | |
| 3 | fnop | | | | | | |
| 4 | fmna | .f1, | .f31, | 2, | .f30 | | $2 - a \times R_0$ |
| 5 | fnop | | | | | | |
| 6 | fnop | | | | | | |
| 7 | fmac | .f31, | .f30, | 0, | .f31 | | $R_1 = R_0 \times (2 - a \times R_0)$ |
| 8 | fnop | | | | | | |
| 9 | fnop | | | | | | |
| 10 | fmna | .f1, | .f31, | 2, | .f30 | | $2 - a \times R_1$ |
| 11 | fnop | | | | | | |
| 12 | fnop | | | | | | |
| 13 | fmac | .f31, | .f30, | 0, | .f30 | | $R_2 = R_1 \times (2 - a \times R_1)$ |
| 14 | fnop | | | | | | |
| 15 | fnop | | | | | | |
| 16 | fmac | .f0, | .f30, | 0, | .f0 | | $b \times 1/a$ |
| 17 | fnop | | | | | | |
| 18 | fnop | | | | | | |
| 19 | fnop | | | | | ; fstore .f0 | store $b/a$ |
| 20 | fnop | | | | | | $b/a$ on X port |

Notes: Internal bypass enabled (M0 = 1 and M11 = 1). X port output bypass enabled (M4 = 1).

Figure 55. Recommended division sequence

#### DIVIDE CODE SUPPORT

The initial approximation to 1/a has to be refined by successive approximation. This accurate value of 1/a must then be multiplied by b in order to complete the divide operation (b/a). The programmer or compiler has to supply code to support the following sequence of operations:

1. Execute flut to obtain the seed value.
2. Iterate from this value to obtain an accurate divisor inverse using the *Newton-Raphson* algorithm.
3. Multiply the dividend by the inverse of the divisor just generated.

For division, the *Newton-Raphson* algorithm converges quadratically. Theoretically, the number of bits of precision doubles with each iteration. Thus, two iterations should provide the full 23 bits of precision representable by the IEEE 32-bit format. Quantization errors introduced by rounding, however, can prevent the lsb from being accurate. The code example provides *b/a* to 22 bits of precision.

## Data Format

#### 32-BIT FLOATING POINT (IEEE STANDARD)

The IEEE standard 32-bit floating point word divides into three fields: a sign bit, an 8-bit exponent and a 23-bit fraction field (shown in figure 56). The value contained in the 8-bit exponent field ranges from -127 to 128 (#00 to #FF) (shown in figure 57). The fraction is multiplied by two raised to this power to produce a floating point value.

The significand field contains the 23-bit fraction and the hidden bit. Inserted during arithmetic processing, the hidden bit has a value of one for all normalized numbers and zero for zero. The fraction is the 23 bits to the right of the hidden bit. Bit F22 has a value of $2^{-1}$; bit F0 has a value of $2^{-23}$; the hidden bit has a value of $2^{0}$. All constants are in this IEEE format.

Figure 56.
32-bit floating point (IEEE standard)

The value of an IEEE floating point number is determined by the following:

| E | F | VALUE | DESCRIPTION |
|-------|-----|--------------------------------------------|-------------------------|
| 1-254 | Any | (-1)<sup>s</sup> (1.F) 2<sup>E-127</sup> | Normalized number (NRN) |
| 0 | | (-1)<sup>s</sup> 0.0 | Zero |

Figure 57. 32-bit floating point value

Figure 58. 24-bit fixed point (two's complement)

#### 24-BIT TWO'S COMPLEMENT INTEGERS

The value of the 24-bit integer field shown in figure 58 can range from $(-2^{23})$ to $(2^{23}-1)$ and must be sign-extended to 32 bits to be compatible with the WTL 3132/3332 format. The eight-bit sign extension field is a repeat of bit 23, the sign bit of the two's complement number. The user must ensure that integer operands conform to this format; integer results are automatically sign-extended to match.

#### FIX

The fix function converts a number from floating point format to sign-extended 24-bit two's complement integer format. If the magnitude of its operand is greater than $2^{22}$, Encn<sub>1</sub> is set to 1, and M<sub>1</sub> = 1, fix sets the Condition Register to 1. This limit was chosen to allow software to test for the case $n = 2^{23}$, which cannot be represented by a 24-bit two's complement number. If the operand is too large to be represented in the integer format, the result is clamped to either #007FFFFF or #FF800000, according to its sign. fix does not attempt to set the Zero or Status Registers. It executes in the same number of cycles as every other multiplier/accumulator instruction.

#### **FLOAT**

The float function converts a number from sign-extended two's complement integer format to floating point format. If its operand does not have consistent sign extension (bits 24-31 all equal), Encn<sub>1</sub> is set to 1 and M<sub>1</sub> = 1, it will set the Condition Register to 1.
The result of a float operation on such an operand is not defined. float does not attempt to set the Zero or Status Registers. It executes in the same number of cycles as any other multiplier/accumulator instruction.

#### **IEEE Considerations**

The WTL 3132 and WTL 3332 comply with the IEEE Standard for Binary Floating Point Arithmetic (P754) in most respects. The differences described below apply to all of the arithmetic functions (fsubr, fsub, fadd, fmna, fmns, fmac, fabs).

#### DENORMALIZED NUMBERS

Denormalized numbers have a magnitude less than $2^{-126}$ but greater than zero. The IEEE standard includes denormalized numbers to allow gradual underflow for operations that produce results that are too small to be expressed as normalized numbers.

The WTL 3132/3332 do not support denormalized numbers. If the result of an operation is smaller than $2^{-126}$, it is replaced by zero and the Zero Register is set to 1. Denormalized operands are detected and flushed to zero (with the same sign) before the operation is performed; no indication of this is provided.

#### NOT A NUMBER (NAN) HANDLING

The IEEE standard represents NaNs with numbers that have the maximum exponent value and a non-zero fraction. The WTL 3132/3332 do not detect attempts to perform calculations on NaNs. Only the flut operation may produce a NaN (when given zero as an operand). This is clamped to the appropriate infinity when refined by the divide code example given in figure 55. Other operations may have undefined effects when given a NaN as an operand, so their use should, in general, be avoided. No arithmetic operation generates NaNs; all results with the maximum exponent have their fractions held to zero.

#### INFINITY AND OVERFLOW

The IEEE standard represents infinities with numbers that have the maximum exponent value and zero as the fraction. The WTL 3132/3332 do not detect attempts to perform calculations on infinite operands.
Some operations may have undefined effects when given an infinite operand, so their propagation should, in general, be avoided. However, if an operation creates a result that is too large to be represented in the floating point format, its result is clamped to an infinite value as required by the specification. The Status Register is set by the creation of an infinite value during an operation.

#### UNDERFLOW

When the result of an operation has a magnitude in the range $0 < n < 2^{-126}$, the WTL 3132/3332 round it to zero and set the Zero Register to 1. There is no way to distinguish underflow from a result that is exactly zero.

#### ROUNDING

The WTL 3132/3332 support only the round-to-nearest mode: the infinitely precise result of an operation is rounded to the closest representation that fits in the destination format. If the result is exactly halfway between two representations, it is rounded to the nearest even fraction.

The IEEE standard requires rounding to occur after each arithmetic operation. The WTL 3132/3332 do not round between the multiply and add components of the fmac, fmns and fmna functions. The error in the result is always less than two least-significant bits. If the ABin port of the ALU is set to the constant 0.0, then the fmac function performs a multiply that conforms to the IEEE standard.

Only the fix operation can be set to round toward negative infinity; this is done by clearing M1 to zero.
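As a rough illustration of how the division sequence of figure 55 interacts with the single rounding of fmna and fmac described above, the following Python sketch models the arithmetic only. It is not cycle-accurate and is not WEITEK code. The names `f32`, `flut_seed`, `fmna`, `fmac` and `divide` are hypothetical helpers, the seed is an assumed stand-in for the look-up ROM (a reciprocal with its mantissa cut to eight bits), and Python doubles are used to approximate the unrounded intermediate product.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flut_seed(a: float) -> float:
    """Assumed model of flut: 1/a with only about seven accurate fraction bits.

    The real look-up table contents are not given in the data sheet; zero and
    infinite operands are not handled here.
    """
    mantissa, exponent = math.frexp(1.0 / a)  # |mantissa| lies in [0.5, 1)
    return f32(math.ldexp(round(mantissa * 256) / 256, exponent))


def fmna(a: float, b: float, c: float) -> float:
    """c - a*b, rounded once at the end (no rounding after the multiply)."""
    return f32(c - a * b)


def fmac(a: float, b: float, c: float) -> float:
    """a*b + c, rounded once at the end (no rounding after the multiply)."""
    return f32(a * b + c)


def divide(b: float, a: float) -> float:
    """Follow figure 55: seed, two Newton-Raphson steps, then b * (1/a)."""
    a, b = f32(a), f32(b)
    r0 = flut_seed(a)                      # R0 ~= 1/a
    r1 = fmac(r0, fmna(a, r0, 2.0), 0.0)   # R1 = R0 * (2 - a*R0)
    r2 = fmac(r1, fmna(a, r1, 2.0), 0.0)   # R2 = R1 * (2 - a*R1)
    return fmac(b, r2, 0.0)                # b/a = b * R2
```

Under these assumptions, `divide(1.0, 3.0)` should agree with 1/3 to roughly the 22 bits quoted for the code example, since each iteration squares the relative error of the seed and the remaining error comes from the final roundings.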
# DC Specifications

## ABSOLUTE MAXIMUM RATINGS

| PARAMETER | RATING |
|-------------------------------------|----------------|
| Supply voltage | |
| Input voltage | |
| Output voltage | |
| Operating temperature range (Tcase) | |
| Storage temperature range | -65°C to 150°C |
| Lead temperature (10 seconds) | 300°C |
| Junction temperature | 175°C |

## RECOMMENDED OPERATING CONDITIONS

| PARAMETER | COMMERCIAL MIN | COMMERCIAL MAX | MILITARY MIN | MILITARY MAX | UNIT |
|-----------------------|------|------|------|------|----|
| Supply voltage | 4.75 | 5.25 | 4.5 | 5.5 | V |
| Operating temperature | 0 | 85 | -55 | 125 | °C |

#### DC ELECTRICAL CHARACTERISTICS

| TEST CONDITIONS | COMMERCIAL MIN | COMMERCIAL TYP | COMMERCIAL MAX | MILITARY<sup>1</sup> MIN | MILITARY<sup>1</sup> TYP | MILITARY<sup>1</sup> MAX | UNIT |
|-----------------|-----|-----|-----|-----|-----|-----|------|
| V<sub>DD</sub> = MAX<br>V<sub>DD</sub> = MIN<br>V<sub>DD</sub> = MAX<br>V<sub>DD</sub> = MAX<br>V<sub>DD</sub> = MIN<br>V<sub>DD</sub> = MIN, I<sub>OH</sub> = -1.0 mA<br>V<sub>DD</sub> = MIN, I<sub>OL</sub> = 4.0 mA | 2.4<br>2.0<br>2.4 | | 0.8<br>0.8<br>0.4 | | | | V<br>V<br>V<br>V |
| V<sub>DD</sub> = MAX, V<sub>IN</sub> = V<sub>DD</sub><br>V<sub>DD</sub> = MAX, V<sub>IN</sub> = 0V<br>V<sub>DD</sub> = MAX, V<sub>IN</sub> = 0V<br>V<sub>DD</sub> = MAX, V<sub>IN</sub> = V<sub>DD</sub> | | | 10<br>10<br>10<br>10 | | | | μA<br>μA<br>μA |
| V<sub>DD</sub> = MAX, T<sub>CY</sub> = MIN<br>TTL inputs<sup>2</sup> | | | 200 | | | | mA |
| V<sub>DD</sub> = 5.0V<br>T<sub>AMBIENT</sub> = 25°C<br>f = 1 MHz | | 10<br>25<br>15<br>20 | 10<br>30<br>20<br>25 | | | | pF<br>pF<br>pF |

## Timing Diagrams

Figure 59. Test load for delay measurement

Figure 60. Clock timing

Figure 61. Tri-state timing

Figure 62.
Signal timing diagram

## AC Specifications

| TEST CONDITIONS<sup>1, 2, 3</sup> | | | |
|---|---|---|---|
| V<sub>CC</sub> = MIN | V<sub>OH</sub> = 2.8V, I<sub>OH</sub> = -1.0 mA<br>V<sub>OL</sub> = 0.4V, I<sub>OL</sub> = 4.0 mA | T<sub>CASE</sub> = 85°C | C<sub>LOAD</sub> = 40 pF |

| DESCRIPTION | WTL 3132-120<br>WTL 3332-120<br>XL-3132-120<br>COMMERCIAL<br>MIN | -120<br>MAX | WTL 3132-100<br>WTL 3332-100<br>XL-3132-100<br>COMMERCIAL<br>MIN | -100<br>MAX | WTL 3132-080<br>WTL 3332-080<br>XL-3132-080<br>COMMERCIAL<br>MIN | -080<br>MAX | WTL 3132<br>WTL 3332<br>XL-3132<br>MILITARY<br>MIN | MILITARY<br>MAX | UNIT |
|---|---|---|---|---|---|---|---|---|---|
| T<sub>CY</sub> Clock cycle time | 120 | | 100 | | 80 | | 120 | | ns |
| T<sub>CH</sub> Clock high time | 50 | | 45 | | | | | | ns |
| T<sub>CL</sub> Clock low time | 50 | | 45 | | | | | | ns |
| T<sub>R</sub> Clock rise time | | 5 | | 5 | | | | | ns |
| T<sub>F</sub> Clock fall time | | 5 | | 5 | | | | | ns |
| Bus inputs (C, X, Y): | | | | | | | | | |
| T<sub>S</sub> Input setup time | 20 | | 15 | | | | | | ns |
| T<sub>H</sub> Input hold time | | | 2 | | | | | | ns |
| T<sub>SA</sub> Input setup time (X bus, coprocessor load mode) | 20 | | 15 | | | | | | ns |
| T<sub>HA</sub> Input hold time (X bus, coprocessor load mode) | 2 | | 2 | | | | | | ns |
| T<sub>SB</sub> Input setup time (X bus, double-pump or Y bus, late input mode) | 20 | | 15 | | | | | | ns |
| T<sub>HB</sub> Input hold time (X bus, double-pump or Y bus, late input mode) | 2 | | 2 | | | | | | ns |
| Bus outputs (X, Z): | | | | | | | | | |
| T<sub>D</sub> Output delay time | | 35 | | 30 | | | | | ns |
| T<sub>V</sub> Output valid time | 3 | | 3 | | | | | | ns |
| T<sub>ENA</sub> Tri-state enable time<sup>4</sup> | | 35 | | 30 | | | | | ns |
| T<sub>DIS</sub> Tri-state disable time<sup>5</sup> | | 35 | | 30 | | | | | ns |
| NEUT-, STALL-, ABORT-: | | | | | | | | | |
| T<sub>SN</sub> Input setup time | 20 | | 15 | | | | | | ns |
| T<sub>HN</sub> Input hold time | 2 | | 2 | | | | | | ns |
| FPCN, FPEX, ZERO: | | | | | | | | | |
| T<sub>DF</sub> Output delay time | | 35 | | 30 | | | | | ns |
| T<sub>VF</sub> Output valid time | 3 | | 3 | | | | | | ns |
| T<sub>OP</sub> Pipelined operation time per stage | 120 | | 100 | | 80 | | 120 | | ns |
| T<sub>LA</sub> Total latency register-to-register | 360 | | 300 | | 240 | | 360 | | ns |

NOTES:

1. Worst case over time and temperature range.
2. Input levels are 0.4 and 3.4V.
3. Timing transitions are measured at 1.5V unless otherwise noted.
4. Device must be powered for at least 20 ms before testing.
5. T<sub>DIS</sub> is not tested but is guaranteed by design.

# Pin Configuration

| Pin #1<br>Identifier | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | GND | NC | NC | X18 | NC | NC | NC | X23 | VDD | NC | 20 | X27 | NC | NC | NC |
| B | NC | NC | GND | X16 | NC | NC | X20 | X22 | X24 | X25 | X26 | NC | X29 | X31 | GND |
| C | NC | NC | X15 | GND | X17 | X19 | X21 | NC | VDD | NC | X28 | NC | X30 | TIE LOW | F2 |
| D | NC | X12 | X14 | | | | | | | | | | GND | F0 | Adst1 |
| E | X10 | X11 | X13 | | | | | | | | | | F1 | Adst0 | Abin0 |
| F | X8 | NC | NC | | | | | | | | | | Abin2 | Abin1 | ABORT- |
| G | NC | X9 | VDD | | | | | | | | | | STALL- | NEUT- | Cwen- |
| H | NC | X7 | VDD | | | | | WTL 3132<br>TOP VIEW | | | | | Cadd2 | Cadd4 | Cadd3 |
| J | X6 | X5 | 20 | | | | | | | | | | Cadd1 | Aadd3 | Cadd0 |
| K | NC | NC | NC | | | | | | | | | | Aadd1 | Aadd2 | Aadd4 |
| L | X4 | X2 | NC | | | | | | | | | | Badd0 | Badd4 | Aadd0 |
| M | X3 | X1 | GND | | | | | | | | | | Dadd2 | Badd1 | Badd3 |
| N | NC | NC | OEX- | VDD | FPCN | GND | TIE LOW | TIE LOW | GND | GND | GND | VDD | Dadd1 | Dadd4 | Badd2 |
| P | X0 | TIE LOW | Encn1 | FPEX | Encn0 | TIE LOW | CLK | GND | GND | GND | GND | GND | VDD | Dadd0 | Dadd3 |
| R | ZERO | GND | IOCt0 | IOCt1 | Mbin- | TIE LOW | TIE LOW | GND | GND | GND | GND | GND | GND | GND | GND |

Notes: Pins marked "Tie Low" must be connected to ground. Pins marked "NC" should be left unconnected (floating).

Figure 64. WTL 3132 and XL-3132 pin configuration

| Pin #1<br>Identifier | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | OEX- | TIE LOW | Z0 | Z2 | X3 | Z4 | NC | X5 | Z7 | X9 | Z9 | X11 | Z12 | X13 | X15 | Z15 | GND |
| B | OEZ- | ZERO | X0 | Z1 | X2 | NC | NC | Z5 | VDD | VDD | X8 | Z10 | Z11 | Z13 | X14 | GND | GND |
| C | GND | Encn1 | GND | X1 | Z3 | X4 | X6 | NC | Z6 | Z8 | X7 | X10 | X12 | Z14 | Z16 | X17 | Z17 |
| D | IOCt0 | FPEX | FPCN | | | | | | | | | | | | X16 | Z18 | X18 |
| E | Encn0 | Mbin- | IOCt1 | | | | | | | | | | | | Z20 | X19 | Z21 |
| F | Mbs- | CLK | TIE LOW | | | | | | | | | | | | Z19 | X21 | X20 |
| G | TIE LOW | TIE LOW | Y28 | | | | | | | | | | | | VDD | VDD | VDD |
| H | Y31 | TIE LOW | Y30 | | | | | | | | | | | | X23 | X22 | Z22 |
| J | TIE LOW | Y29 | Y26 | | | | | | WTL 3332<br>TOP VIEW | | | | | | Z23 | X24 | Z24 |
| K | Y22 | Y24 | Y25 | | | | | | | | | | | | Z25 | X25 | Z26 |
| L | Y23 | Y21 | Y27 | | | | | | | | | | | | VDD | VDD | VDD |
| M | Y18 | Y19 | Y20 | | | | | | | | | | | | X27 | X28 | Z27 |
| N | Y16 | Y15 | Y17 | | | | | | | | | | | | X29 | Z28 | X26 |
| P | Y14 | Y13 | Y12 | | | | | | | | | | | | Z31 | Z30 | Z29 |
| R | Y10 | Y11 | Y9 | Dadd3 | Y6 | Badd2 | Aadd4 | Aadd1 | Cadd0 | Cadd1 | Y2 | Y0 | STALL- | Adst1 | F1 | X31 | X30 |
| T | VDD | VDD | Dadd2 | Y7 | Badd1 | Badd3 | Aadd0 | Aadd2 | Aadd3 | Cadd2 | Cadd4 | Cwen- | Abin0 | Abin2 | Adst0 | F0 | TIE LOW |
| U | Y8 | Dadd0 | Dadd1 | Dadd4 | Badd0 | Y5 | Badd4 | Y4 | Y3 | Y1 | Cadd3 | ABORT- | NEUT- | Abin1 | F2 | GND | GND |

Notes: Pins marked "Tie Low" must be connected to ground. Pins marked "NC" should be left unconnected (floating).

Figure 65. WTL 3332 pin configuration

# Physical Dimensions

## WTL 3132 144-PIN PIN GRID ARRAY

| SYMBOL | INCHES | MM |
|--------|-------------------|------------------|
| A1 | 0.080 ± 0.008 | 2.03 ± 0.20 |
| A2 | 0.180 typ. | 4.57 typ. |
| A3 | 0.050 | 1.27 |
| D | 1.575 sq. ± 0.016 | 40.0 sq. ± 0.41 |
| E1 | 1.400 sq. ± 0.012 | 35.56 sq. ± 0.30 |
| E2 | 0.050 dia. typ. | 1.27 dia. typ. |
| E3 | 0.018 ± 0.002 | 0.46 ± 0.05 |
| d | 0.070 dia. typ. | 1.78 dia. typ. |
| e | 0.100 typ. | 2.54 typ. |

## WTL 3332 168-PIN PIN GRID ARRAY

| SYMBOL | INCHES | MM |
|--------|-------------------|-----------------|
| A1 | 0.095 ± 0.009 | 2.41 ± 0.23 |
| A2 | 0.180 typ. | 4.57 typ. |
| A3 | 0.050 | 1.27 |
| D | 1.750 sq. ± 0.018 | 44.5 sq. ± 0.46 |
| E1 | 1.600 sq. | 40.6 sq. |
| E2 | 0.050 dia. typ. | 1.27 dia. typ. |
| E3 | 0.018 ± 0.002 | 0.46 ± 0.05 |
| d | 0.070 dia. typ. | 1.78 dia. typ. |
| e | 0.100 typ. | 2.54 typ. |

## Appendix A: The XL-3132 in the XL Environment

#### WEITEK XL-SERIES

The WEITEK XL-Series is a family of three VLSI processors: the XL-8000, a high-speed 32-bit integer processor; the XL-8032, a single-precision floating point processor; and the XL-8064, a double-precision floating point processor. These processors give the performance of bit-slice components, and are supported by a full complement of development tools.
These include C and FORTRAN 77 compilers, and an assembler which produces code that may be optimized by an instruction parallelizer. The development system offers both hardware and software simulators with debugging facilities. The programmer remains free to create custom microcode routines for peak performance.

This appendix is dedicated to the XL-8032 single-precision floating point processor. Further information may be found in the XL-Series Overview, the XL-Series Hardware Designer's Guide, the XL-Series Programmer's Reference Manual, the XL-8136 Data Sheet and the XL-8137 Data Sheet.

The XL-8032 processor consists of three interconnected VLSI components:

- XL-8136 Program Sequencing Unit (PSU)
- XL-8137 Integer Processing Unit (IPU)
- XL-3132 Floating Point Unit (FPU)

Each of these components is manufactured in high-density, low-power CMOS and delivered in a 144-pin grid array package. Unlike a traditional microprocessor, the XL-8032 is not constrained by the limits of circuit density or bus bandwidth imposed by a single chip in a small package. Consequently, bit-slice performance levels can be obtained both for integer and floating point operations.

The XL-Series simplifies system design. Zero glue interfacing is provided by a small number of dedicated signals that communicate state information between the components. These signals and the system buses need only be connected as shown in figure 69 in order to create the XL-8032. The purpose of each interconnection is described in more detail below.

#### **BUSES**

Four high-bandwidth system buses are provided by the XL-8032:

1. Code bus. The 64-bit code bus feeds the code input ports of the PSU, IPU and FPU. The PSU and IPU share 32 of the 64 bits; this half of the code word directs program control operations, address generation, loads and stores, and integer arithmetic. The remainder of the code word directs floating point operation. The designer may choose to lengthen the code word to add custom extensions to the processor architecture.
2. Data bus. The 32-bit data bus is shared by the IPU and FPU. It allows bytes, 32-bit integers and 32-bit floating point numbers to be transferred between the processing units and data memory.
3. Code Address bus. The 32-bit code address bus carries the address of the next instruction from the PSU to the code memory. A word address allows up to 32 Gbytes of 64-bit wide code memory.
4. Data Address bus. A 32-bit data address bus carries the address of the next data read or write. The address is generated by the IPU and the data may be transferred to or from the IPU or FPU as required. A byte address allows up to 4 Gbytes of 32-bit wide data memory. Support for accessing bytes, half-words, and words is provided by the IPU.

The XL-3132 is designed to hook directly to the code and data buses alongside the other components of the XL-8032. When driven by the same system clock, the code word is sampled by all three components simultaneously, and the data bus is driven or sampled at the same time in the cycle no matter which component is transferring information.

The code and data memory systems may be implemented with SRAM, static column DRAM or interleaved DRAM. Both code and data caches are supported by the XL-8032 chip set. More details are provided in the XL-Series Hardware Designer's Guide.

1. Dashed lines indicate bits in the sequencer field.
2. Bits marked with an "X" are reserved bits.
3. The meaning of each bit field is described in the body of this data sheet.

Figure 66. XL-3132 signal assignments in the XL-8032 code word

#### INSTRUCTION FORMAT

The XL-8032 has a 64-bit instruction word. The bits that are directed to the XL-3132 are shown in figure 66. The lower 32 bits of the instruction word are shared by the IPU and the PSU. Bits 0-23 normally define the IPU operation. Bits 24-31 define the instruction flow control performed by the PSU.
Five of these control bits, 24-28, are also used as the floating point register address (Dadd) when floating point load and store operations are performed. This saves on code bits and ensures that the FPU and IPU never compete for the data bus.

The XL-3132 has 34 C port inputs. When used in the XL-8032 configuration, the Dadd field is tied to the appropriate bits of the PSU code input. The Mbin- bit is tied to GND, reducing the number of bits used in the XL-8032 instruction word to 60; the remaining four are marked with an X in figure 66 and should be set to 0 in any code to maintain compatibility with future versions of the processor.

#### LOAD/STORE MODEL

The XL-Series has a consistent load/store model regardless of processor configuration. Each processing unit has its own register file; register moves between the IPU and FPU must be made through the data memory. Each of these register files is multi-ported and each register may be the operand source or result destination of any instruction implemented by the unit.

When an instruction takes more than one cycle to execute, the registers that supply its operands and receive its result cannot be modified until it has been completed. This allows any instruction to be resubmitted for execution after an interrupt with its original state.

Transactions between the register files and data memory are performed with dedicated load/store instructions. The only restriction on loads and stores is that the operands of an operation must be loaded before it is executed and that the operation must have completed before its result is stored. This allows the parallelizer considerable freedom to optimize register usage and I/O transactions.

An example of the normal sequence of operation is given below. This example leaves several free cycles in which other loads, stores, and calculations could be performed in parallel.

```
addr .ra
fload .fx
fabs .fx, .fy
nop
addr .rb
fstore .fy
```

Figure 67. \*rb = |\*ra|

Using the Coprocessor Load Mode gives the fload and fstore operations the same timing as the IPU's load and store operations. For loads, the address is presented on the AD bus at the beginning of a cycle and the data is expected to be available on the D bus by the end of that cycle. For stores, the address is presented on the AD bus at the beginning of a cycle and the data is driven onto the D bus during the next cycle.

Because the addr and fload instructions can be executed in parallel, they may be pipelined to support contiguous load operations (one per cycle). If, however, loads and stores are to be interleaved, each store must be allowed two cycles; the latter cycle is then always available for the I/O nop required when following an fstore by an fload. This has minimal impact on overall performance because loads usually outnumber stores, and the parallelizer can organize I/O transfers efficiently.

If the code constraints covered in the body of this data sheet are followed or if WEITEK software tools are used, then this load/store model will be obeyed.

#### MODES

The XL-3132 Mode Register must be initialized to the values given in figure 68 when coupled into the XL-8032 processor. The resulting programming model is illustrated in figure 70. Each selection is explained here:

1. Internal Bypass Mode is enabled to maximize performance.
2. The fix and float range test may be enabled or disabled as required.
3. Input Bypass (and Double-Pump) Modes are disabled, because they cannot operate when the Coprocessor Load Mode is enabled. The MBin and ABin ports on the multiplier/accumulator cannot, therefore, receive operands from the C bus.
4. Output Bypass Mode is enabled to maximize performance.
5. Overflows may be enabled or disabled as required.
6. Coprocessor Load Mode is enabled so that the XL-3132 matches the XL-Series load/store model.
7. The Y port Late Input Mode is disabled; it only operates with the WTL 3332.

| MODE BIT | LOGIC VALUE | DESCRIPTION |
|----------|-------------|----------------------------------------------------|
| M0 | 1 | Internal Bypass Mode (Aadd = Cadd) enabled |
| M1 | 1 | fix and float range test enabled (may be disabled) |
| M2 | 0 | Reserved: must be cleared to 0 |
| M3 | 0 | Input Bypass Mode disabled |
| M4 | 1 | Output Bypass Mode enabled |
| M5 | 1 | Overflow exception enabled (may be disabled) |
| M6 | 1 | Coprocessor Load Mode enabled |
| M7 | 1 | Reserved: must be set to 1 |
| M8 | 0 | FPEX active low and "sticky"* |
| M9 | 0 | Double-Pump Mode disabled |
| M10 | 1 | Reserved: must be set to 1 |
| M11 | 1 | Internal Bypass Mode (Aadd = Badd) enabled |
| M12 | 0 | Y Late Input Mode disabled |

\*These features are not available on all versions of the XL-3132; check the Programmer's Reference Manual for details.

Figure 68. Mode selection table

Figure 69. XL-8032 schematic

Figure 70. XL-3132 in XL mode

#### CONDITIONS AND EXCEPTIONS

The XL-8032 provides several signals which transfer state information from the processing units (IPU, FPU) to the sequencer (PSU). These are either conditions, upon which the PSU may decide to branch, or exceptions, which require software intervention to recover gracefully.

The FPCN output on the XL-3132 should be connected to the FPCN input on the XL-8136. The FPCN signal is enabled by the Encn1..0 field in the instruction word to indicate whether the result of an operation is = 0, < 0, or <= 0.
The PSU may then execute a "branch on condition" instruction to selectively transfer program control according to the outcome of this comparison.

The FPEX- output on the XL-3132 should be connected to the EXT4- interrupt input on the XL-8136. If the overflow enable bit in the mode register is set, then any arithmetic operation that generates an invalid result can flag this exception to the PSU. The system software is expected to react appropriately to this interrupt.

#### NEUT-, STALL- AND ABORT-

The XL-8032 components all use the NEUT-, STALL- and ABORT- signals. These pins should be connected directly between the three chips in the XL-8032 processor (see figure 69).

NEUT- cancels the effect of the current instruction. The signal is generated by the PSU. It is normally used in the shadow of a delayed branch to prevent the instruction in the pipeline from having any effect on the state of the IPU and FPU.

STALL- cancels the effect of the next instruction. It should be generated by the code memory subsystem to indicate the delay or absence of the correct code word. This prevents any invalid operation that may be present on the code input at this time from affecting the state of the processor. It allows wait states to be inserted in code fetches, perhaps to allow for DRAM refresh or a code cache miss.

ABORT- cancels the effect of both the current and the next instructions. It should be generated by the data memory subsystem to indicate that the system cannot immediately access the required data word. This allows the canceled instructions to be repeated when the address becomes valid and for this retry to have the correct effect. It allows the data memory to be "not ready" if, for example, a page fault occurs.

## Appendix B: Programming Examples

In the following examples, square root has Coprocessor Load Mode enabled and Input Bypass Mode disabled. It may be used in the XL-Series environment.
The other examples have Coprocessor Load Mode disabled and Input Bypass Mode enabled. They must be modified to run on the XL-Series machines.

#### SQUARE ROOT

#### NOTATION:

- a = operand (.f4)
- $R_0$ = seed (.f0)
- $R_1$ = first approximation (.f1)
- $R_2$ = second approximation (.f2)
- 0.5 = constant (.f5)
- 3.0 = constant (.f6, .t1)
- s = result (.f3)

#### ALGORITHM:

$$R_{n} = \frac{R_{n-1}}{2} \left( 3 - a R_{n-1}^{2} \right)$$

$$\sqrt{a} \simeq aR_n$$

$$R_1 = \frac{R_0}{2} \left( 3 - a R_0^2 \right)$$

$$R_2 = \frac{R_1}{2} \left( 3 - a R_1^2 \right)$$

| Cycle # | Opcode | I/O | Comment |
|---------|--------|-----|---------|
| 1 | fnop | fload .f6 | |
| 2 | fadd .f6, 0, .t1 | fload .f0 | move 3.0 to .t1 |
| 3 | fmac .f0, .f0, 0, .f6 | fload .f5 | $R_0 \times R_0$ |
| 4 | fmac .f0, .f5, 0, .f3 | fload .f4 | $0.5 \times R_0$ |
| 5 | fnop | | |
| 6 | fmna .f4, .f6, .t1, .f7 | | $3.0 - (a \times R_0^2)$ |
| 7 | fnop | | |
| 8 | fnop | | |
| 9 | fmac .f3, .f7, 0, .f1 | | $R_1$ |
| 10 | fnop | | |
| 11 | fnop | | |
| 12 | fmac .f1, .f1, 0, .f6 | | $R_1 \times R_1$ |
| 13 | fmac .f1, .f5, 0, .f3 | | $0.5 \times R_1$ |
| 14 | fnop | | |
| 15 | fmna .f4, .f6, .t1, .f7 | | $3.0 - (a \times R_1^2)$ |
| 16 | fnop | | |
| 17 | fnop | | |
| 18 | fmac .f3, .f7, 0, .f2 | | $R_2$ |
| 19 | fnop | | |
| 20 | fnop | | |
| 21 | fmac .f2, .f4, 0, .f3 | | $s = a \times R_2$ |
| 22 | fnop | | |
| 23 | fnop | | |
| 24 | fnop | fstore .f3 | store s |
| 25 | fnop | | s on X port |

Figure 71.

The square root code given fetches a seed for the first approximation to the result and then proceeds to refine its value using the Newton-Raphson algorithm.
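The arithmetic performed by the square root listing can be cross-checked with a short Python model. This is a sketch only, not WEITEK code: the function name `sqrt_newton` is hypothetical, the seed is supplied by the caller in place of a table look-up, and Python floats are double precision, not the 32-bit format used by the chip.

```python
import math


def sqrt_newton(a, seed, iterations=2):
    """Approximate sqrt(a) from a seed that is close to 1/sqrt(a).

    Each pass follows the data flow of the listing: form R*R and 0.5*R,
    then 3.0 - a*(R*R), then multiply the two. The result is a*R.
    """
    r = seed
    for _ in range(iterations):
        r_squared = r * r                 # fmac: R x R
        half_r = 0.5 * r                  # fmac: 0.5 x R
        correction = 3.0 - a * r_squared  # fmna: 3.0 - (a x R^2)
        r = half_r * correction           # fmac: next approximation
    return a * r                          # fmac: s = a x R


if __name__ == "__main__":
    a = 2.0
    seed = 0.7  # rough guess at 1/sqrt(2); stands in for a seed table entry
    print(sqrt_newton(a, seed), math.sqrt(a))
```

Newton-Raphson iteration roughly doubles the number of correct bits on each pass, so the accuracy of the final result depends strongly on the quality of the seed.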
The square root example differs from the example for division given in the body of the data sheet in that no on-chip look-up table is provided for the seed value. The external table assumed here has the same accuracy as the internal divide look-up table (all of the exponent and the seven most-significant bits of the fraction). The following formulae may be used to calculate the entries of this table. They give the table entry exponent (G) and fraction (H) in terms of the operand values (E) and (F). Care should be taken to ensure that zeroes, negative numbers, infinities and NaNs are handled correctly.

$$G = \left[ \frac{380 - E}{2} \right]$$

$$\text{if } E(8) = 1: \quad H = \left[ \frac{2^{13}}{\sqrt{257 + F}} - 256 \right]$$

$$\text{if } E(8) = 0: \quad H = \left[ \frac{2^{13}}{\sqrt{512 + 2F}} - 256 \right]$$

[ ] = truncate to lower integer.

#### 4×4 MATRIX TRANSFORM

#### NOTATION:

- Transform matrix A ($a_{11}$ to $a_{44}$) (.f16 to .f31)
- Operand vector X ($x_1$ to $x_4$) (.f0 to .f3)
- Result vector Y ($y_1$ to $y_4$) (.f4 to .f7)
- Partial results $p_1$ to $p_4$ (Tregs)

#### ALGORITHM:

Y = AX

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$
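The cycle-by-cycle listing for this transform works through the matrix one column at a time, so that each operand is used in four consecutive multiply/accumulate operations. The Python sketch below models that ordering. It is illustrative only: the name `transform4` is hypothetical, and the rotation of partial results through the temporary registers is not modelled.

```python
def transform4(a, x):
    """Compute y = A x for a 4x4 matrix, one column of A at a time.

    a is a list of four rows of four values; x is a list of four values.
    """
    p = [0.0, 0.0, 0.0, 0.0]      # partial results p1..p4
    for col in range(4):          # x1 first, then x2, x3, x4
        for row in range(4):      # one multiply/accumulate per step
            p[row] = a[row][col] * x[col] + p[row]
    return p                      # after the last column, p holds y1..y4


if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    print(transform4(identity, [1.0, 2.0, 3.0, 4.0]))
```

Ordering the work by column means a new operand is needed only once every four steps, which appears to be why the listing issues one load per group of four operations.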
| Cycle # | Opcode | I/O | Comment |
|---------|--------|-----|---------|
| 1 | fmac .f0, .f16, 0, .t1 | fload .f0 | $p_1 = a_{11} \times x_1$ |
| 2 | fmac .f0, .f17, 0, .t2 | | $p_2 = a_{21} \times x_1$ |
| 3 | fmac .f0, .f18, 0, .t3 | | $p_3 = a_{31} \times x_1$ |
| 4 | fmac .f0, .f19, 0, .t1 | | $p_4 = a_{41} \times x_1$ |
| 5 | fmac .f1, .f20, .t1, .t2 | fload .f1 | $p_1 = a_{12} \times x_2 + p_1$ |
| 6 | fmac .f1, .f21, .t2, .t3 | | $p_2 = a_{22} \times x_2 + p_2$ |
| 7 | fmac .f1, .f22, .t3, .t1 | | $p_3 = a_{32} \times x_2 + p_3$ |
| 8 | fmac .f1, .f23, .t1, .t2 | | $p_4 = a_{42} \times x_2 + p_4$ |
| 9 | fmac .f2, .f24, .t2, .t3 | fload .f2 | $p_1 = a_{13} \times x_3 + p_1$ |
| 10 | fmac .f2, .f25, .t3, .t1 | | $p_2 = a_{23} \times x_3 + p_2$ |
| 11 | fmac .f2, .f26, .t1, .t2 | | $p_3 = a_{33} \times x_3 + p_3$ |
| 12 | fmac .f2, .f27, .t2, .t3 | | $p_4 = a_{43} \times x_3 + p_4$ |
| 13 | fmac .f3, .f28, .t3, .f4 | fload .f3 | $y_1 = a_{14} \times x_4 + p_1$ |
| 14 | fmac .f3, .f29, .t1, .f5 | | $y_2 = a_{24} \times x_4 + p_2$ |
| 15 | fmac .f3, .f30, .t2, .f6 | | $y_3 = a_{34} \times x_4 + p_3$ |
| 16 | fmac .f3, .f31, .t3, .f7 | | $y_4 = a_{44} \times x_4 + p_4$ |

This example is not interruptible. Results can be read from the Z port of the WTL 3332.

#### RADIX-2 FFT

In-place algorithm for radix-2 butterflies. Two butterflies are evaluated per iteration.
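A single radix-2 butterfly, the unit of work in this example, can be sketched in Python as follows. This is an illustrative model assuming the usual definition A' = A + B×W and B' = A - B×W on complex values held as separate real and imaginary parts; the function name `butterfly` is hypothetical, and register allocation and pipelining are ignored.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """Return (ar', ai', br', bi') for one radix-2 butterfly."""
    tr = br * wr - bi * wi    # real part of B x W
    ti = br * wi + bi * wr    # imaginary part of B x W
    return ar + tr, ai + ti, ar - tr, ai - ti


if __name__ == "__main__":
    # With W = 1 the butterfly reduces to a sum and a difference.
    print(butterfly(1.0, 2.0, 3.0, 4.0, 1.0, 0.0))
```

The product B×W is formed once and then both added to and subtracted from A, so each butterfly costs four multiplies and six additions or subtractions.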
#### NOTATION:

| Source Operands | Results |
|-----------------|---------|
| Ar, Ar# (.f4, .f20) | Ar', Ar#' |
| Ai, Ai# (.f5, .f21) | Ai', Ai#' |
| Br, Br# (.f3, .f19) | Br', Br#' |
| Bi, Bi# (.f0, .f16) | Bi', Bi#' |
| Wr, Wr# (.f2, .f18) | |
| Wi, Wi# (.f1, .f17) | |

#### ALGORITHM:

$$\begin{aligned} A_r' &= A_r + (B_r \times W_r - B_i \times W_i) \\ A_i' &= A_i + (B_r \times W_i + B_i \times W_r) \\ B_r' &= A_r - (B_r \times W_r - B_i \times W_i) \\ B_i' &= A_i - (B_r \times W_i + B_i \times W_r) \end{aligned}$$

| Cycle # | Opcode | I/O | Comment |
|---------|--------|-----|---------|
| | fnop | fload .f0 | |
| | fnop | fload .f16 | |
| 1 | fmns .f1, .f0, 0, .t1 | fload .f1 | $-B_i \times W_i$ |
| 2 | fmac .f2, .f0, 0, .t2 | fload .f2 | $+B_i \times W_r$ |
| 3 | fmns .f17, .f16, 0, .t3 | fload .f17 | $-B_i\# \times W_i\#$ |
| 4 | fmac .f18, .f16, 0, .t1 | fload .f18 | $+B_i\# \times W_r\#$ |
| 5 | fmac .f3, .f2, .t1, .f8 | fload .f3 | $B_r \times W_r - B_i \times W_i$ |
| 6 | fmac .f3, .f1, .t2, .f9 | | $B_r \times W_i + B_i \times W_r$ |
| 7 | fmac .f19, .f18, .t3, .f24 | fload .f19 | $B_r\# \times W_r\# - B_i\# \times W_i\#$ |
| 8 | fmac .f19, .f17, .t1, .f25 | | $B_r\# \times W_i\# + B_i\# \times W_r\#$ |
| 9 | fadd .f4, .f8, .f10 | fload .f4 | Ar' |
| 10 | fsub .f4, .f8, .f11 | | Br' |
| 11 | fadd .f5, .f9, .f12 | fload .f5 | Ai' |
| 12 | fsub .f5, .f9, .f13 | | Bi' |
| 13 | fadd .f20, .f24, .f26 | fload .f20 | Ar#' |
| 14 | fsub .f20, .f24, .f27 | fload .f21 | Br#' |
| 15 | fadd .f21, .f25, .f28 | fload .f0 | Ai#' |
| 16 | fsub .f21, .f25, .f29 | fload .f16 | Bi#' |

Figure 73.

# Ordering Information

| PACKAGE TYPE | TEMPERATURE RANGE | ORDER NUMBER |
|--------------|----------------------|------------------------------|
| 144-Pin PGA | Tc = 0 to +85 °C | WTL 3132-GCD-080, -100, -120 |
| 168-Pin PGA | Tc = 0 to +85 °C | WTL 3332-GCD-080, -100, -120 |
| 144-Pin PGA | Tc = -55 to +125 °C | WTL 3132-GMD-080, -100, -120 |
| 168-Pin PGA | Tc = -55 to +125 °C | WTL 3332-GMD-080, -100, -120 |

XL-Series customers should order the following:

| PACKAGE TYPE | TEMPERATURE RANGE | ORDER NUMBER |
|--------------|----------------------|-----------------------------|
| 144-Pin PGA | Tc = 0 to +85 °C | XL-3132-GCD-080, -100, -120 |
| 144-Pin PGA | Tc = -55 to +125 °C | XL-3132-GMD-080, -100, -120 |

## **Revision Summary**

#### CONTENTS

This table lists major changes since the July 1986 printing of this data sheet.

| | Change | Description |
|-----|-------------------------------------------------------|----------------------|
| 1. | Features/Description/Architecture | Revised, pages 1-3 |
| 2. | Figures 1-3 | New, pages 1-3 |
| 3. | Signal Description | Revised, page 4 |
| 4. | WTL 3132 Block Diagram | Revised, page 5 |
| 5. | WTL 3332 Block Diagram | Revised, page 6 |
| 6. | Register file description (figure 6) | New, page 7 |
| 7. | Multiplier/accumulator description | Revised, page 8 |
| 8. | Figures 7 and 11 | New, pages 8 and 11 |
| 9. | Temporary register description | Revised, page 13 |
| 10. | Figure 12 | New, page 13 |
| 11. | Internal Data Routing description (figures 16 and 22) | New, page 16 |
| 12. | Input/output description | Revised, page 20 |
| 13. | Figures 26 and 31 | New, pages 21 and 24 |
| 14. | Instruction Set description (figures 33, 35, and 36) | Revised, page 26 |
| 15. | System Interfacing description (figures 44-51) | Revised, page 30 |
| 16. | Initialization description (figures 18 and 19) | Revised, page 35 |
| 17. | Division description (figure 56) | Revised, page 38 |
| 18. | IEEE Considerations description | Revised, page 41 |
| 19. | Figures 18 and 19 | New, page 43 |
| 20. | AC Specifications and figure 64 | Revised, page 44 |
| 21. | Figure 65 | Revised, page 45 |
| 22. | Figure 66 | Revised, page 46 |
| 23. | Appendix A (figures 68, 69, 70, and 71) | New, page 49 |
| 24. | Appendix B | Revised, page 55 |

#### SUBJECT

This table lists the component revision to which successive versions of this data sheet have referred.

| Data | Date | Component Suffix |
|---------------------------------|---------------|------------------|
| WTL 3132/3332 Preliminary Data | July, 1986 | V1A |
| XL-Series Product Status Report | 3/3/87 | V1A |
| XL-Series Product Status Report | 7/8/87 | V1C |
| WTL 3132/3332 Errata Sheet | October, 1987 | V1C |
| WTL 3132/3332 Data Sheet | October, 1987 | V6D |

## **Physical Dimensions**

| Symbol | INCHES | MM |
|--------|---------------------|--------------|
| A1\* | 0.091 ± 0.010 | 2.31 ± 0.25 |
| A2 | 0.180 typ. | 4.57 |
| A3 | 0.050 | 1.27 |
| D | 1.575 sq. ± 0.016 | 40.0 ± 0.41 |
| E1 | 1.400 sq. ± 0.012 | 35.56 ± 0.30 |
| E2 | 0.050 dia. typ. | 1.27 |
| E3 | 0.018 ± 0.002 | 0.46 ± 0.05 |
| d | 0.070 dia. typ. | 1.78 |
| e | 0.100 typ. | 2.54 |

\*A1 = package plus lid

## **Application Note**

#### COMPATIBILITY WITH WTL 1264 AND WTL 1265

There are two common WTL 1264/1265 configurations. In the first, one WTL 1264 is used for each WTL 1265. In the second, two WTL 1264 devices are used for each WTL 1265. We will discuss the 1:1 configuration. Contact WEITEK for a description of the 2:1 configuration.

Several programming models are used in this first configuration.
As discussed in COMPATIBILITY MODE, the most common programming model for the WTL 1264/1265 is maximum 64-bit throughput. If the compatibility mode bit, M12, is set to "0", the WTL 2264/2265 can operate in the WTL 1264/1265 programming mode. The effect of M12 is different for the WTL 2264 and WTL 2265. If M12 is "0" in the WTL 2265, M5 can be used to control the WTL 2265's dummy pipes in the same way it controls Pipe 3 in the WTL 1265. In the WTL 2264, M12 set to "0" causes PAC to be 01, ACC to be 01 and M13 to be "1". The following tables illustrate the WTL 2264/2265 mode configuration for several other WTL 1264/1265 timing models.

TABLE 18: WTL 1264/WTL 2264 COMPATIBILITY CHART

The first three mode columns are WTL 1264 mode settings; the remaining five are the compatible WTL 2264-50 and -60 mode settings.

| Operation | WTL 1264 M5-4 Pipeline Configuration | WTL 1264 M11-8 Pipe Advance | WTL 1264 M7-6 Accumulator | WTL 2264 M12 | WTL 2264 M5, 4, 1 Pipeline Configuration | WTL 2264 M9-8 Pipe Advance | WTL 2264 M13 Pipe 2 Advance | WTL 2264 M7-6 Accumulator |
|---|---|---|---|---|---|---|---|---|
| 64-bit max throughput | 10 | 0100 | 01 | 0 | 111\* | XX | X | XX |
| 32-bit max throughput | 11 | 0010 | 01 | 1 | 111 | 01 | 0 | 01 |
| 64-bit min latency | 00 | 0000 | 01 | 1 | 100 | 01 | X | 01 |
| 32-bit min latency | 00 | 0000 | 01 | 1 | 100 | 00 | X | 01 |

\*Setting M12 to 0 causes M1 = 1, M9-8 = 01, M13 = 1, M7-6 = 01.

## PRELIMINARY DATA July 1986

# **Physical Dimensions**

| Symbol | INCHES | MM |
|--------|---------------------|-------------------|
| A1 | 0.080 ± 0.008 | 2.03 ± 0.20 |
| A2 | 0.180 typ. | 4.57 typ. |
| A3 | 0.050 | 1.27 |
| D | 1.575 sq. ± 0.016 | 40.0 sq. ± 0.41 |
| E1 | 1.400 sq. ± 0.012 | 35.56 sq. ± 0.30 |
| E2 | 0.050 dia. typ. | 1.27 dia. typ. |
| E3 | 0.018 ± 0.002 | 0.46 ± 0.05 |
| d | 0.070 dia. typ. | 1.78 dia. typ. |
| e | 0.100 typ. | 2.54 typ. |

## WTL 3332 168-PIN PIN GRID ARRAY

| Symbol | INCHES | MM |
|--------|---------------------|-------------------|
| A1 | 0.095 ± 0.009 | 2.41 ± 0.23 |
| A2 | 0.180 typ. | 4.57 typ. |
| A3 | 0.050 | 1.27 |
| D | 1.750 sq. ± 0.018 | 44.5 sq. ± 0.46 |
| E1 | 1.600 sq. | 40.6 sq. |
| E2 | 0.050 dia. typ. | 1.27 dia. typ. |
| E3 | 0.018 ± 0.002 | 0.46 ± 0.05 |
| d | 0.070 dia. typ. | 1.78 dia. typ. |
| e | 0.100 typ. | 2.54 typ. |