# Design Methodology for Low Error Fixed Width Adaptive Multiplier

Abida Yousuf<sup>1</sup>, Roohie Naaz Mir<sup>2</sup>, Hakim Najeeb-ud-din<sup>3</sup>

<sup>1</sup>Ph.D Student, ECE Deptt, N.I.T, Srinager, India <sup>2,3</sup>Associate Professor, CSE Deptt, N.I.T, Srinager, India

**Abstract:-** In this paper, we develop a methodology for designing lower-error and Area efficient 2'scomplement fixed-width multiplier. In these multipliers basic multiplications follow the Baugh-Wooley algorithms and have been implemented using Field Programmable Gate Array (FPGA) devices. The approach is based on the fact that the multiplication operations used in multimedia applications (such as DSP) usually have the special fixed-width property i.e., their input data and output product have the same bit width. For some practical DSP applications, we only require n-bit multiplication output, which is to be obtained by directly truncating the n least-significant bits and preserving the n most significant bits. However, significant errors are introduced in the fixed-width operation, which are undesirable for DSP applications By properly choosing the generalized index and binary thresholding, we derive a better error-compensation bias to reduce the truncation error. The proposed fixed width low error multiplier shows better error performance as compared to other existing multiplier structures.

Keywords:- Bough Wooley Algorithm, Fixed Width Adaptive Multiplier, Multiplication., FPGA

## I. INTRODUCTION

Multiplication is an important operation in many algorithms used in scientific computations such as Digital Signal Processing (DSP). The computational complexities of algorithms used in Digital Signal Processors (DSPs) have gradually increased over the years. Therefore, DSP's require fast and efficient parallel multipliers for general purpose as well as application specific architectures. In these multipliers the basic multiplication follows the Baugh Wooley multiplier. The multipliers based on the Baugh–Wooley algorithm produce 2n-bit output with n-bit multiplier and n-bit multiplicand input. The DSP applications require extensive use of multiplication and squaring functions. A full width digital n × n multiplier computes the 2n output as a weighted sum of partial products. If the product is truncated to n-bits, the least-significant columns of the product matrix contribute little to the final result. To take advantage of this, truncated multipliers do not form all of the least-significant columns in the partial-product matrix. By eliminating more columns the area and power consumption of the arithmetic unit are significantly reduced and the delay also decreases. For some practical applications, we only require n-bit multiplication output, which is to be obtained by directly truncating the n least-significant bits and preserving the n most significant bits. However, significant errors are introduced in the fixed-width operation, which are undesirable for DSP applications [1].

To reduce the introduced truncation error, [2] proposed an analytical technique to generate a correction term. The main drawback of this design is the correction bias added to offset. The error due to truncation is a constant term and does not depend on the inputs being fed to the multiplier. [3] Proposed the fixed multiplier with a constant correction technique, which introduces a degree of flexibility in the number of columns that are truncated. This gives designers a chance to choose between area savings and better error correction. However, there exist two problems 1) how to choose proper indices. 2) Whether other lower error multipliers exist or not. The work in this paper proposes the general methodology for designing the lower error 2's-complement fixed-width multiplier with  $w \ge 1$ .

The rest of the paper is organized as follows: section 2 discusses the Baugh Wooley multiplier, section 3 gives details of the proposed algorithm, section 4 presents the results and section 5 provides the conclusion and references are listed in the end.

## II. BAUGH WOOLEY MULTIPLICATION

The Baugh wooley multiplication algorithm is an efficient way to handle the sign bits. This technique has been developed to design regular multipliers, suited for 2's compliment mumbers.[1]

**LET US CONSIDER** Considering 2's complement integer operands, a n-bit multiplicand X and a n-bit multiplier Y can, respectively be represented by

$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$$
<sup>(1)</sup>

$$Y = -y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i$$
<sup>(2)</sup>

Where  $x_i, y_i \in \{0, 1\}$ 

The standard product P<sub>Standard</sub> can be written as

$$= = x_{n-1}y_{n-1}2^{2^{n-2}} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2}x_{i}y_{j}2^{i+j} + 2^{n-1}\left(-2^{n-1} + \sum_{i=0}^{n-2}\overline{x_{n-1}y_{i}}2^{j} + 1\right) + 2^{n-1}\left(-2^{n-1} + \sum_{j=0}^{n-2}\overline{y_{n-1}x_{j}}2^{j} + 1\right)$$
(3)

#### (A) Fixed Width Multiplication

The multiplication based on the Baugh wooley algorithm produce 2n bit output with n bit multiplier and n bit multiplicand input. However in DSP applications only n bit multiplication output is needed. Therefore the fixed width multiplier is obtained by truncating the least significant partial products. And preserving the most significant partial products as shown in figure 1.



**Fig.1:** Partial product array diagram for an  $n \times n$  Baugh-Wooley multiplier.

The most accurate **fixed width** product is theoretically given as

$$P_{S \tan dard} \cong MP + \sigma_{temp} \times 2^{n}$$
Where
$$\sigma_{temp} = \left[\frac{LP}{2^{n}}\right] \tag{4}$$

$$P_{S \tan dard} = \left[\frac{1}{2} \left(x_{n-1} y_0 + \dots + x_0 y_{n-1}\right) + \frac{1}{2^2} \left(x_{n-2} y_0 + \dots + x_0 y_{n-2}\right) + \dots + \frac{1}{2^{n-1}} \left(x_1 y_0 + x_0 y_1\right) + \frac{1}{2^n} \left(x_0 y_0\right)\right]$$
(5)

Where  $\sigma_{temp}$  an ideal error-compensation term called true rounding approach and it is infeasible to implement the truncated fixed-width multiplier without using any acceptable approximation. From equation (5),

it is observed that  $\sigma_{\text{temp}}$  is mainly affected by  $\frac{1}{2}(\overline{x_{n-1}y_0} + \overline{x_{n-2}y_1} + ... + \overline{x_0y_{n-1}})$  due to the largest weight. Now, let us assume the main error compensation term  $E_{\text{main}}$  and remaining error compensation term  $E_{\text{remain}}$  [2, 3]. Therefore,

$$E_{main} = \frac{1}{2} \left( \overline{x_{n-1} y_0} + \overline{x_{n-2} y_1} + \dots + \overline{x_0 y_{n-1}} \right)$$
(6)

And

$$E_{remain} = \frac{1}{2^{1}} \left( \overline{x_{n-2} y_{0}} + \dots + \overline{x_{0} y_{n-2}} \right) + \dots + \frac{1}{2^{n-2}} \left( \overline{x_{1} y_{0}} + \dots + \overline{x_{0} y_{1}} \right) + \frac{1}{2^{n-1}} \left( \overline{x_{0} y_{0}} \right)$$
(7)

The equation (2) can be rewritten as

$$\sigma_{temp} = \left[\frac{1}{2} \left(E_{main} + E_{remain}\right)\right] \tag{8}$$

Note that  $\sigma_{\text{temp}}$  varies as the input bits  $x_i$ 's or  $y_i$ 's alternates.

Next we first define a generalized index,  $\theta_{index,w}$ , where w means to keep n+w most significant columns of the sub-product array as shown in Figure 1 [4], and the binary parameters  $(q_{n-1-w}, q_{n-2-w}, ..., q_0) \in \{0, 1\}$ .

$$\theta_{index,w}\left(q_{n-1-w}, q_{n-2-w}, \dots, q_{0}\right) = \left(x_{n-1-w} y_{0}\right)^{q_{n-1-w}} + \left(x_{n-2-w} y_{0}\right)^{q_{n-2-w}} + \dots + \left(x_{0} y_{n-1-w}\right)^{q_{0-w}}$$
(9)

The operator

$$\left\langle x_i y_j \right\rangle^{q_i} = x_i y_j \qquad if \qquad q_i = 0 \\ = \overline{x_i y_j} \qquad if \qquad q_i = 1$$

(10)

To introduce the generalized index into the error compensation bias equation we rewrite equation 8 as

$$\sigma_{temp} = \theta_{Q,w} + \left\lfloor \frac{1}{2} \left( E_{main} + E_{remain} \right) - \theta_{Q,w} \right\rfloor$$
(11)

where index

$$Q = \left(q_{n-1-w} \times 2^{n-1-w} q_{n-2-w} \times 2^{n-2-w} \times \dots \times q_0 2^0\right)$$
  
Where Q has a range varying from 0 to  $2^{n-1}+1$ .

III. PROPOSED FIXED WIDTH MULTIPLIERS WITH  $W \ge 1$ 

The lower truncation error can be obtained if larger most significant columns are kept in hardware, however at the cost of area. Equation (8) can be rewritten as

$$\sigma_{temp} = \left\lfloor \left\langle x_{n-w-2} y_1 \right\rangle^{q_{n-2-w}} + \dots + \left\langle x_1 y_{n-w-2} \right\rangle^{q_1} + \frac{\left[K\right]}{2^w} \right\rfloor$$
(12)

Where

$$K = \left\langle x_{n-w-1} y_0 \right\rangle^{q_{n-w-1}} + \left\langle x_0 y_{n-w-1} \right\rangle^{q_0} + \left[ \frac{1}{2} E_{main} + \frac{1}{2} E_{remain} + \theta_{Q,w} \right]$$
(13)

In equation (12), the first term in the bracket is referred to as coarse-adjustment term and the second term [K] is referred to as fine-adjustment term. The coarse adjustment term can be easily realized by a simple circuit using AND, OR logic, while the index is decided. On the other hand, the value of the fine-adjustment term can be obtained by the expected value in rounding operation after analyzing the statistics [5].

For designing simple and realizable error-compensation circuit, we define two types of binary thresholds for bias estimation. Both types of binary thresholding of  $\theta_{index}$  are described as follows:

Type 1

$$\sigma_{tempQ,w>1} = \langle x_{n-2} y_1 \rangle^{q_{n-2}} + \dots + \langle x_1 y_{n-2} \rangle^{q_1} + \frac{[K_1]}{2^w} \quad if \ \theta_{Q,w} = 0$$
$$= \langle x_{n-2} y_1 \rangle^{q_{n-2}} + \dots + \langle x_1 y_{n-2} \rangle^{q_1} + \frac{[K_2]}{2^w} \quad if \ \theta_{Q,w} > 0 \tag{14}$$

Type 2

$$\sigma_{tempQ,w=1} = \langle x_{n-w-2} y_1 \rangle^{q_{n-w-2}} + \dots + \langle x_1 y_{n-2} \rangle^{q_1} + \frac{\lfloor K_3 \rfloor}{2^w} \qquad if \ \theta_{Q,w} > n$$
$$= \langle x_{n-w-2} y_1 \rangle^{q_{n-w-2}} + \dots + \langle x_1 y_{n-2} \rangle^{q_1} + \frac{\lfloor K_4 \rfloor}{2^w} \qquad if \ \theta_{Q,w} = n \qquad (15)$$

Where  $K_1$ ,  $K_2$ ,  $K_3$ ,  $K_4$  are average values of K for satisfying  $\theta_{index} = 0$ ,  $\theta_{index} > 0$ ,  $\theta_{index} > n$ ,  $\theta_{index} = n$  respectively. The restriction on the value of K can be limited as  $[K_i] \in \{0, 1, 2^{w-1} - 1, 2^{w-1}\}$  for i = 1, 2, 3 and 4.[6, 7].

By doing simulations we obtained the values K at different generalized indexes as shown in figure 2. we simulate the value of K for smaller word length while for large word lengths the value of K is determined by statistical analysis because at large word lengths we are not able to simulate the value of K due to high computational load.



Fig. 2: Values of K versus different values Q of the Binary thresholding for n=6

(A)Statistical Analysis:

For bias estimation, we assume Type 2 binary thresholding, which is defined in equation (15). After analyzing equation (15), two cases can be taken into consideration.

#### CASE 1

 $\theta_{2^{n-1}+1} < n$ 

From (13), we have

$$E\left\{\frac{1}{2}E_{main}\right\} = \frac{1}{2}\left(\frac{3}{4} + \frac{3}{4} + \frac{1}{4} \times (n-2)\right) \cong \frac{n}{8} + \frac{1}{2}$$
  
Where  
$$E\left(x_{i}y_{j}\right) = \frac{1}{4} \text{ and}$$
  
$$E\left(\overline{x_{i}y_{j}}\right) = \frac{3}{4}$$
  
Further  
$$E\left\{\frac{1}{2}E_{remain}\right\} = \frac{1}{4}\left(\frac{1}{2^{2}}(n-1) + \frac{1}{2^{3}}(n-2) + \dots + \frac{1}{2^{n}} + 1\right) = \frac{n}{8} - \frac{1}{4}$$

The generalized index is

$$\theta_{Q=2^{n-1}+1} = x_{n-2}y_1 + \overline{x_{n-2}y_1} + \sum_{\substack{i+j=n-1\\i,j\neq n-1}} x_i y_j + E_{main} \\
[K_3] = \left[ E\{K\} \right] \\
= \left[ E\left\{ \overline{x_{n-2}y_1} + \overline{x_1y_{n-2}} - \frac{1}{2}E_{main} + \frac{1}{2}E_{remain} \right\} \right] \\
= \left[ \frac{3}{4} + \frac{3}{4} - \frac{n}{8} - \frac{1}{2} + \frac{n}{8} - \frac{1}{4} \right] = 1$$
(16)

## CASE 2

 $\overline{\theta_{2^{n-1}+1}} = n$ 

This condition is met when

$$\overline{x_{n-2}y_1} + \overline{x_1y_{n-2}} = 1$$
  
and  
$$(x_{n-2}y_1 = \dots = x_1y_{n-2} = 1)$$
  
Also  
$$E\left\{\frac{1}{2}E_{remain}\right\} = \frac{1}{2^2}\left(\frac{1}{3} \times 1 \times 2(n-3)\right) + \frac{1}{2^3}\left(\frac{1}{3} \times 1 \times 2(n-4)\right) + \dots + \frac{1}{2^{n-1}}\left(\frac{1}{3} \times 1 \times 2\right) + \frac{1}{2^n}\left(\frac{1}{9} \times 1 \times 1\right)$$
$$\cong \frac{1}{2}n - \frac{5}{3} \qquad if n \ge 4$$

Therefore,

$$[K_4] = [E\{K\}] = \left[E\{\overline{x_{n-2}y_1} + \overline{x_1y_{n-2}} - \frac{1}{2}E_{main} + \frac{1}{2}E_{remain}\}\right] = 0$$
(17)

Thus from equations (16), (17) and (15), we can derive a new error compensation bias as.

$$\sigma_{tempQ,w=1} = \left\langle x_{n-w-2} \, y_1 \right\rangle^{q_{n-w-2}} + \ldots + \left\langle x_1 \, y_{n-w-2} \right\rangle^{q_1} + \frac{1}{2^w} \quad \text{if } \ \theta_{2^{n-1}+1} < n$$

$$= \left\langle x_{n-w-2} y_{1} \right\rangle^{q_{n-w-2}} + \dots + \left\langle x_{1} y_{n-w-2} \right\rangle^{q_{1}} + 0 \quad if \quad \theta_{2^{n-1}+1} = n$$
(18)

×2

×1

Therefore, this constant approximation for  $K_3$  can be mapped to the structure as shown in figure 3(a) for n=8 [8], where A, ND, HA, and FA denote AND gate, NAND gate, a half adder and a full adder, respectively. The logic diagrams of AOR, ANOR, AHA, AFA, and NFA is shown in figure 3(b)

×З

<u>64</u>

x5

×7

хđ



(b)

Fig.3: (a) Proposed low-error fixed-width 8 x 8 Baugh Wooley multiplier with  $\theta_{Q=2^{n-1}+1,w=1}$ , and (b) Logical Elements.

### **IV. RESULTS**

The figure 4 shows the comparison between Booth and Baugh Wooley multiplication technique in terms of delay and we conclude that Baugh Wooley algorithm is the efficient one. The table1 shows the error performance of different fixed width multipliers. The Table 2 gives the comparison of standard and fixed width multiplier in terms of number of occupied slices and delay.



Fig. 4: Comparison between Booth and Baugh Wooley algorithms in terms of delay.

| Multiplier                      | Width | Max<br>Error | Mean Square<br>Error |
|---------------------------------|-------|--------------|----------------------|
| Fixed width multiplier $w = 0$  | N = 8 | -0.1156      | 3.3061 x 10-<br>5    |
| Fixed width<br>multiplier w = 1 | N = 8 | -0.0078      | 8.2652 x 10-<br>6    |

**Table 1**: Comparison results of error among different fixed width Baugh Wooley multipliers.

Table 2: Comparison results of Area and delay among different Baugh Wooley multipliers n=8

| Multiplier  | No. of occupied LUT slices | Delay(ns) | No. of<br>IOB |
|-------------|----------------------------|-----------|---------------|
| Standard    | 86                         | 10.878    | 32            |
| Fixed Width | 54                         | 8.06      | 24            |

## **IV. CONCLUSIONS**

By properly choosing the generalized index and binary thresholding, we derive a better errorcompensation bias to reduce the truncation error and then construct a lower error fixed-width multiplier, which is area-efficient for VLSI realization. Moreover, a number of low error fixed width multipliers are generated, the only constraint is to choose the right value of the index which would need exhaustive search.

### REFERENCES

- [1]. C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," *IEEE Transaction on Computers*, vol. C-22, no. 12, pp.1045-1047, December 1973.
- [2]. S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area efficient multiplier for digital signal processing applications," *IEEE Transaction on Circuit, Systems II*, vol. 43, pp. 90-94, 1996.
- [3]. E. E. Swartzlander, Jr, "Truncated multiplication with approximate rounding," *Proceedings* 33<sup>rd</sup> *Conference on Signals, Systems, and Computer,* vol. 2, pp. 1480- 1483, 1999.
- [4]. Tu and Van, "Power efficient pipelined reconfigurable fixed width Baugh Wooley multiplier," *IEEE Transaction on Computers*, vol. 58, no. 10, October, 2009.
- [5]. Min-An Song, Lan-Da Van, Ting-Chun Huang, and Sy-Yen Kuo, "A Generalized Methodology for Low-Error Fixed-Width Booth Multipliers," *IEEE 47<sup>th</sup> Symposium on Circuits and Systems*, July, 2004.
- [6]. Jer Min Joa, Shiann Rong Kuang, and Ren Der Chen, "Design of low error fixed width multipliers for DSP applications," *IEEE Transaction on Circuits and Systems-II*: vol. 46, no. 6, June, 1999.
- [7]. Lan Da Van, and Chih Chyau Yang, "Generalized Low Error Area- Efficient Fixed Width Multipliers," *IEEE Trans On Circuits And Systems*, vol.52, no. 8, August, 2005.
- [8]. Muhammad H. Rais, B.M. Al-Harthi, Saad I. Al-Askar and F. K. Al. Hussein, "Design and Implementation of Basic Building Blocks of Power-Efficient Baugh Wooley Multipliers," *American Journal. Engineering and Applied Science*, pp. 307-311, 2010.
- [9]. Nicola Petra, Davide De Caro, Valeria Garofalo, Ettore Napoli, and Antonio G. M. Strollo, " Truncated Binary Multipliers With Variable Correction and Minimum Mean Square Error.", "*IEEE Transaction on Circuits and Systems-II*: vol. 57, no. 6, June, 2010.
- [10]. Y. C. Liao, H. C. Chang, and C. W. Liu, "Carry estimation for two's complement fixed-width multipliers," in *Workshop on Signal Process. Syst. (SiPS)*, Oct. 2006, pp. 345–350.