Improved throughput of Elliptic Curve Digital Signature Algorithm (ECDSA) processor implementation over Koblitz curve k-163 on Field Programmable Gate Array (FPGA)

The widespread use of the Internet of things (IoT) in different aspects of an individual's life, such as banking, wireless intelligent devices and smartphones, has led to new security and performance challenges under restricted resources. The Elliptic Curve Digital Signature Algorithm (ECDSA) is well suited to such environments because of its small key size and adjustable security parameters. However, major performance metrics such as area, power, latency and throughput remain customisable and depend on the design requirements of the device. The present paper puts forward an enhancement of the throughput metric by proposing a more efficient design for the hardware implementation of ECDSA. The design raises the throughput to 0.08207 Mbit/s, an increase of 6.95% over the existing design. It also includes the design and implementation of a Universal Asynchronous Receiver Transmitter (UART) module. The present work is based on a 163-bit key size over the Koblitz curve k-163 and the secure hash function SHA-1. A serial module for the underlying modular layer and a high-speed architecture for Koblitz point addition and Koblitz point multiplication are considered in this work, in addition to a carry-save multiplier, a modular adder-subtractor and an Extended Euclidean module for the ECDSA protocols. All modules are designed in VHDL and implemented on the Virtex-5 xc5vlx155t-3ff1738 platform. Signature generation requires 0.55360 ms, while validation takes 1.10947288 ms. Thus, the total time required to complete both processes is 1.66 ms, and the maximum frequency is approximately 83.477 MHz, with a power consumption of 99 mW and an efficiency approaching 3.39 × 10^-6.


Introduction
The demand for achieving security objectives is increasing dramatically due to the growing use of the Internet of things (IoT) in all aspects of daily life, such as commerce and bank transactions (1), (2). Embedded devices used in IoT face security and resource availability challenges (3) when asymmetric encryption is used to satisfy the required security goals. Public key cryptography relies on the discrete logarithm problem (DLP), which is difficult to solve (4), (5). Elliptic Curve Cryptography (ECC) was proposed as an asymmetric cryptographic system whose mathematical model is based on the Elliptic Curve Discrete Logarithm Problem (ECDLP) (6). ECC provides the same security level as traditional public key cryptosystems such as RSA (Rivest-Shamir-Adleman), but with a smaller number of bits, i.e., lower consumption, smaller area and higher speed in computing the encryption key (7). The Elliptic Curve Digital Signature Algorithm (ECDSA), the elliptic curve counterpart of the Digital Signature Algorithm (DSA), is used to generate a signature for a document and allows its origin to be verified, based on the owner's private key used to sign it (8), (9). High-speed ECDSA design and implementation remain a challenge for researchers and developers working with reconfigurable circuits such as the FPGA, which offers flexibility and low cost. This work proposes a throughput enhancement for the hardware implementation of ECDSA over the Galois field GF(2^163). The proposed contribution uses partial intermediate registers to increase the throughput, and designs and implements the UART on a single chip. Fig. 1 shows the hierarchy of ECDSA layers (10). The first layer is the arithmetic field layer, consisting of modular components such as addition, multiplication and division. The elliptic curve crypto-processor layer includes point addition, point doubling and point multiplication.
Finally, dedicated components such as inversion and hashing exist to implement the top-most layer.

Related Work
Benselama ZA, Bencherif MA, Khorissi N and Bencherchali, MA (10) presented an implementation of ECDSA over the finite field GF(2^163) on the Virtex-5 platform. The implementation comprises SHA-1 and ECC with control. Signature generation takes approximately 1.58 ms at a maximum frequency of 207.097 MHz, occupying 10838 register slices and 28998 lookup table (LUT) slices. Validation requires 1.953 ms at a maximum frequency of 195.309 MHz, occupying 12922 flip-flop slices and 54597 LUT slices. M Elhadjyoussef, W Benhadjyoussef and N Machhout (11) proposed an architecture based on the Globally Asynchronous Locally Synchronous (GALS) design methodology.
It is implemented over GF(2^163) on the Virtex-6 platform. The proposed system consists of SHA-1, Random Number Generation (RNG), an elliptic curve crypto-processor and memory. The time taken for signature generation and validation was 3.844 ms and the power consumption was approximately 127 mW. Bhanu Panjwani and Deval C Mehta (12) demonstrated an ECDSA design with two implementations. The first requires 0.367 ms with 11040 slices for signature generation and 0.393 ms with 12846 slices for signature verification. Both processes run at an operating frequency of 100 MHz. The second takes 0.615 ms with 8773 slices for signature generation and 0.672 ms with 9967 slices for verification. Both implementations use the same operating frequency and the Virtex-5 platform. The implementation of ECDSA proposed by Anissa Sghaier, Medien Zeghid and Mohsen Machhout (13) is realised on the Xilinx Virtex5 ML50 platform. The results show that smaller parameters are a good choice for an ECDSA processor with low storage and low power. The signature generation and validation process in that work takes 1.5 ms at a maximum frequency of 107 MHz, with a power consumption of approximately 105.7 mW.

Mathematical background
The elliptic curve can be defined over different infinite number sets such as the real numbers (R), complex numbers (C), integers (Z) and rational numbers (Q), but for cryptography the curve should be defined over a finite field. Applying the elliptic curve over a finite field requires an understanding of the following terms:

The Group
From a mathematical perspective, a group is a set F with a binary operation * that fulfils the following properties: 1. Associativity: for any x, y, z ∈ F, x * (y * z) = (x * y) * z. 2. Identity: there is a unique element c ∈ F such that, for every w ∈ F, c * w = w * c = w. 3. Inverse: for each element a ∈ F, there exists a unique element a^-1 such that a * a^-1 = a^-1 * a = c. If x * y = y * x for all x, y ∈ F, the group is called abelian or commutative; otherwise it is called non-abelian or non-commutative. When the group has only a finite number of elements, it is called a finite group; otherwise it is called an infinite group (14).

The Ring
The ring R is a set with two binary operations, addition and multiplication, that satisfy the following properties: (R, +) is an abelian group. Multiplication is associative: for x, y, z ∈ R, x * (y * z) = (x * y) * z. Multiplication is compatible with addition: for any x, y, z ∈ R, x * (y + z) = x * y + x * z and (x + y) * z = x * z + y * z. If multiplication is commutative, R is called a commutative ring (15).

The Field
A set F together with two operations (+, *) is called a field if it satisfies the following conditions: 1. (F, +) is an abelian group with the identity element denoted by 0. 2. (F \ {0}, *) is an abelian group with the identity element denoted by 1. 3. Multiplication distributes over addition. When F has a finite number of elements, it is called a "finite field". The order is the number of elements that the finite field holds. The order is equal to p^m, where p is a prime number (the characteristic of the finite field) and m is a positive integer (the dimension). Such a field is denoted by F(p^m). Finite fields are distinguished by their order as follows: when m = 1, the finite field is called a prime field and is denoted by Fp. For m ≥ 2, finite fields are called extension fields. The case p = 2 is a special representative of the extension fields, denoted by F(2^m). Fig. 2 shows the hierarchy of finite fields (16).

ECC and its arithmetic over the field GF(2^m)
This section provides brief mathematical background on an elliptic curve E. The equation of an elliptic curve E over GF(2^m) is cubic, with the following general form:

$$y^2 + xy = x^3 + c_1 x^2 + c_2 \qquad (1)$$

where c1, c2 ∈ GF(2^m) and c2 ≠ 0. The curve is the collection of points P with coordinates (x, y) that satisfy equation (1), together with the point at infinity. The possible point operations over the curve are point addition and point doubling. Suppose points N and M are on the curve, where N(x1, y1) ≠ M(x2, y2); then point addition V(x3, y3) = N(x1, y1) + M(x2, y2) and point doubling, when M = N, are as below.

Point addition:

$$\lambda = \frac{y_1 + y_2}{x_1 + x_2}, \qquad x_3 = \lambda^2 + \lambda + x_1 + x_2 + c_1, \qquad y_3 = \lambda (x_1 + x_3) + x_3 + y_1$$

Point doubling:

$$\lambda = x_1 + \frac{y_1}{x_1}, \qquad x_3 = \lambda^2 + \lambda + c_1, \qquad y_3 = x_1^2 + (\lambda + 1)\, x_3$$

There are different kinds of elliptic curves, depending on the values of c1 and c2. This paper considers the Koblitz curve, defined over GF(2). It has the same form as equation (1), where c1 ∈ {0, 1} and c2 = 1. The advantage of this curve is that, in an algorithm similar to the double-and-add (binary) algorithm, point doubling can be replaced by the Frobenius endomorphism (φ).
Representing a point by (x, y) refers to affine coordinates, but the inversion process is expensive in this representation. Thus, Lopez-Dahab (LD) projective coordinates are used in this paper, where a point (X, Y, Z) with Z ≠ 0 represents the affine point in equation (8):

$$(x, y) = \left( \frac{X}{Z}, \frac{Y}{Z^2} \right) \qquad (8)$$

In these coordinates the curve equation becomes Y^2 + XYZ = X^3 Z + c1 X^2 Z^2 + c2 Z^4, where c1 is the parameter of the curve, see equation (1).

Applied algorithms:
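The affine group law and the Frobenius map described above can be illustrated with a minimal sketch. It assumes a toy field GF(2^4) with f(x) = x^4 + x + 1 and the Koblitz-type coefficients c1 = 1, c2 = 1; these parameters, and all function names, are chosen only for illustration and are not the 163-bit field or the LD-coordinate hardware of the proposed design.

```python
# Toy model only: GF(2^4), NOT the 163-bit field used in the paper.
M = 4
F_POLY = 0b10011          # assumed reduction polynomial x^4 + x + 1
C1, C2 = 1, 1             # Koblitz-type coefficients (c1 in {0, 1}, c2 = 1)

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M); elements are bit vectors."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:            # degree reached M: reduce modulo f(x)
            a ^= F_POLY
    return r

def gf_inv(a):
    """Inverse as a^(2^M - 2); acceptable only because the field is tiny."""
    r = 1
    for _ in range(2**M - 2):
        r = gf_mul(r, a)
    return r

def on_curve(P):
    """Check y^2 + xy = x^3 + c1*x^2 + c2; None models the point at infinity."""
    if P is None:
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(C1, x2) ^ C2

def point_add(P, Q):
    """Affine addition and doubling on the binary curve."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 != y2 or x1 == 0:
            return None                       # Q = -P, or P has order 2
        lam = x1 ^ gf_mul(y1, gf_inv(x1))     # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ C1
        y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
        return (x3, y3)
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))    # addition slope
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ C1
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def frobenius(P):
    """Frobenius endomorphism (x, y) -> (x^2, y^2)."""
    if P is None:
        return None
    return (gf_mul(P[0], P[0]), gf_mul(P[1], P[1]))

def all_points():
    """Brute-force enumeration of the curve points over the toy field."""
    affine = [(x, y) for x in range(2**M) for y in range(2**M) if on_curve((x, y))]
    return [None] + affine
```

Because c1 and c2 lie in GF(2), the Frobenius map sends curve points to curve points and costs only two squarings, which is why it can stand in for point doubling in a Koblitz point multiplication.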

Polynomial addition
The addition of two polynomials a(x) and b(x) in GF(2^m) is the bitwise exclusive-or (XOR) of their coefficients.
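As a small illustration, assuming that a polynomial is stored as an integer whose bit i holds the coefficient of x^i (a representation chosen here for the sketch, with a hypothetical function name), the addition reduces to a single XOR:

```python
def gf2m_add(a: int, b: int) -> int:
    """Add two GF(2^m) elements; bit i of each integer is the coefficient of x^i."""
    return a ^ b

# (x^3 + x + 1) + (x^2 + x) = x^3 + x^2 + 1
total = gf2m_add(0b1011, 0b0110)
```

Since every element is its own additive inverse, subtraction in GF(2^m) is the same operation.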

Polynomial multiplication
A least-significant-bit-first (LSB) multiplier is used at the finite-field level to implement the multiplication. Hence, the multiplication processes the coefficients of b(x) in equation (10) starting from the least significant coefficient b_0 and ending with b_{m-1} (19).

$$c(x) = a(x)\, b(x) \bmod f(x) = \sum_{i=0}^{m-1} b_i \left( a(x)\, x^i \bmod f(x) \right) \qquad (10)$$

Fig. 3 shows the hardware implementation of the LSB-first multiplier algorithm over GF(2^m).

Figure 3. Least-significant-bit-first (LSB) multiplier
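A software model of an LSB-first multiplier may help when reading Fig. 3. The sketch below is a bit-serial Python model, not the VHDL of the proposed design; it assumes the NIST reduction polynomial for K-163, f(x) = x^163 + x^7 + x^6 + x^3 + 1, which is not stated in the text, and the function name is hypothetical.

```python
M = 163
# Assumed reduction polynomial of the NIST K-163 field (not given in the paper).
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def lsb_first_mul(a: int, b: int, m: int = M, f: int = F_POLY) -> int:
    """Multiply a(x) * b(x) mod f(x), scanning b from b_0 up to b_(m-1)."""
    c = 0
    for i in range(m):
        if (b >> i) & 1:      # coefficient b_i selects the current a(x) * x^i
            c ^= a
        a <<= 1               # a(x) <- a(x) * x
        if (a >> m) & 1:      # keep a(x) reduced modulo f(x)
            a ^= f
    return c
```

Each loop iteration corresponds to one clock cycle of a bit-serial multiplier, so a full multiplication takes m iterations in this model.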

Polynomial squaring
Polynomial squaring is a linear operation over GF(2^m) and is therefore faster than multiplication. The binary representation of a(x)^2 can be calculated by inserting a zero bit between consecutive bits of the binary representation of a(x), as shown in Fig. 4, and then reducing the result modulo the irreducible polynomial of the field.
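The zero-insertion step can be sketched as follows. This is an illustrative Python model with hypothetical function names; the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the NIST K-163 polynomial and is an assumption, since the text does not state it.

```python
# Assumed reduction polynomial of the NIST K-163 field (not given in the paper).
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def spread_bits(a: int, m: int) -> int:
    """Insert a zero between consecutive bits: coefficient of x^i moves to x^(2i)."""
    s = 0
    for i in range(m):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return s

def reduce_poly(c: int, m: int, f: int) -> int:
    """Reduce c(x) modulo f(x), where f has degree m, by clearing high bits."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= f << (i - m)
    return c

def gf_square(a: int, m: int = 163, f: int = F_POLY) -> int:
    """Square a(x) in GF(2^m): bit spreading followed by modular reduction."""
    return reduce_poly(spread_bits(a, m), m, f)
```

In hardware the spreading step is pure wiring, so the cost of squaring is dominated by the reduction.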

Polynomial inversion and division
If a(x) is a nonzero polynomial and f(x) is an irreducible polynomial of degree m over GF(2), then, since the degree of a(x) is less than the degree of f(x), they are relatively prime. The extended Euclidean algorithm can be initiated with a(x) and f(x) and generates polynomials s(x) with degree < m and t(x) with degree < m − 1; the polynomials satisfy the following condition:

$$s(x)\, a(x) + t(x)\, f(x) = 1$$

so that s(x) is the inverse of a(x) modulo f(x). The loop in the binary division computes the remainder and quotient when a(x) is divided by b(x). In line 3, the polynomial b(x) is shifted by the amount deg a(x) − deg b(x), and the shifted value is XORed with a(x) to obtain the remainder r(x). In line 5, the degree of r(x) is checked to see whether it is greater than that of b(x). If the condition is true, the quotient polynomial q(x) is shifted by the corresponding degree difference and 1 is added to the last bit of q(x). If the degree of r(x) is smaller than that of b(x), then q(x) is left-shifted by the corresponding degree difference. The value of r(x) is then assigned to a(x) in line 10. This procedure is repeated until the degree of a(x) is smaller than that of b(x).

Algorithms for mode operation of ECDSA
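The binary division and the extended Euclidean inversion described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the algorithm listing or the VHDL module of the proposed design: the function names are hypothetical, and the default reduction polynomial is the NIST K-163 polynomial x^163 + x^7 + x^6 + x^3 + 1, which the text does not state.

```python
# Assumed reduction polynomial of the NIST K-163 field (not given in the paper).
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def deg(p: int) -> int:
    """Degree of a binary polynomial; deg(0) is -1."""
    return p.bit_length() - 1

def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product without modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_divmod(a: int, b: int):
    """Shift-and-XOR division: returns (q, r) with a = q*b + r and deg r < deg b."""
    q = 0
    while deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        q ^= 1 << shift        # record the quotient term x^shift
        a ^= b << shift        # cancel the leading term of a(x)
    return q, a

def gf_inverse(a: int, f: int = F_POLY) -> int:
    """Extended Euclidean algorithm: returns s(x) with s(x)*a(x) = 1 mod f(x)."""
    r0, r1 = f, a              # remainders
    s0, s1 = 0, 1              # invariant: s_i * a = r_i (mod f)
    while r1 != 0:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ clmul(q, s1)
    return s0                  # here r0 = gcd(a, f) = 1

def gf_mul(a: int, b: int, f: int = F_POLY) -> int:
    """Field multiplication, used here only to check the inverse."""
    return poly_divmod(clmul(a, b), f)[1]
```

In an LD-coordinate design a single inversion of this kind is typically needed per point multiplication, when the result is converted back to affine coordinates.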

Signature generation (Mode 0)
The first step in operating ECDSA is to initialise all necessary parameters. The coefficients a and b of the domain parameters specify the shape of the elliptic curve, while G represents its generating point. The signature process starts by sending the message and a mode value of zero to the ECDSA processor, in order to obtain the 160-bit message digest. The 48-bit message is forwarded to the secure hash function SHA-1. After the hashing process, the message digest is padded to 163 bits directly before it is stored in the register. The next step is to calculate the point kG through a point multiplication over the curve, where k is the per-signature secret; this yields the value r, which represents the first part of the signature. In parallel, the inverse k^-1 is prepared using the inversion process, in order to calculate the second part of the signature, s. Finally, the signature generation process delivers the two values r and s.
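The integer part of signature generation can be sketched as follows. This is a hedged illustration, not the processor's datapath: it assumes that the 160-bit digest is zero-extended to 163 bits, that s follows the standard ECDSA relation s = k^-1 (e + d·r) mod n, and that the modular inverse is taken with Python's built-in `pow` (Python 3.8 or later) rather than the Extended Euclidean module. The function names and the toy order n = 101 are hypothetical; the point multiplication that produces r is omitted.

```python
import hashlib

def digest_163(message: bytes) -> int:
    """SHA-1 digest as an integer; zero-extension to 163 bits leaves the value unchanged."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def signature_s(e: int, d: int, k: int, r: int, n: int) -> int:
    """Second signature component s = k^-1 * (e + d*r) mod n (standard ECDSA relation)."""
    k_inv = pow(k, -1, n)          # modular inverse, Python 3.8+
    return (k_inv * (e + d * r)) % n

# Toy values for illustration only; a real design uses the order of G on K-163.
n = 101                            # small prime standing in for the group order
e = digest_163(b"abc")
d, k, r = 37, 53, 29               # private key, per-signature secret, first component
s = signature_s(e, d, k, r, n)
register_bits = format(e, "0163b") # digest as it would sit in a 163-bit register
```

Because the inverse k^-1 does not depend on the result of the point multiplication, it can be computed in parallel with it, as the text describes.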

Signature validation (Mode 1)
After the ECDSA component receives the signature parameters r and s and a mode value of one, which selects the validation operation, the validation process starts by calculating the digest of the message, followed by the padding process. The verifier then uses the signer's public key, together with the digest, r and s, to compute a point on the curve. The output is a point on the curve with coordinates (X1, Y1). If X1 is equal to r, the signature is valid and comes from the original sender; if X1 is not equal to r, the signature is not valid. The process ends with the result of the signature validation.

The proposed ECDSA processor consists of two major components, UART and Core_unit. UART is constructed using three sub-modules:
UART_receiver, UART_transmitter and Clock divider. UART_receiver gets the input data (message, mode, r and s) from the PC through the RS232 serial communication port and stores the values in serial-input-parallel-output (SIPO) registers before any mode operation starts. The ECDSA core in turn takes the input values and starts either the signature or the validation process, based on the value of the mode flag. The output result is stored in parallel-input-serial-output (PISO) registers, ready for transmission through UART_transmitter. The clock divider is responsible for generating a continuous signal for a baud rate of 9600 bit/s. The total time required for sending 48 bytes (r, s, m, mode byte) is 49999ns with a clock period of 10 ns. The ECDSA architecture consists of SHA-1, signature generation and validation components.

Fig. 8 illustrates the result of the signature process. The generation process takes approximately 0.55360 ms to complete. The Ready_out signal is set to 1 to indicate that the process has finished. The clock period is 10 ns and the operating frequency is 100 MHz. Fig. 9 shows the result of the validation process. The time required for validation is approximately 1.10947289 ms. The values r, s and the message should be initialised before the process starts. The value of the validation control signal determines the validity of the signature: when it is one, the signature is valid; otherwise it is rejected. The Mod_i signal is set to 1 to let the ECDSA processor work in validation mode.

Table 1

Synthesis results
The proposed ECDSA processor implementation is most suitable for applications that require a low-bandwidth communication environment. It is based on three components: the UART, consisting of receiver, transmitter and clock divider; the cryptographic hash function SHA-1; and the ECDSA-Core-unit. ECDSA over the Koblitz curve is based on the finite field GF(2^163), using the Virtex-5 xc5vlx155t-3ff1738 as the platform. A serial component is designed to build the underlying finite-field arithmetic; Koblitz point addition and point multiplication form the cornerstone of the ECC operations, while the ECDSA operations depend on the CAS_Multiplier, the Extended Euclidean Algorithm (EEA) and the Adder-Subtractor. The execution time of the signature generation and validation process is close to 1.66 ms. The design occupies approximately 24,132 slices, as shown in Fig. 10. The maximum frequency after the timing optimisation process is 83.477 MHz. Fig. 11 shows the implementation of the top-most component of the ECDSA processor, while Fig. 12 illustrates its Register Transfer Level (RTL) view.

Time and clock cycles
Clock cycles of signature and validation
Table 2 shows the number of clock cycles for the signature and validation components. It clearly illustrates that point multiplication consumes more clock cycles than the other components. Fig. 10 shows the number of register and LUT slices required for the design of the proposed system, together with their distribution.

Throughput and efficiency
The results obtained show that the difference between the input arrival time and the output time is 0.311 ns. The combinational path delay is 3.763 ns, while the maximum frequency is 83.477 MHz. The throughput and efficiency of the proposed ECDSA system can be computed using the equations given in (21). As signature generation requires 0.553604023 ms and the validation process needs 1.109472889 ms, the entire ECDSA system needs 1.663072889 ms, or 1663072.889 ns; with the 10 ns clock period, this corresponds to about 166307 clock cycles. The throughput can be calculated using equation (10):

$$\text{Throughput} = \frac{83.477 \times 163}{166307} = 0.081817\ \text{Mbit/s} \qquad (10)$$

In the same context, the efficiency of the ECDSA system can be computed by applying equation (11):

$$\text{Efficiency} = \frac{0.081817}{24132} = 3.39 \times 10^{-6} \qquad (11)$$

Table 3 and Fig. 10 clearly show that the time for signature generation and verification in this work, during the simulation process, is equal to 1.66 ms. When the results are compared with the designs presented in (12) and (13), their time is less than that of the proposed solution by 0.9 ms and 0.1 ms respectively, while the designs presented in (10) and (11) take longer than the suggested solution by 1.873 ms and 2.184 ms,