The implementation of Montgomery multiplication improved the performance of public key cryptography (e.g. CSIDH, ECC, and RSA) implementations on ARM64 architectures, directly. Last but not least, the performance of Karatsuba algorithm does not ensure the fastest speed record, while it is determined by the clock cycles per multiplication instruction of target ARM architectures. In particular, recent Apple processors based on ARM64 architecture show lower cycles per instruction of multiplication than that of ARM Cortex-A series. For this reason, the schoolbook method shows much better performance than the sophisticated Karatsuba algorithm on Apple processors. With this observation, we can determine the proper approach for multiplication of cryptography library (e.g. MS-SIDH) on Apple processors and ARM Cortex-A processors.
Go to Source of this post
Author Of this post: