Modular multiplication (MM) is the core operation in cryptographic algorithms such as elliptic-curve cryptography (ECC) and Rivest-Shamir-Adleman (RSA), where repeated MM is used to perform elliptic-curve point multiplication and modular exponentiation, respectively. The algorithm for the proposed architecture is derived from the Chinese remainder theorem and performs MM entirely within a residue number system (RNS). Moreover, a 40-channel RNS moduli set is proposed for this architecture to benefit from the short channel width of the moduli set. The throughput of the architecture is enhanced by pipelining and pre-computation. The proposed architecture is fabricated as an ASIC in 65-nm CMOS technology. Energy dissipation is measured at voltage levels from 0.43 to 1.25 V. The maximum throughput of the proposed design is 1037 Mbps at an operating frequency of 162 MHz, with an energy dissipation of 48 nJ. The proposed architecture enables the construction of low-voltage, energy-efficient ECC implementations.
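
As a rough illustration of why short RNS channels are attractive, the following minimal Python sketch multiplies two integers channel by channel and recovers the result with the Chinese remainder theorem. It is not the proposed algorithm: the 4-channel moduli set and the function names `to_rns`, `rns_mul`, and `from_rns` are assumptions made for illustration only, and the sketch converts back to binary before any reduction, whereas the proposed architecture performs MM entirely within the RNS.

```python
from math import prod

# Toy pairwise-coprime moduli set (an assumption for illustration);
# this is NOT the 40-channel set proposed for the architecture.
MODULI = (251, 253, 255, 256)
M = prod(MODULI)  # dynamic range: results are only unique below M


def to_rns(x):
    """Represent an integer as its residues, one per channel."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply two RNS numbers; every channel is independent and narrow."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def from_rns(r):
    """Reconstruct the integer from its residues using the CRT."""
    total = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m                         # product of all other moduli
        total += ri * Mi * pow(Mi, -1, m)   # pow(..., -1, m): inverse mod m
    return total % M


x, y = 12345, 54321
product = from_rns(rns_mul(to_rns(x), to_rns(y)))
print(product == x * y)
```

Each channel here works on residues of at most 8 bits, so the multiplications carry no dependencies between channels; the reconstruction is correct only while the true product stays below the dynamic range `M`.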