The fastest multiplication algorithm.
O(n), fully parallel, pure addition and subtraction.
This pseudocode is just a runnable demonstration of the full algorithm, with comments added for explanation.
GitHub link (idk why people still use this)
Requirements: python, pip → duckdb, & pandas
import duckdb
'''
This is based in modular arithmetic based multiplication of 11.
By multiplying a number by 11 it functions to transmute a number into a "multiplicative" base of 1.1 purely through addition.
There is only one serial operation, which is the "serial_digit_sequence" function, which in hardware would be
executed during digit-read in, along with the assembly of precomputation blocks.
Example, we multiply two numbers 'm' and 'n_prime'
One must be transmuted into the multiplicative base and the other is the additive component.
Number 'm' is the base to determine additions, n_prime is transmuted. The transmuted number is just 'n'.
m is used to determine order of magnitude and number of additions of n_prime and it's transmuted 'n'
for each digit of m (from left to right) add the respective number of transmuted digit n.
if the first number is "3" then add 3 n, and multiply by 10e(length of the digit-2)
do this for each place and sum all digits
NOTE: because the 1.1 version of n overflows one space a correction must be made if the next digit has excess
additions from the previous digit spilling over, this is subtracted using n_prime.
n = n_prime * 11 = n_prime + 10 + n_prime, therefore subtracting n_prime clears the overflow to the next digit
Since each digit can have values between 0 and 9 the summation and subtraction values can be precomputed ahead
of time and the sequence can be used to mark the precomputed value that will be used, positive or negative.
NOTE: Precomputation can also be done during read-in of n_prime before multiplication ever starts, since it's
all a per-place based multiplication for digits 0-9
The end result is a multiplication that scales in linear time, pure addition based on the size of the input, with
computation starting before the digits are fully read in.
For 8 digit by 8 digit multiplication the operation is reduced to maximum 8 additions/subtractions
Accounting for precomputation there are 25 addition/subtraction operations in total
For traditional methods of 8 digit by 8 digit multiplication is approximately 64 multiplications, w/o addition.
More importantly, this is a serial operation.
After the sequence is determined at read-in the entire operation is addition and subtraction and
can be fully parallelized with little overhead. O(n) that can be scaled near infinitely.
'''
def serial_digit_sequence(m):
digits = [int(d) for d in str(m)]
result = [digits[0]]
counter = digits[0]
for d in digits[1:]: #skip first digit
if counter > 0:
counter = d - counter
result.append(counter)
else: #when the counter is negative or 0 we just append the current digit and move on
result.append(d)
counter = d #reset counter to current digit
return result
def precompute_add_blocks(n_prime):
n_prime = int(n_prime)
n = n_prime*10
n = n + n_prime
precomp1 = n
precomp2 = n*(2) #n+n
precomp3 = n*(3) #n+n+n
precomp4 = n*(4) #etc...
precomp5 = n*(5)
precomp6 = n*(6)
precomp7 = n*(7)
precomp8 = n*(8)
precomp9 = n*(9)
ADD_BLOCK = {"0": 0, "1": precomp1, "2": precomp2, "3": precomp3, "4": precomp4, "5": precomp5, "6": precomp6, "7": precomp7, "8": precomp8, "9": precomp9}
return(ADD_BLOCK)
def precompute_sub_blocks(n_prime):
n_prime = int(n_prime)
n = n_prime
precomp1 = n*(-1)
precomp2 = n*(-2)
precomp3 = n*(-3)
precomp4 = n*(-4)
precomp5 = n*(-5)
precomp6 = n*(-6)
precomp7 = n*(-7)
precomp8 = n*(-8)
precomp9 = n*(-9)
SUB_BLOCK = {"0": 0, "-1": precomp1, "-2": precomp2, "-3": precomp3, "-4": precomp4, "-5": precomp5, "-6": precomp6, "-7": precomp7, "-8": precomp8, "-9": precomp9}
return(SUB_BLOCK)
def algorithm(sequence,ADD_BLOCK,SUB_BLOCK,n_prime):
placement = len(sequence)-2 #accounts for ignoring first and last digit
*rest, last = sequence
con.execute("INSERT INTO station_eleven VALUES (?)", [last*int(n_prime)])
for i in rest:
if i >= 0:
positive_value = ADD_BLOCK.get(str(int(i)))
con.execute("INSERT INTO station_eleven VALUES (?)", [positive_value * (10 ** placement)])
elif i < 0:
negative_value = SUB_BLOCK.get(str(int(i)))
con.execute("INSERT INTO station_eleven VALUES (?)", [negative_value * (10 ** (placement+1))])
placement = placement-1
if __name__ == "__main__":
#input m and n_prime as two numbers for multiplication
m = 13249381
n_prime = 31281459
#database for storage and simulation of parallel addition
con = duckdb.connect()
con.execute("""
CREATE TABLE station_eleven(
value HUGEINT)""")
con.execute("TRUNCATE TABLE station_eleven")#clear everything
print("starting...")
#fill the precompute blocks, then start adding based on the sequence
algorithm(serial_digit_sequence(m), precompute_add_blocks(n_prime), precompute_sub_blocks(n_prime),n_prime)
print(con.execute("SELECT * FROM station_eleven").fetchdf()) #display database table at the end
result = con.execute("SELECT SUM(value) FROM station_eleven").fetchone()[0]
print("algorithm result: ",result)
print("should match: ", m*n_prime)
'''
Title of Invention: O(n) Multiplication by 11
APPLICATION #64/081,374
This algorithm was invented by Wesley James Hudgens, at age 29.
'''To get a full understanding of its usefulness, examine it as applied to matrix multiplication…
#Zero mulitiplication dot-products
#Consider multiplication of two matrices A and B, where B is a reused weight matrix:
A = [[13, 24], [12, 31]]
B = [[31, 12], [21, 13]] #these would be the 'n_prime' in the base algorithm
precomputed additions for 31: {'0': 0, '1': 341, '2': 682, '3': 1023, '4': 1364, '5': 1705, '6': 2046, '7': 2387, '8': 2728, '9': 3069}
precomputed additions for 12: {'0': 0, '1': 132, '2': 264, '3': 396, '4': 528, '5': 660, '6': 792, '7': 924, '8': 1056, '9': 1188}
precomputed additions for 21: {'0': 0, '1': 231, '2': 462, '3': 693, '4': 924, '5': 1155, '6': 1386, '7': 1617, '8': 1848, '9': 2079}
precomputed additions for 13: {'0': 0, '1': 143, '2': 286, '3': 429, '4': 572, '5': 715, '6': 858, '7': 1001, '8': 1144, '9': 1287}
precomputed subtractions for 13: {'0': 0, '-1': -13, '-2': -26, '-3': -39, '-4': -52, '-5': -65, '-6': -78, '-7': -91, '-8': -104, '-9': -117}
precomputed subtractions for 21: {'0': 0, '-1': -21, '-2': -42, '-3': -63, '-4': -84, '-5': -105, '-6': -126, '-7': -147, '-8': -168, '-9': -189}
precomputed subtractions for 12: {'0': 0, '-1': -12, '-2': -24, '-3': -36, '-4': -48, '-5': -60, '-6': -72, '-7': -84, '-8': -96, '-9': -108}
precomputed subtractions for 31: {'0': 0, '-1': -31, '-2': -62, '-3': -93, '-4': -124, '-5': -155, '-6': -186, '-7': -217, '-8': -248, '-9': -279}
#Example: Row1*Col1 = (13*12)+(24*21).
#For 13*12 it becomes a sequencing of 13 [1,2] and a lookup for the additions of 24
#Sum 132 and 24 for 156.
#For 24*21 the sequencing is [2,2] and a lookup for 462 + 42 for 504
#Combine 156 and 504 for a dot product of 660.
#Equivalent to (13×12) + (24×21) = 156 + 504 = 660
#Example: Row2*Col2 = (31*12)+(12*21).
#Sequence 31 for [3,-2]
#look up multiplications of 12: 396
#look up subtractions of 12: -24
#Sum 396 and -24 for an answer of 372
#Sequence 12 for [1,1]
#look up multiplications for 21: 231
#last digit in sequence is untransformed: 21
#Sum 231 and 21 for an answer of 252
#Combine 252 and 372 for a dot product of 624
#The remaining dot products are left as an exercise for the reader.
#These precomputed blocks can be re-used for every single multiplication.
#For a 8192×8192 matrix with a 4096-token batch, each column's blocks are reused 4096 times.
#I look forward to the hardware implementations of this algorithm.Title of Invention: O(n) Multiplication by 11
US Patent ID - #64/081,374
This algorithm has been patented is available for sale or licensing to hardware vendors and chip manufacturers. This algorithm is to remain barred from the Nvidia Corporation or Nvidia Corporation subsidiaries in perpetuity.
