Finding a premultiplier matrix P
Suppose we have n images in the training set. We find P by solving the matrix equation
$$P \times \mathrm{Images} = \mathrm{Texts} \qquad (1)$$
where Images is a 14,400 × n matrix, the jth column of which is the jth image from the training set, and Texts is a 1,800 × n matrix, the jth column of which is the jth text description from the target set.
Finding P poses a problem. Simple back-substitution methods of solving a matrix equation XA = B take too long. Moreover, once P is found, using it to recognize an image requires multiplying a 1,800 × 14,400 matrix by a 14,400 × 1 vector, which takes roughly 25,920,000 (= 1,800 × 14,400) multiplications when P is dense.
A nice way to circumvent all this is to realize that the premultiplier P is not unique, and so all we need to do is find a suitable P which is very sparse, i.e., most of its entries are zero. A sparse P cuts down calculation time considerably and is stored as a list and not as a table, taking up much less space. A sparse P is found thus:
Images has the matrix form $[I_{ij}]$, where $I_{ij}$ is the ith element of the jth image. Similarly, Texts has the matrix form $[T_{ij}]$, where $T_{ij}$ is the ith element of the jth text description.
We choose the premultiplier P to have the following form:
[Figure: the sparse form of P, with its non-zero entries lying along n parallel diagonals]
This is a 1,800 × 14,400 matrix with at most 1,800n non-zero elements (to be determined), lying along n parallel diagonals.
For each set of n unknowns $P_{i,1}, P_{i,2}, \ldots, P_{i,n}$ lying in the ith row of P, the matrix equation (1) gives us a system of n simultaneous linear equations in these n unknowns. For example, when i = 1 we get

$$
\begin{aligned}
P_{1,1} I_{1,1} + P_{1,2} I_{2,1} + P_{1,3} I_{3,1} + \cdots + P_{1,n} I_{n,1} &= T_{1,1} \\
P_{1,1} I_{1,2} + P_{1,2} I_{2,2} + P_{1,3} I_{3,2} + \cdots + P_{1,n} I_{n,2} &= T_{1,2} \\
&\;\;\vdots \\
P_{1,1} I_{1,n} + P_{1,2} I_{2,n} + P_{1,3} I_{3,n} + \cdots + P_{1,n} I_{n,n} &= T_{1,n}
\end{aligned}
$$
Thus in all we get 1,800 sets of n simultaneous linear equations in n unknowns. Each set is solved separately to determine the entries of P. Since the form we chose for P has a large number of zero entries, our calculation time is considerably reduced: applying P to an image needs at most 1,800n multiplications, one per non-zero entry.
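As a rough illustration of the row-by-row solve described above, the following Python sketch builds and solves the n × n system for each row of P on toy data. Because the figure showing the exact form of P is not reproduced here, the sketch assumes that the n unknowns of row i occupy columns i, i+1, ..., i+n-1, which agrees with the i = 1 equations above but may differ from the layout actually used. The function names `solve_linear_system` and `find_sparse_p` and the toy matrices are hypothetical, and the hand-written elimination stands in for whatever solver one would normally take from a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system for this row of P")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def find_sparse_p(images, texts):
    """Return P as one list of (column, value) pairs per row.

    images[e][j] is element e of image j; texts[i][j] is element i of text j.
    """
    n = len(images[0])
    p_rows = []
    for i, text_row in enumerate(texts):
        # ASSUMPTION: row i of P is non-zero only in columns i .. i+n-1.
        # Equation j of the system uses image j, unknown k uses element i+k.
        a = [[images[i + k][j] for k in range(n)] for j in range(n)]
        coeffs = solve_linear_system(a, text_row)
        p_rows.append([(i + k, coeffs[k]) for k in range(n)])
    return p_rows


# Toy data: n = 2 images of 5 elements each, text descriptions of 3 elements.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.0, 3.0]]
texts = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

p_rows = find_sparse_p(images, texts)

# Check that P * Images reproduces Texts on the training set.
reconstructed = [
    [sum(value * images[col][j] for col, value in row) for j in range(len(images[0]))]
    for row in p_rows
]
```

Each row costs one n × n solve, so the 1,800 rows can be processed independently. The sketch raises an error if a particular window of image elements gives a singular system; how the original method handles that case is not stated in the text.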
Once P is found it is used for image recognition as described in the previous panel.
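To make the "list, not table" storage concrete, here is a minimal Python sketch of applying a sparse P to one image. The representation (one list of `(column, value)` pairs per row), the function name `sparse_matvec`, and the toy numbers are all assumptions made for illustration; the original implementation may store P differently.

```python
def sparse_matvec(rows, x):
    """Multiply a sparse matrix, stored as per-row (column, value) lists, by x."""
    return [sum(value * x[col] for col, value in row) for row in rows]


# Toy 3 x 6 matrix with two non-zero entries per row.
# The real P would be 1,800 x 14,400 with at most n non-zeros per row.
P_rows = [
    [(0, 2.0), (1, -1.0)],
    [(1, 0.5), (2, 4.0)],
    [(2, 1.0), (3, 3.0)],
]
image = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

text = sparse_matvec(P_rows, image)  # one multiplication per stored entry

# Only the non-zero entries are stored: 6 values instead of 3 * 6 = 18.
stored_entries = sum(len(row) for row in P_rows)
```

The work done is proportional to the number of stored entries, not to the full size of the matrix, which is where the saving from a sparse P comes from.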
Copyright © 2022 ICICI Centre for Mathematical Sciences
All rights reserved.