ICMS


Finding a premultiplier matrix P

Suppose we have n images in the training set. We find P by solving the matrix equation

$$P \cdot \mathrm{Images} = \mathrm{Texts} \qquad (1)$$

where Images is a 14,400 × n matrix whose jth column is the jth image from the training set, and Texts is a 1,800 × n matrix whose jth column is the jth text description from the target set.
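A minimal sketch of how the two matrices in (1) could be assembled, assuming each image and each text description is already available as a NumPy array. The 120 × 120 image shape is a hypothetical choice that flattens to 14,400 entries, and the helper name `build_matrices` and the random data are illustrative only.

```python
import numpy as np

# Sizes from the text: 14,400 entries per image, 1,800 per text description.
N_PIXELS = 14_400
N_TEXT = 1_800

def build_matrices(images, texts):
    """Stack the jth image and the jth text vector as the jth columns.

    images: list of n arrays, each flattening to 14,400 entries
            (e.g. a hypothetical 120 x 120 grid)
    texts:  list of n arrays, each with 1,800 entries
    """
    Images = np.column_stack([np.asarray(im, dtype=float).ravel() for im in images])
    Texts = np.column_stack([np.asarray(t, dtype=float).ravel() for t in texts])
    assert Images.shape[0] == N_PIXELS and Texts.shape[0] == N_TEXT
    assert Images.shape[1] == Texts.shape[1]  # same n in both sets
    return Images, Texts

# Hypothetical random data standing in for a training set of n = 3 pairs.
rng = np.random.default_rng(0)
n = 3
images = [rng.random((120, 120)) for _ in range(n)]
texts = [rng.random(N_TEXT) for _ in range(n)]
Images, Texts = build_matrices(images, texts)
print(Images.shape, Texts.shape)  # (14400, 3) (1800, 3)
```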

Finding P poses a problem: simple back-substitution methods for solving a matrix equation of the form XA = B take too long at this size. And once found, actually using P to recognize an image requires multiplying a 1,800 × 14,400 matrix by a 14,400 × 1 matrix, which takes roughly 25,920,000 (= 1,800 × 14,400) multiplications and about as many additions.
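For reference, the textbook product of an m × k matrix and a k × p matrix uses m·k·p scalar multiplications and m·(k − 1)·p additions. The small helper below (a hypothetical name, not from the source) evaluates these counts for one recognition step with a dense premultiplier.

```python
def dense_matmul_cost(m, k, p):
    """(multiplications, additions) for the textbook product of an
    m x k matrix and a k x p matrix."""
    return m * k * p, m * (k - 1) * p

# A dense 1,800 x 14,400 premultiplier applied to one 14,400 x 1 image.
mults, adds = dense_matmul_cost(1_800, 14_400, 1)
print(mults, adds)  # 25920000 25918200
```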

A nice way to circumvent all this is to realize that the premultiplier P is not unique, so all we need to do is find a suitable P that is very sparse, i.e., most of its entries are zero. A sparse P cuts down calculation time considerably, and it can be stored as a list of its non-zero entries rather than as a full table, taking up much less space. A sparse P is found as follows:
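To see concretely that P is not unique, here is a small dense sketch with toy sizes. It uses the Moore-Penrose pseudoinverse, a standard technique that the source does not mention, to get one solution, and then adds a term that vanishes on the columns of Images. Neither solution is sparse; the point is only that many premultipliers satisfy (1).

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_txt, n = 6, 2, 3          # toy sizes standing in for 14,400, 1,800, n
Images = rng.random((n_pix, n))    # columns are (hypothetical) training images
Texts = rng.random((n_txt, n))     # columns are the matching text vectors

# One solution: the minimum-norm P obtained from the pseudoinverse.
P0 = Texts @ np.linalg.pinv(Images)

# Any Z times the projector onto the orthogonal complement of the image
# columns can be added without changing P @ Images.
projector = np.eye(n_pix) - Images @ np.linalg.pinv(Images)
Z = rng.random((n_txt, n_pix))
P1 = P0 + Z @ projector

print(np.allclose(P0 @ Images, Texts), np.allclose(P1 @ Images, Texts))
print(np.allclose(P0, P1))
```

Both checks on the first line hold because `Images` has full column rank here and `projector @ Images` is zero; the second line shows the two premultipliers differ.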

Images has the matrix form $[I_{i,j}]$, where $I_{i,j}$ is the ith element of the jth image. Similarly, Texts has the matrix form $[T_{i,j}]$, where $T_{i,j}$ is the ith element of the jth text description.

We choose the premultiplier P to be a 1,800 × 14,400 matrix with at most 1,800n non-zero elements (to be determined), lying along n parallel diagonals.
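The sketch below shows one way to store such a P as a list of (row, column, value) triples and to multiply it by an image vector using only the stored entries. It assumes that row r (counting from 0) is non-zero only in columns r, r + 1, ..., r + n − 1; this agrees with the worked equations for i = 1 but is otherwise an assumption. The names `diagonal_pattern`, `to_list` and `sparse_matvec` are hypothetical.

```python
import numpy as np

def diagonal_pattern(n_rows, n_cols, n):
    """Column indices of the n non-zeros in each row, assuming row r
    (0-based) uses columns r, r+1, ..., r+n-1, i.e. n adjacent diagonals."""
    assert n_rows + n - 1 <= n_cols
    return [list(range(r, r + n)) for r in range(n_rows)]

def to_list(pattern, values):
    """Store P as a list of (row, col, value) triples instead of a full table."""
    return [(r, c, values[r][k])
            for r, cols in enumerate(pattern)
            for k, c in enumerate(cols)]

def sparse_matvec(entries, x, n_rows):
    """Compute P @ x touching only the stored non-zeros."""
    y = np.zeros(n_rows)
    for r, c, v in entries:
        y[r] += v * x[c]
    return y

n = 4
pattern = diagonal_pattern(1_800, 14_400, n)
values = np.ones((1_800, n))            # placeholder values, to be determined
entries = to_list(pattern, values)
print(len(entries), 1_800 * 14_400)     # 7200 25920000
```

With n = 4 the list holds 7,200 entries, against 25,920,000 cells in the full table.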

For each set of n unknowns $P_{i,1}, P_{i,2}, \ldots, P_{i,n}$ lying in the ith row of P, the matrix equation (1) gives a system of n simultaneous linear equations in these n unknowns, one equation for each of the n training pairs. For example, when i = 1 we get

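$$
\begin{aligned}
P_{1,1}I_{1,1} + P_{1,2}I_{2,1} + P_{1,3}I_{3,1} + \cdots + P_{1,n}I_{n,1} &= T_{1,1}\\
P_{1,1}I_{1,2} + P_{1,2}I_{2,2} + P_{1,3}I_{3,2} + \cdots + P_{1,n}I_{n,2} &= T_{1,2}\\
&\;\;\vdots\\
P_{1,1}I_{1,n} + P_{1,2}I_{2,n} + P_{1,3}I_{3,n} + \cdots + P_{1,n}I_{n,n} &= T_{1,n}
\end{aligned}
$$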
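Written in matrix form, the system for i = 1 reads as follows. Its coefficient matrix is the transpose of the top-left n × n block of Images, so the system has a unique solution exactly when that block is nonsingular.

$$
\begin{pmatrix}
I_{1,1} & I_{2,1} & \cdots & I_{n,1}\\
I_{1,2} & I_{2,2} & \cdots & I_{n,2}\\
\vdots & \vdots & \ddots & \vdots\\
I_{1,n} & I_{2,n} & \cdots & I_{n,n}
\end{pmatrix}
\begin{pmatrix} P_{1,1}\\ P_{1,2}\\ \vdots\\ P_{1,n}\end{pmatrix}
=
\begin{pmatrix} T_{1,1}\\ T_{1,2}\\ \vdots\\ T_{1,n}\end{pmatrix}
$$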

Thus in all we get 1,800 sets of n simultaneous linear equations in n unknowns, one set for each row of P. Each set is solved separately to determine the entries of P. Because the chosen form of P has so many zero entries, each row involves only n unknowns instead of 14,400, which reduces the calculation time considerably.
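A minimal NumPy sketch of this row-by-row solve, with toy sizes in place of 14,400 and 1,800. It assumes row r (counting from 0) is non-zero only in columns r, ..., r + n − 1, a layout consistent with the i = 1 equations but not confirmed for other rows; `solve_banded_premultiplier` is a hypothetical name. Each small system is solved with `numpy.linalg.solve` (an LU factorization), so the work is 1,800 solves of order n³ each. For easy checking, P is returned as a dense array; a practical version would keep only the non-zeros.

```python
import numpy as np

def solve_banded_premultiplier(Images, Texts):
    """Solve one n x n system per row of P, assuming row r (0-based) is
    non-zero only in columns r, ..., r+n-1.  Returns P as a dense array
    in which each row has at most n non-zero entries."""
    n_pix, n = Images.shape
    n_txt = Texts.shape[0]
    assert Texts.shape[1] == n and n_txt + n - 1 <= n_pix
    P = np.zeros((n_txt, n_pix))
    for r in range(n_txt):
        cols = slice(r, r + n)
        # Equation j: sum_k P[r, r+k] * Images[r+k, j] = Texts[r, j]
        A = Images[cols, :].T            # n x n coefficient matrix for row r
        P[r, cols] = np.linalg.solve(A, Texts[r, :])
    return P

rng = np.random.default_rng(2)
n_pix, n_txt, n = 40, 10, 5             # toy sizes standing in for 14,400, 1,800, n
Images = rng.random((n_pix, n))
Texts = rng.random((n_txt, n))
P = solve_banded_premultiplier(Images, Texts)
print(np.allclose(P @ Images, Texts))    # True
print(np.count_nonzero(P) <= n_txt * n)  # True
```

If some row's n × n block is singular, `numpy.linalg.solve` raises `LinAlgError`, which corresponds to that row's system having no unique solution.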

Once P is found it is used for image recognition as described in the previous panel.



Copyright © 2022 ICICI Centre for Mathematical Sciences
All rights reserved. Send us your suggestions at