Chaos Project (page1)

ICMS

Page:

Finding a premultiplier matrix P

Suppose we have n images in the training set. We find P by solving the matrix equation

P * Images = Texts .... (1)

where Images is a 14,400 ´ n matrix, the j^th column of which is the j^th image from the training set, and Texts is an 1,800 ´ n matrix, the j^th column of which is the j^th text description from the target set.

Finding P poses a problem. Simple backsubstitution methods of solving a matrix equation XA=B take too long. Once found, actually using P to recognize an image requires multiplying an 1,800 ´ 14,400 matrix and a 14,400´1 matrix, which takes roughly 373,248,000,000 (=1,800 ´ 14,400²) operations.

A nice way to circumvent all this is to realize that the premultiplier P is not unique, and so all we need to do is find a suitable P which is very sparse, i.e., most of its entries are zero. A sparse P cuts down calculation time considerably and is stored as a list and not as a table, taking up much less space. A sparse P is found thus:

Images has the matrix form [I_ij] where I_ij is the i^th element of the j^th image. Similarly, Texts has the matrix form [T_ij] where T_ij is the i^th element of the j^th text description.

We choose the premultiplier P to have the following form:

This is an 1800 ´ 14,400 matrix with at most 1,800n non-zero elements (to be determined), lying along n parallel diagonals.

For each set of n unknowns P_i,1, P_i,2, ..., P_i,n lying in the i^th row of P, the matrix equation (1) gives us a system of n simultaneous linear equations in these n unknowns. For example, when i=1 we get

P_1,1I_1,1 + P_1,2I_2,1 + P_1,3I_3,1 + ... + P_1,nI_n,1=T_1,1

P_1,1I_1,2 + P_1,2I_2,2 + P_1,3I_3,2 + ... + P_1,nI_n,2=T_1,2

_{. . . . .}

P_1,1I_1,n + P_1,2I_2,n + P_1,3I_3,n + ... + P_1,nI_n,n=T_1,n

Thus in all we get 1800 sets of n simultaneous linear equations in n unknowns. Each set is solved separately to determine the entries of P. Since the form we chose for P has a large number of zero entries, our calculation time is considerably reduced.

Once P is found it is used for image recognition as described in the previous panel.