Wednesday, April 13, 2016

vectorized programming languages "loop based" programming like C

Contact The DL Team Contact Us | Switch to single page view (no tabs)
An algorithm is described to solve multiple-phase optimal control problems using a recently developed numerical method called theGauss pseudospectral method. The algorithm is well suited for use in modern vectorized programming languages such as FORTRAN 95 and MATLAB. The algorithm discretizes the cost functional and the differential-algebraic equations in each phase of the optimal control problem. The phases are then connected using linkage conditions on the state and time. A large-scale nonlinear programming problem (NLP) arises from the discretization and the significant features of the NLP are described in detail. A particular reusable MATLAB implementation of the algorithm, called GPOPS, is applied to three classical optimal control problems to demonstrate its utility. The algorithm described in this article will provide researchers and engineers a useful software tool and a reference when it is desired to implement the Gauss pseudospectral method in other programming languages.


or more pieces of data. Modern x86 chips have the SSE instructions, many PPC chips have the "Altivec" instructions, and even some ARM chips have a vector instruction set, called NEON.
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
(I chose 4 because it's what modern hardware is most likely to directly support; the term "vectorization" is also used to describe a higher level software transformation where you might just abstract away the loop altogether and just describe operating on arrays instead of the elements that comprise them)

The difference between vectorization and loop unrolling: Consider the following very simple loop that adds the elements of two arrays and stores the results to a third array.
for (int i=0; i<16; ++i)
    C[i] = A[i] + B[i];
Unrolling this loop would transform it into something like this:
for (int i=0; i<16; i+=4) {
    C[i]   = A[i]   + B[i];
    C[i+1] = A[i+1] + B[i+1];
    C[i+2] = A[i+2] + B[i+2];
    C[i+3] = A[i+3] + B[i+3];
}
Vectorizing it, on the other hand, produces something like this:
for (int i=0; i<16; i+=4)
    addFourThingsAtOnceAndStoreResult(&C[i], &A[i], &B[i]);
Where "addFourThingsAtOnceAndStoreResult" is a placeholder for whatever intrinsic(s) your compiler uses to specify vector instructions. Note that some compilers are able to auto vectorize very simple loops like this, which can often be enabled via a compile option. More complex algorithms still require help from the programmer to generate good vector code.
shareedit
3 
What's the difference between this and loop unwinding/unrolling? – Jeremy Powell Sep 14 '09 at 16:28
   
Isn't it true that a compiler would have an easier job auto-vectorizing the unrolled loop ? – Nikos AthanasiouJun 10 '15 at 23:35
   
@NikosAthanasiou: It's plausible, but generally speaking a compiler should be able to autovectorize either loop, as they are both quite simple. – Stephen Canon Jun 12 '15 at 2:43
Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas scalar can only operate on pairs of operands at once.
From wikipedia:
Scalar approach:
for (i = 0; i < 1024; i++)
{
   C[i] = A[i]*B[i];
}
Vectorized approach:
for (i = 0; i < 1024; i+=4)
{
   C[i:i+3] = A[i:i+3]*B[i:i+3];
}
shareedit
It refers to a the ability to do single mathematical operation on a list -- or "vector" -- of numbers in a single step. You see it often with Fortran because that's associated with scientific computing, which is associated with supercomputing, where vectorized arithmetic first appeared. Nowadays almost all desktop CPUs offer some form of vectorized arithmetic, through technologies like Intel's SSE. GPUs also offer a form of vectorized arithmetic.
shareedit

See the two answers above. I just wanted to add that the reason for wanting to do vectorization is that these operations can easily be performed in paraell by supercomputers and multi-processors, yielding a big performance gain. On single processor computers there will be no performance gain.
shareedit
3 
"On single processor computers there will be no performance gain": not true. Most modern processors have (limited) hardware support for vectorization (SSE, Altivec. etc. as named by stephentyrone), which can give significant speedup when used. – sleske Sep 14 '09 at 15:24
   
thanks, I forgot that parallelization can be done at that level as well. – Larry Watanabe Sep 15 '09 at 13:14

No comments:

Post a Comment