Optimized variants for vectors
1. Unrolled vectorial operations
The file vector_unrolled.hpp implements a variant for vectorial operations with loop
unrolling. More precisely, vectorial operations on elements of type vector<C,vector_unrolled<n,V> >
are unrolled by blocks of size n.
For example, the piece of code
#define V vector_unrolled<4, vector_naive>
vector<signed char, V> v1=…, v2=…;
vector<signed char, V> w= v1 * v2;
actually computes the entrywise product of v1 and v2 as follows:
for (nat i= 0; i <…; i += 4) {
w[i  ]= v1[i  ] * v2[i  ];
w[i+1]= v1[i+1] * v2[i+1];
w[i+2]= v1[i+2] * v2[i+2];
w[i+3]= v1[i+3] * v2[i+3];
}
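The same unrolling pattern can be sketched in standalone C++. This is only an illustration, not the code of vector_unrolled.hpp: it assumes plain std::vector storage instead of the library's vector<C,V>, the name unrolled_product is hypothetical, and the trailing scalar loop for lengths that are not a multiple of 4 is an assumption about how leftover entries might be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of vector_unrolled.hpp): entrywise product
// unrolled by blocks of 4, followed by a scalar loop for leftover entries.
template <typename C>
std::vector<C> unrolled_product(const std::vector<C>& v1,
                                const std::vector<C>& v2) {
  assert(v1.size() == v2.size());
  const std::size_t n = v1.size();
  std::vector<C> w(n);
  std::size_t i = 0;
  // Main loop: four independent products per iteration.
  for (; i + 4 <= n; i += 4) {
    w[i]     = static_cast<C>(v1[i]     * v2[i]);
    w[i + 1] = static_cast<C>(v1[i + 1] * v2[i + 1]);
    w[i + 2] = static_cast<C>(v1[i + 2] * v2[i + 2]);
    w[i + 3] = static_cast<C>(v1[i + 3] * v2[i + 3]);
  }
  // Tail loop: at most three remaining entries.
  for (; i < n; ++i) w[i] = static_cast<C>(v1[i] * v2[i]);
  return w;
}
```

Calling unrolled_product on two vectors of length 6, for instance, runs the unrolled body once and the tail loop twice.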
2. SIMD support
Several architectures support special instructions, called SIMD (Single
Instruction, Multiple Data), for performing operations on
“hardware vectors”. These instructions require the vectors to be
aligned in memory. When allocating memory for a vector
of type vector<C,V> of size n, one thus
needs to proceed as follows:
nat l= aligned_size<C,V> (n);
C* buf= mmx_new<C> (l);
// fill buf here…
vector<C,V> v (buf, n, l);
Note that the memory will be freed once v is destroyed.
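To illustrate what an alignment requirement means in practice, here is a minimal standalone sketch that does not use the library. The helpers padded_size, aligned_buffer and is_aligned are hypothetical names, the 16-byte alignment is an assumption matching 128-bit SSE registers, and std::aligned_alloc (C++17) stands in for the library's own allocator; nothing here describes how aligned_size or mmx_new are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: smallest multiple of `block` that is >= n.
inline std::size_t padded_size(std::size_t n, std::size_t block) {
  return ((n + block - 1) / block) * block;
}

// Hypothetical helper: allocate room for at least n entries of type C at an
// address that is a multiple of `alignment` (assumed to be a power of two).
// The caller releases the buffer with std::free.
template <typename C>
C* aligned_buffer(std::size_t n, std::size_t alignment = 16) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // std::aligned_alloc expects the byte count to be a multiple of alignment.
  std::size_t bytes = padded_size(n * sizeof(C), alignment);
  if (bytes == 0) bytes = alignment;
  return static_cast<C*>(std::aligned_alloc(alignment, bytes));
}

// Hypothetical helper: true when p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

In this sketch the padding plays the role of the allocated length l being larger than the logical size n, so that whole hardware vectors can be read and written up to the end of the buffer.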
In vector_simd.hpp, the variant vector_simd allows one to benefit from the SIMD functionalities. For
instance, operations on a vector of type vector<unsigned
char,vector_simd<8,4> > are unrolled by blocks of 8
SIMD sub-vectors, the rest being unrolled by blocks of size 4 with
classical instructions.
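The split between SIMD blocks and a classical remainder can be sketched with SSE2 intrinsics. This is a simplified illustration, not the vector_simd implementation: the function name simd_add is hypothetical, it processes one 16-byte hardware vector per iteration rather than blocks of 8, the remainder loop is not unrolled, and all three buffers are assumed to be 16-byte aligned. When SSE2 is not available at compile time, only the scalar loop is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: entrywise sum (modulo 256) of two unsigned char
// buffers of length n. All buffers are assumed to be 16-byte aligned.
inline void simd_add(unsigned char* w, const unsigned char* v1,
                     const unsigned char* v2, std::size_t n) {
  assert(reinterpret_cast<std::uintptr_t>(w) % 16 == 0);
  assert(reinterpret_cast<std::uintptr_t>(v1) % 16 == 0);
  assert(reinterpret_cast<std::uintptr_t>(v2) % 16 == 0);
  std::size_t i = 0;
#if defined(__SSE2__)
  // SIMD part: 16 entries per instruction, using aligned loads and stores.
  for (; i + 16 <= n; i += 16) {
    __m128i a = _mm_load_si128(reinterpret_cast<const __m128i*>(v1 + i));
    __m128i b = _mm_load_si128(reinterpret_cast<const __m128i*>(v2 + i));
    _mm_store_si128(reinterpret_cast<__m128i*>(w + i), _mm_add_epi8(a, b));
  }
#endif
  // Remainder: classical scalar instructions.
  for (; i < n; ++i) w[i] = static_cast<unsigned char>(v1[i] + v2[i]);
}
```

With n = 40, for example, the SIMD loop handles the first 32 entries in two iterations and the scalar loop handles the remaining 8.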
A default variant for the type C
is available through the macro Vector_simd_variant(C).
Currently only SSE2 and SSE3 are partially supported – details
are to be found in vector_sse.hpp.
© 2010 Grégoire Lecerf
Permission is granted to copy, distribute and/or modify this document
under the terms of the
GNU General Public License. If you
don't have this file, write to the Free Software Foundation, Inc., 59
Temple Place - Suite 330, Boston, MA 02111-1307, USA.