Optimized variants for vectors

1. Unrolled vectorial operations

The file vector_unrolled.hpp implements a variant for vectorial operations with loop unrolling. More precisely, vectorial operations on elements of type vector<C,vector_unrolled<n,V> > are unrolled by blocks of size n. For example, the piece of code

#define V vector_unrolled<4, vector_naive>
vector<signed char, V> v1=…, v2=…;
vector<signed char, V> w= v1 * v2;

actually computes the entrywise product of v1 and v2 as follows:

for (nat i= 0; i <…; i += 4) {
  w[i  ]= v1[i  ] * v2[i  ];
  w[i+1]= v1[i+1] * v2[i+1];
  w[i+2]= v1[i+2] * v2[i+2];
  w[i+3]= v1[i+3] * v2[i+3];
}
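The loop above leaves its upper bound elided. As an illustration only, the following self-contained sketch in standard C++ shows one common way to unroll by blocks of 4 while still handling sizes that are not a multiple of 4. The name unrolled_mul and the use of std::vector are assumptions made for this example; they are not taken from vector_unrolled.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): entrywise product
// unrolled by blocks of 4, followed by a plain loop for the leftovers.
template<typename C>
std::vector<C> unrolled_mul (const std::vector<C>& v1,
                             const std::vector<C>& v2) {
  assert (v1.size () == v2.size ());
  const std::size_t n= v1.size ();
  std::vector<C> w (n);
  std::size_t i= 0;
  // Main loop: four products per iteration.
  for (; i + 4 <= n; i += 4) {
    w[i  ]= v1[i  ] * v2[i  ];
    w[i+1]= v1[i+1] * v2[i+1];
    w[i+2]= v1[i+2] * v2[i+2];
    w[i+3]= v1[i+3] * v2[i+3];
  }
  // Tail loop: at most three remaining entries.
  for (; i < n; i++)
    w[i]= v1[i] * v2[i];
  return w;
}
```

Writing the main loop condition as i + 4 <= n keeps every access in bounds, so the tail loop only has to process the last n mod 4 entries.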

2. SIMD support

Several architectures support special instructions called SIMD (Single Instruction, Multiple Data) for performing operations on “hardware vectors”. These instructions require the vectors to be stored at aligned memory addresses. When allocating memory space for a vector of type vector<C,V> of size n, one thus needs to proceed as follows:

nat l= aligned_size<C,V> (n);
C* buf= mmx_new<C> (l);
// fill buf here…
vector<C,V> v (buf, n, l);

Note that the memory will be freed once v is destroyed.
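To illustrate the general idea of aligned allocation independently of the library, here is a minimal sketch in standard C++17. The names padded_size, aligned_ptr and aligned_buffer are hypothetical, the 16-byte alignment is an assumption matching the width of an SSE register, and nothing here is claimed to reflect how aligned_size or mmx_new are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in: round n up to the next multiple of block.
inline std::size_t padded_size (std::size_t n, std::size_t block) {
  assert (block != 0);
  return ((n + block - 1) / block) * block;
}

// Deleter releasing memory obtained from std::aligned_alloc.
struct free_deleter {
  void operator() (void* p) const { std::free (p); }
};

template<typename C>
using aligned_ptr= std::unique_ptr<C[], free_deleter>;

// Allocate uninitialized room for at least n entries of a trivial
// type C at a 16-byte aligned address; l receives the padded length.
template<typename C>
aligned_ptr<C> aligned_buffer (std::size_t n, std::size_t& l) {
  const std::size_t align= 16;  // assumed: width of an SSE register
  std::size_t bytes= padded_size (n * sizeof (C), align);
  if (bytes == 0) bytes= align;  // avoid a zero-sized request
  l= bytes / sizeof (C);
  void* p= std::aligned_alloc (align, bytes);
  assert (p != nullptr);
  return aligned_ptr<C> (static_cast<C*> (p));
}
```

In this sketch the smart pointer plays the role that v plays above: the buffer is released automatically when its owner goes out of scope.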

In vector_simd.hpp, the variant vector_simd allows one to benefit from the SIMD functionalities. For instance, a vector of type vector<unsigned char,vector_simd<8,4> > unrolls blocks of 8 SIMD sub-vectors, the rest being unrolled by blocks of size 4 with classical instructions.
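The split between a SIMD part and a classical remainder can be sketched with plain SSE2 intrinsics. The function simd_add below is hypothetical and is not the code of vector_simd.hpp: it uses entrywise addition of unsigned char entries (SSE2 has a single instruction for this), it processes one 16-byte sub-vector per iteration rather than blocks of 8, and it falls back to the scalar loop alone when SSE2 is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: w[i]= v1[i] + v2[i] modulo 256, for 0 <= i < n.
inline void simd_add (unsigned char* w, const unsigned char* v1,
                      const unsigned char* v2, std::size_t n) {
  assert (n == 0 || (w != nullptr && v1 != nullptr && v2 != nullptr));
  std::size_t i= 0;
#if defined(__SSE2__)
  // SIMD part: one instruction adds 16 entries at once.
  // Unaligned loads and stores are used so that any pointer is accepted.
  for (; i + 16 <= n; i += 16) {
    __m128i a= _mm_loadu_si128 (reinterpret_cast<const __m128i*> (v1 + i));
    __m128i b= _mm_loadu_si128 (reinterpret_cast<const __m128i*> (v2 + i));
    _mm_storeu_si128 (reinterpret_cast<__m128i*> (w + i),
                      _mm_add_epi8 (a, b));
  }
#endif
  // Remainder: fewer than 16 entries, handled with classical instructions.
  for (; i < n; i++)
    w[i]= static_cast<unsigned char> (v1[i] + v2[i]);
}
```

With buffers known to be 16-byte aligned, the aligned variants _mm_load_si128 and _mm_store_si128 could be used instead of the unaligned ones.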

A default variant for the type C is available through the macro Vector_simd_variant(C).

Currently, only SSE2 and SSE3 are partially supported; details can be found in vector_sse.hpp.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License. If you don't have this file, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.