Recent posts

icc -masm=intel generating invalid/unrecognized assembly?

September 23, 2013, 6:07 am

Hello, I am building code that contains some asm() blocks written in Intel assembly syntax. When I compile this code for the MIC with -masm=intel, I get an assembler error, which is not related...

Efficient branching on double vector comparison using intrinsics?

September 25, 2013, 3:06 am

Hi, I want to check the range of a vector of double-precision variables, in order to branch to a slow path on exceptional out-of-range cases. My code looks like the following: // if(any(!(x < 4.)...

Quad precision architecture one day?

November 19, 2013, 7:28 pm

Hello. I have a dream in which IEEE-754 quad precision is implemented in hardware, allowing fast extended-precision computation in a portable way instead of the current 80-bit extended...

How to convert _mm512 to float

June 7, 2013, 4:00 pm

Is there an easy way to extract component 0 from an _mm512 vector? Looking at the assembly of _mm512_reduce_gmin_ps, it really computes an _mm512 (of course), which is then passed to scalar operations. I tried...

_mm512_mul_epi32 not working?

March 4, 2014, 1:31 pm

#include <immintrin.h>
#include <zmmintrin.h> // not needed but put here to show it is indeed included
...
__m512i a,b,c;
a = _mm512_mul_epi32(b,c);
produces this error: undefined...

cast __m512 to __m512d

March 10, 2014, 5:55 pm

Hey all, simple question: How does the cast operation _mm512_castps_pd work? A __m512 data type holds 16 floats, i.e. 16 elements. In contrast, a __m512d data type can hold only 8 elements -- so what...

How to test a mask register for any non-zeros?

July 17, 2014, 12:24 pm

Dear all, I want to test a mask register for any non-zero values. I will not be able to test this on my own MIC for a few days, so I decided to ask here whether this can be done efficiently, i.e. not via...
