Channel: JIT, NGen, and other Managed Code Generation Stuff
Viewing all articles
Browse latest Browse all 40

Performance implications of unmanaged array accesses


I was recently shown the following code and asked why the loop calling SafeAccess executed significantly faster than the second loop calling UnsafeAccess:


       static int [] intarray = new int [5000];


       static void SafeAccess(int a, int b)


           int temp = intarray[a];

           intarray[a] = intarray[b];

           intarray[b] = temp;


       static Unsafe void UnsafeAccess(int a, int b)


           fixed (int* pi = &intarray[0])


               int temp = pi[a];

               pi[a] = pi[b];

               pi[b] = temp;



       static Unsafe void Main(string[] args)


           for (int i = 0; i < testCount; i++)


               SafeAccess(0, i);


           for (int i = 0; i < testCount; i++)


               UnsafeAccess(0, i);




Safe Loop:

I examined the code generated by the 64-bit JIT compiler for the SafeAccess loop (which was inlined into Main by the JIT).  Vance Morrison posted a useful article describing how to accomplish this from within Visual Studio: http://blogs.msdn.com/vancem/archive/2006/02/20/535807.aspx



00000642`801501f0 418b08          mov     ecx,dword ptr [r8]

00000642`801501f3 8b02            mov     eax,dword ptr [rdx]

00000642`801501f5 418900          mov     dword ptr [r8],eax

00000642`801501f8 890a            mov     dword ptr [rdx],ecx

00000642`801501fa 4883c204        add     rdx,4

00000642`801501fe 493bd1          cmp     rdx,r9

00000642`80150201 7ced            jl      00000642`801501f0


There are 7 instructions and 4 memory accesses per loop iteration, with no range checks remaining inside the loop body after optimization.  In this case there is no performance cost incurred for safety. 

Unsafe Loop:


By contrast, the unsafe version is a mess.  UnsafeAccess is larger MSIL (50 bytes vs 31) because Unsafe array accesses require more MSIL instructions than safe ones.  Given an array and index on the evaluation stack, safe array accesses require only a single 1-byte instruction: ldelem.  The C# compiler generates a much more complex sequence for Unsafe accesses:


  IL_000c:  /* 06   |                  */ ldloc.0 // &array[0]

  IL_000d:  /* D3   |                  */ conv.i

  IL_000e:  /* 02   |                  */ ldarg.0 // index

  IL_000f:  /* D3   |                  */ conv.i

  IL_0010:  /* 1A   |                  */ ldc.i4.4

  IL_0011:  /* 5A   |                  */ mul

  IL_0012:  /* 58   |                  */ add

  IL_0013:  /* 4A   |                  */ ldind.i4


Ignoring the first and third instructions, which are used to get the array and index, there are six instructions (and bytes) required to load an array element.  These extra instructions make UnsafeAccess larger than SafeAccess.  When determining which methods should be inlined by the JIT, one of the most highly weighted factors is the size of the inlinee method.  In this case UnsafeAccess was rejected for inlining, and because of this, the range check at &intarray[0] could not be removed.  In fact the unsafe loop variant actually caused more runtime range checks to occur than the safe variant!


Finally, the presence of a pinned variable inhibits many optimizations in the 64-bit JIT.  As a result, the generated code for UnsafeAccess is far worse than that of the safe variant.  Keep in mind that the following excerpt shows only the UnsafeAccess method itself, and does not even include the the loop in Main, as the SafeAccess example above does.



image00000000_00e40000!Arrays.UnsafeAccess(Int32, Int32):

00000642`80150260 4883ec38        sub     rsp,38h

00000642`80150264 448bc1          mov     r8d,ecx

00000642`80150267 48c744242000000000 mov   qword ptr [rsp+20h],0

00000642`80150270 48b9102e352000000000 mov rcx,20352E10h

00000642`8015027a 488b09          mov     rcx,qword ptr [rcx]

00000642`8015027d 488b4108        mov     rax,qword ptr [rcx+8]

00000642`80150281 4885c0          test    rax,rax

00000642`80150284 7641            jbe     00000642`801502c7

00000642`80150286 488d4110        lea     rax,[rcx+10h]

00000642`8015028a 4889442420      mov     qword ptr [rsp+20h],rax

00000642`8015028f 4d63c8          movsxd  r9,r8d

00000642`80150292 488b442420      mov     rax,qword ptr [rsp+20h]

00000642`80150297 468b0488        mov     r8d,dword ptr [rax+r9*4]

00000642`8015029b 4863d2          movsxd  rdx,edx

00000642`8015029e 488b442420      mov     rax,qword ptr [rsp+20h]

00000642`801502a3 8b0c90          mov     ecx,dword ptr [rax+rdx*4]

00000642`801502a6 488b442420      mov     rax,qword ptr [rsp+20h]

00000642`801502ab 42890c88        mov     dword ptr [rax+r9*4],ecx

00000642`801502af 488b442420      mov     rax,qword ptr [rsp+20h]

00000642`801502b4 44890490        mov     dword ptr [rax+rdx*4],r8d

00000642`801502b8 48c744242000000000 mov   qword ptr [rsp+20h],0

00000642`801502c1 4883c438        add     rsp,38h

00000642`801502c5 f3c3            rep ret



Unsafe array accesses have a lot of potential problems: correctness, GC heap fragmentation due to pinning, and as we have just seen, performance.  I hope that this example will help developers understand that safety does not necessarily incur a runtime cost.  Before attempting to evade a ‘safety tax’ it is a good idea to check if you are currently paying one. The first step in doing that is viewing disassembly of the optimized code


-Matt Grice


Viewing all articles
Browse latest Browse all 40