Fix x86/ffi64 calls with 6 gp and some sse registers (#848)

* Fix x86/ffi64 calls with 6 gp and some sse registers

* Add test demonstrating issue when mixing gp and sse registers