On 23 Feb 2015 00:10, Jiri Gaisler wrote:
> On 02/22/2015 09:51 PM, Mike Frysinger wrote:
> > On 19 Feb 2015 23:31, Jiri Gaisler wrote:
> >> +      *((unsigned short *) &(mem[waddr])) = *data & 0x0ffff;
> >
> > this violates strict aliasing.  you can't cast the RHS side like this.  it also
> > violates alignment since the buffer is passed in as unsigned char *.

err, i obviously meant LHS there

> I don't fully agree on this. *mem holds the pointer to romb or ramb,
> which are defined as unsigned char arrays. However, their definition
> is preceded with an integer define:
>
>   static uint32 mem_blockprot;    /* RAM block write protection enabled */
>   static unsigned char romb[ROM_SZ];
>   static unsigned char ramb[RAM_END - RAM_START];
>
> This means that romb and ramb are aligned on a 4-byte boundary
> on systems where this matters (SPARC, ARM). When casting to short,
> waddr is always aligned on 2, when casting to integer waddr is
> always aligned on 4. So the casting really works without getting
> an alignment error. Can I rather document this instead of using
> a slower memcpy()? In cpu simulation, performance is essential and
> every (host) instruction counts.

afaik, nowhere are you guaranteed that the memory layout will match the decl
order of your code.  there's no reason gcc/ld couldn't reorder those objects as
long as the declared alignment is maintained.  if you wanted to do that, you
would have to create a union instead like:
static union {
	unsigned char u8[ROM_SZ];
	uint16_t u16[ROM_SZ / 2];
	uint32_t u32[ROM_SZ / 4];
} romb;
and then pass in romb.u8 or romb.u16 or whatever.

even if the memory order was guaranteed, gcc cannot infer that level of
alignment.  it would (rightly) complain that strict aliasing was being violated.

> > you should use memcpy() instead.  on systems where unaligned access are OK, gcc
> > should optimize it down to a few load/stores anyways.
>
> memcpy() does have some overhead compared to a single store ...

i think you misread what i said.
on hosts like x86_64, there is no call to memcpy().  gcc will replace it with
the exact asm insns required.  in this case (a 16bit store), that's what you'll
get.

$ cat test.c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	memcpy(argv[0], &argc, 4);
	puts(argv[0]);
	return 0;
}
$ gcc -O3 -S -o - test.c
main:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16

	/* these two insns are the memcpy */
	movq	(%rsi), %rax
	movl	%edi, (%rax)

	/* here is the call to puts */
	movq	(%rsi), %rdi
	call	puts

	xorl	%eax, %eax
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
-mike