??? 01/21/06 19:30 Read: times |
#107994 - Architecture Responding to: ???'s previous message |
Jez Smith said:
Sasha's comments were very interesting, especially the part about the stretch cycles
Sasha Jevtic said:
What would be nicer, however, would be a hardware mechanism for mapping stretch cycle counts onto particular XDATA address ranges. Thus, whenever you make an external memory access to an address associated with a slow peripheral, the appropriate number of stretch cycles would automatically be used.
Obviously this would be achievable with some suitable PLD with a lookup table mechanism and variable-length delay chains.
Absolutely. It's just a shame that functionality is not already built in.
Jez Smith said:
And also this comment
Sasha Jevtic said:
You might be running at "12x the 'normal' rate", but in reality, you can never realize a 12x performance increase by switching from a conventional architecture chip to this device. As you can see from the Ultra-High-Speed Flash Microcontroller User's Guide, in the subsection "Comparison To the 8051" beginning at the bottom of p. 52, for any given opcode, the DS89C4x0 offers a performance advantage factor of significantly less than 12 for a large number of instructions and opcodes.
This is due to the fact that when a pipelined architecture processor makes a jump in program execution for whatever reason, be it an interrupt or a branch, so that code execution is non-linear, the pipeline has to be flushed: all the instructions that were partway through the pipeline are discarded, and the pipeline has to refill with the new instructions. So any speedup achieved by using a pipeline is only applicable when executing linear sections of code; your mileage will vary according to your application.
Actually, my point was simply that since not all instructions take the same amount of time, you cannot even begin to get a reasonable approximation of the speedup without knowing the instruction mix for the target application. And even that's not going to give you a really good estimate of your speedup. Benchmark design is always a thorny topic.
But, you raise a very interesting point. I've actually never read anywhere that the DS89C4x0 is pipelined, but thinking about it, it does seem rather likely. If that is the case, you are absolutely right about pipeline flushing at a branch. You also mentioned hazards in another post here. Hazards, while a performance issue in a pipelined processor, can be at least partially mitigated with the addition of appropriate forwarding logic. It is also possible to resolve hazards by inserting stalls inside the pipeline, and in some cases stalls may be required in place of or in addition to forwarding.
While less desirable than forwarding alone, stall insertion is simpler to implement in hardware, and is still much better than flushing the entire pipeline. The performance issue you mention associated with branches is not really limited to pipelining. Generally speaking, short basic block length adversely affects the performance gains achieved through architectural enhancements that leverage instruction-level parallelism, including superscalar architectures and speculative execution.
--Sasha Jevtic
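To make the instruction-mix point above a bit more concrete, here is a rough sketch of how one might weight per-instruction clock counts by how often each instruction class runs. Everything in HYPOTHETICAL_MIX (the class names, the frequencies, and the clock counts for both parts) is a made-up placeholder for illustration, not figures from the User's Guide; real numbers would have to come from the opcode tables and from profiling the target application. It also ignores stretch cycles, interrupts, and any flush or stall penalties, so treat it as a first-order estimate at best.

```python
# Rough first-order speedup estimate from an instruction mix.
# All numbers below are hypothetical placeholders, NOT datasheet values.

# (instruction class, dynamic frequency, clocks on old part, clocks on new part)
HYPOTHETICAL_MIX = [
    ("alu_op", 0.6, 12, 1),
    ("branch", 0.2, 24, 3),
    ("movx",   0.2, 24, 2),
]


def estimate_speedup(mix):
    """Return (average clocks per instruction, old) / (same, new).

    Each entry is weighted by how often that instruction class executes,
    so a class with a poor per-opcode advantage drags the total down in
    proportion to how much of the program it makes up.
    """
    old_clocks = sum(freq * old for _, freq, old, _ in mix)
    new_clocks = sum(freq * new for _, freq, _, new in mix)
    return old_clocks / new_clocks


if __name__ == "__main__":
    print("estimated speedup: %.2f" % estimate_speedup(HYPOTHETICAL_MIX))
```

With a mix like this, the estimate only reaches 12 if every instruction class individually gets the full 12x; any class with a smaller advantage pulls the overall figure below that. This assumes the same oscillator frequency on both parts.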