#148873 - Re: what you think doesn't matter
re> My main concern, however, is not for conservation of silicon
re> resources, but for conservation of effort.
What/whose effort are you talking about here?
re> I've looked at, though not thoroughly understood your
re> diagrams, etc, and that, as much as anything, has persuaded me
re> that you're overlooking a much simpler approach.
Huh? The fact that you've not thoroughly understood my
diagrams doesn't automatically imply that I've overlooked a
simpler approach, or even that I am/was striving for a simpler
approach. The fact is that I've considered a number of
approaches, and chosen one that to me represents a happy
balance between performance and simplicity. It's not the
simplest possible approach by any means, nor is it the one
that would give the best performance. It's somewhere in the
middle.
re> My comment, early on, that you can use the same ALU to operate
re> on address objects that you use to operate on data objects
re> seems to have forced you to believe that all I'm thinking
re> about is saving a gate here and a gate there. While it's easy
re> to reach that conclusion, that's not what I'm thinking at all.
re> The fact that I pointed out that it saves logic otherwise
re> required for those long and complex counters, which seem so
re> simple in the context of "SP--" or "++DPTR," was more of an
re> effort to show you that you'd then have less complexity in
re> your hardware design. I've tried to suggest that one can view
re> this MCU core, and, in fact, nearly any MCU core as a set of
re> resources demanding output, a set of resources providing
re> input, an ALU that performs the necessary operations on inputs
re> and outputs, and a set of control logic that prepares and
re> steers the logic paths between all the resources.
I understand that view. But as far as I can tell, the idea of
a single ALU that does everything precludes certain parallel
operations that lead to better performance. Why should I box
myself into an architecture with that limitation?
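To make that concrete, here is a toy next-state sketch in C (invented
structure and names, flags omitted; it is not my actual datapath):
with a dedicated incrementer on PC, an instruction like ADD A,Rn can
update ACC and advance PC on the same clock edge, whereas funneling
both additions through one shared ALU forces them to take turns.

    #include <stdint.h>

    struct core_state {
        uint16_t pc;    /* program counter */
        uint8_t  acc;   /* accumulator     */
    };

    /* One cycle of ADD A,Rn in the "dedicated incrementer" scheme:
     * both results are computed from the *current* state, so both
     * updates land on the same clock edge. */
    struct core_state cycle_add_a_rn(struct core_state s, uint8_t rn)
    {
        struct core_state next = s;
        next.acc = (uint8_t)(s.acc + rn);  /* ALU: the data operation      */
        next.pc  = (uint16_t)(s.pc + 1);   /* PC incrementer, concurrently */
        return next;
    }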
rc> With that in mind, let me ask you this: Instead of vaguely
rc> expressing your "concern" about what I am doing, could you
rc> please identify those specific aspects of my design that you
rc> see as flawed, so that I have half a chance of considering
rc> them and possibly correcting them sooner than later?
re> This is really difficult, as I see the flaws, if there are
re> any, as being at the very top, namely in the method with which
re> you've approached the task. The implementation details you've
re> mentioned have supported this concern. It seems as though
re> you're designing an engine that does a few things and then
re> simply "hang a bag on it" for each additional operation.
Wow, I don't see this at all. My diagrams present a set of
data paths that is specifically designed to execute the 8051
instruction set, and my big chart shows in gory detail how
each and every instruction uses that hardware to do what it
needs to do. The implementation in progress follows the
diagrams almost exactly; my current effort is an almost purely
mechanical translation of the operations specified by the big
chart into the detailed control logic needed to route the data
around through the various muxes and such. If that isn't
top-down design, I don't know what is.
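In case it helps, here is roughly what I mean by "mechanical
translation," sketched in C rather than HDL (the field and signal
names below are invented for this example, not the ones in my chart):
each opcode gets a row naming which mux feeds the ALU, what the ALU
does, and which register latches the result, and the control logic
amounts to looking up that row.

    #include <stdint.h>

    enum src_sel  { SRC_ACC, SRC_RAM, SRC_IMM };
    enum alu_op   { ALU_PASS, ALU_ADD, ALU_INC };
    enum dest_sel { DST_NONE, DST_ACC, DST_RAM };

    /* One "big chart" row: the mux selects and register enables
     * asserted for a given instruction. */
    struct chart_row {
        enum src_sel  src_mux;   /* which source feeds the ALU        */
        enum alu_op   op;        /* what the ALU does this cycle      */
        enum dest_sel dest_mux;  /* which register latches the result */
    };

    /* Three example rows, indexed by opcode. */
    static const struct chart_row chart[256] = {
        [0x04] = { SRC_ACC, ALU_INC,  DST_ACC },  /* INC A        */
        [0x24] = { SRC_IMM, ALU_ADD,  DST_ACC },  /* ADD A,#data  */
        [0xE5] = { SRC_RAM, ALU_PASS, DST_ACC },  /* MOV A,direct */
    };

    /* The "mechanical" part: decode is just the table lookup. */
    struct chart_row decode(uint8_t opcode)
    {
        return chart[opcode];
    }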
re> That's why I've occasionally cautioned you about attempting
re> to make the hardware internals look too much like the software
re> externals, i.e. make the hardware track the behavior of the
re> MCU as viewed from its instruction set and how it executes it.
I had such a problem with this idea when you presented it
earlier that I thought I misunderstood you. But it appears
that I didn't. The only job of an 8051 is to execute the 8051
instruction set, and it seems only natural to me that the
hardware would reflect that fact. Why would a processor
not reflect the instruction set it's designed to
execute?
re> One thing that you probably haven't considered is that you
re> could, if you wanted, structure your code memory as
re> byte-addressable, but long-word (32 bits) wide. That allows
re> you to access all three bytes of a long instruction
re> concurrently, and, therefore, to execute the entire
re> instruction in one cycle, fetching and decoding the next
re> opcode concurrently with the execution of the current one.
I really like this idea, although it came to me somewhat late
in the game, after I had more or less committed to a byte-wide
code memory. Depending on where this whole thing leads, I
will definitely keep this in mind for a "rev 2" effort.
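For what it's worth, here is how I read the wide-fetch idea, as a
little behavioral C model (assuming the byte at each address sits in
the low lane of its 32-bit word; none of this is anyone's actual RTL):
one access to the word holding the PC, plus its neighbor for
instructions that straddle a word boundary, makes all three bytes of a
long instruction available at once.

    #include <stdint.h>

    #define CODE_WORDS 16384               /* 64 KB of code space as 32-bit words */
    static uint32_t code_mem[CODE_WORDS];  /* byte at address A lives in lane A&3 */

    /* Fetch up to three instruction bytes in one "cycle". */
    void fetch3(uint16_t pc, uint8_t bytes[3])
    {
        uint32_t lo = code_mem[pc >> 2];                       /* word holding PC */
        uint32_t hi = code_mem[((pc >> 2) + 1) % CODE_WORDS];  /* next word, for  */
        uint64_t window = ((uint64_t)hi << 32) | lo;           /* straddle cases  */
        unsigned lane = (pc & 3u) * 8;                         /* byte lane of PC */

        bytes[0] = (uint8_t)(window >> lane);         /* opcode                 */
        bytes[1] = (uint8_t)(window >> (lane + 8));   /* first operand, if any  */
        bytes[2] = (uint8_t)(window >> (lane + 16));  /* second operand, if any */
    }

The straddle case is the part real hardware would have to pay for,
either with a second word access as modeled here or by constraining
instruction alignment, so it's exactly the kind of detail I'd want to
pin down before a "rev 2."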
re> Comments such as "the Harvard architecture cries out for
re> concurrent access to code and data space..." or whatever it
re> was you said along those lines, are further examples of a
re> "bottom-up" view, which is also reflected in the diagrams and
re> charts you presented.
I disagree that this is a "bottom-up" thing. The separation
of the code and data spaces is perhaps the grossest and most
obvious feature of the architecture, and worthy of
consideration at the topmost level of the design. Of course
my diagrams reflect it. I'm not sure what point you're making
here.
re> Your suggestion of using clocks in a certain way suggests that
re> you've decided, IMHO prematurely, how the core operation
re> should be timed.
What suggestion was that?
re> It's like Andy has repeatedly said, "If this were easy,
re> everybody would do it." The numerous "rubbishware" examples,
re> in the form of partially functional, only partially complete,
re> insufficiently documented, and nearly-impossible-to-understand
re> cores in the public domain should have convinced you that it's
re> not as easy as it looks.
Again, what's your point?
-- Russ



