- The LaST Upgrade -

PART 11 - STFM/STE Booster progress part 2

Last updated July 21, 2023

 

December 29, 2016

With the previous booster page (HERE) getting so large, a new page (this one) was created :)

My thoughts are currently still on running the CPU at 16MHz all the time. Thinking about it some more, I realised that when both the ST-RAM and the CPU are boosted, the CPU actually runs at 16MHz all the time and it works. The MMU is also doing double time there though, so DTACK isn't likely to be a problem.

The problem with running the CPU at 16MHz just by itself is mostly that when the MMU sets DTACK low, it stays low right up until S2 of the following CPU cycle (just before /AS goes low again, or actually pretty much right at the same time). If the CPU is going faster, it can read DTACK twice and read the same bus data twice while DTACK is still low from the previous bus cycle. In other words, crashville.
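To put rough numbers on that, here is a minimal Python sketch. It assumes the 68000's minimum bus cycle of 4 CPU clocks, and a made-up figure of 300ns for how long DTACK stays low after the CPU has legitimately sampled it. The function name and the 300ns value are illustrative only and were not measured on the ST.

```python
def terminations_per_dtack(cpu_mhz, low_after_sample_ns):
    """Count bus cycles a CPU could terminate on one DTACK-low window.

    Assumption: the CPU ends a bus cycle whenever it samples DTACK low,
    and the shortest possible bus cycle is 4 CPU clocks.
    """
    bus_cycle_ns = 4 * 1000 / cpu_mhz
    # Every extra bus cycle that fits inside the window re-reads stale data.
    stale = int(low_after_sample_ns // bus_cycle_ns)
    return 1 + stale


for mhz in (8, 16, 32):
    print(mhz, terminations_per_dtack(mhz, 300))
```

With these assumed numbers, 8MHz gives 1 (only the legitimate read), while 16MHz gives 2 and 32MHz gives 3, which is the double read described above.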

Looking at the LUCAS document, the way they seem to solve these issues is to run the CPU at double speed but still in sync with the system clock. They then run /AS and /DTACK through a flip-flop which is clocked at the lower speed (7MHz, as it's an Amiga project). So basically /AS & /DTACK are delayed by 1 clock cycle and things seem to be generally happy... and you know there is a BUT coming.

I think the T25 booster (25MHz booster) uses the same concept. The problem with that is it doesn't run faster than 25MHz. While the logic delays or the CPU itself could be to blame, after looking at the logic traces for endless hours it would seem the delays are synced in much the same way as in the LUCAS document. The problem there is that normally the CPU will set /AS low on the HI part of the CLK8 cycle, but running at twice the speed (16MHz) means the bus cycle will start on the LOW part of the CLK8 cycle. So using a flip-flop will re-sync it to the CLK8 HI again, and all is well with the world. BUT...

If you run the CPU even faster, such as around 32MHz (I made that number up as I'm too lazy to check), then chances are the CPU will set /AS low on the previous low of the CLK8 cycle. This now means that after the flip-flop delay it will actually sync a cycle too early, and the CPU will try to sync to S0 when it should be on S2. So we are back to crashville again.

The obvious solution here is to use a second flip-flop delay, so all is well with the world again. But what happens when we want to switch back to the default 8MHz clock speed? With this delay chain in the way, it would need even more logic to basically disable it all. It also means that with flip-flops an adjustable clock source (say, variable from 8MHz to 100MHz, yeah right!) couldn't be used, not without wiring in the exact number of flip-flop delays needed for each speed range.. which kinda sux. Originally I tried that idea and had some limited success, but later decided flip-flops are the root of all evil and should be banished to the land of the forgotten.

At some point I had thought to sync using DTACK itself, though DTACK on the Atari is a nightmare. On a 16bit access such as the ROM, DTACK actually goes low about 20-40ns after /AS goes low. So it needs a delay to sync it back up to S4 of the CPU cycle. Sounds simple enough, doesn't it?

The problem then is that on 8bit reads DTACK takes several cycles before it appears. So you can't simply run DTACK through a flip-flop, as DTACK would then arrive too late and the data on the bus likely isn't valid anymore. So we need some more logic based on what the CPU is accessing, plus some FF delay logic to cope with each access type.

I did try such an idea and ran countless simulations, but really it was just a shift register which was set in motion when /AS went low. So if it was clocked at 16MHz, then after 4 clocks you could assume you were in S4 of the bus cycle and all would be happy.. But when I tried to run the CPU faster only when accessing an 8bit cycle, it crashed out. So DTACK actually still needs to be delayed or synced back up to the 8MHz clock, as even checking DTACK a few ns too early seems enough to plummet it all into chaos. In the end I gave up on the whole idea and started to think up a better method which was a lot simpler.
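As a rough illustration of that shift register idea, here is a small Python model. It is only a sketch under my own assumptions (one sample of /AS per clock, the register cleared whenever /AS is high, a 4-stage chain); the function name is made up and it is not the PLD code that was simulated.

```python
def s4_marker(as_n_samples, length=4):
    """Clock a shift register once per sample of /AS (1 = high, 0 = low).

    The register is cleared while /AS is high and shifts in a 1 on every
    clock where /AS is low, so its last stage goes high `length` clocks
    after /AS was first seen low.
    """
    reg = [0] * length
    out = []
    for as_n in as_n_samples:
        if as_n:
            reg = [0] * length      # bus idle: clear the chain
        else:
            reg = [1] + reg[:-1]    # bus cycle running: shift a 1 along
        out.append(reg[-1])
    return out


# /AS high for one clock, low for five clocks, then high again.
print(s4_marker([1, 0, 0, 0, 0, 0, 1]))  # [0, 0, 0, 0, 1, 1, 0]
```

The marker only tells you how many clocks have passed since /AS went low. It knows nothing about when DTACK really arrived, which is the weakness described above.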

From looking at the Atari timings, DTACK (as far as I could see on my capture) always goes HI on S2. The problem here is with ROM access: as mentioned before, when /AS goes low for ROM, DTACK also goes low at basically the same time. So running this mess through a flip-flop just wouldn't work. In fact you have to assume DTACK is low all the time for several bus cycles (when accessing ROM, that is). So trying to add any FF delays just doesn't work.

After much more thought though, if DTACK is pulled high with a lower value resistor, it is actually possible to get 20-40ns of HI time around S2. This is enough time to trigger some logic such as an SR latch. So I ran many simulations and came up with a simple SR circuit where combinations of /AS and /DTACK set or reset the latch. Basically, when the CPU completes its cycle /AS goes high, and normally it goes low again a few clocks later to start the next bus cycle. But if the CPU is running 4 times as fast, it can look for DTACK and see it low, as the ST chips have not released DTACK from the previous bus cycle yet. In fact, if the CPU did 4 reads, it could read the same data off the bus 4 times and DTACK would simply be low the entire time. That would be bad.

So what I do is use /AS and /DTACK both being HI to trip the SR latch. Basically this allows the CPU to set /AS low again, but only once DTACK has gone high. As it's an SR latch, even if /AS goes low again (which it is expected to), the SR latch becomes a "hold off circuit" and stops /AS from reaching the ST bus until the SR gets reset by /DTACK going high again. Confused? Join the club! But wait, it gets better..
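The hold-off behaviour can be sketched as a tiny state machine in Python. This is only one possible reading of the description and not the actual SR equations: I am assuming the latch is set when the CPU ends a cycle while DTACK is still low, that DTACK going high resets it (reset wins), and that all signals are active low. The class and function names are hypothetical.

```python
class HoldOff:
    """Behavioural sketch of the /AS hold-off latch (0 = asserted)."""

    def __init__(self):
        self.hold = 0  # 1 = keep the CPU's /AS away from the ST bus

    def step(self, cpu_as_n, st_dtack_n):
        if st_dtack_n == 1:
            # ST side has released DTACK: reset the latch (reset wins).
            self.hold = 0
        elif cpu_as_n == 1:
            # CPU ended a cycle while DTACK is still low: set the latch.
            self.hold = 1
        # ST_AS is asserted only when the CPU asserts /AS and nothing holds it.
        return cpu_as_n | self.hold


def run(samples):
    latch = HoldOff()
    return [latch.step(cpu_as_n, dtack_n) for cpu_as_n, dtack_n in samples]


# (CPU /AS, ST /DTACK) over time: idle, cycle starts, DTACK arrives,
# CPU ends the cycle, CPU starts the next cycle early, DTACK released,
# DTACK arrives for the second cycle.
trace = [(1, 1), (0, 1), (0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
print(run(trace))  # [1, 0, 0, 1, 1, 0, 0]
```

In the fifth sample the CPU has already pulled /AS low again, but ST_AS stays high until DTACK has been seen high, which is the "hold off" part.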

This is all assuming DTACK always goes high on S2 of the CPU cycle. If we forget the DTACK timing with ROM access, then as long as the ROM ICs are capable of 16MHz, the CPU can run at 16MHz and read the ROM as fast as possible (which is what my V1.X series of boosters do anyway). As the CPU can run at 16MHz as long as the RAM is double clocked, it would seem the only chip causing issues is the MMU with DTACK.

Suppose the MMU was running at stock speed (8MHz) and the CPU was running at 16MHz and wanted to do 2 RAM reads. The MMU would set DTACK low, the CPU would read the bus, terminate the bus cycle, start the next cycle, and wait for DTACK.. which is already low from the previous bus cycle. So we don't want to delay DTACK at any point, but simply wait for it to go high before we start another bus cycle. That system should work, and even if DTACK did terminate early (let's just say 1x 8MHz clock early, which equates to S0) then the CPU could actually start the next bus cycle a fraction sooner. This is what actually happens (kind of) with the V2 booster, but the ST chips still running along at 8MHz don't seem to care about it anyway.

So in order to fix that possible issue, I just add CLK8 to my SR circuit and make sure that /AS at least syncs to a CLK8 HI, which should be S2, again assuming DTACK goes HI at that time. Should DTACK go high sooner, chances are it's going to go HI during a CLK8 LOW period, so waiting for CLK8 HI in the SR circuit would partly solve that problem. I think however it's unlikely DTACK would go high sooner, as it would be getting out of spec with the 68000 timings. Not saying the Atari design is even remotely like the timings outlined in the 68000 handbook in the first place, but some things must be similar.

DTACK mostly seems to arrive on a CLK8 LOW, but the CPU doesn't expect it until a CLK8 HI. So I may use a second SR circuit to re-sync it to the CLK8 HI to keep things looking how they should. I don't think adding a small delay there would cause any problems anyway. Reading it 100ns sooner than it should could cause a problem, though. At least I have something in mind should it need fixing.

In theory at least, the CPU shouldn't need to be in sync with the system clock either. As the SR latch syncs /AS to the CLK8 cycle, and DTACK is synced to the 8MHz clock in the ST chipset anyway, the CPU speed shouldn't matter. This should allow the CPU speed to vary from the stock 8MHz to 100MHz without having to make any circuit changes for each speed. Of course this all needs to be actually tested on the Atari itself.

I easily converted my circuit to PLD code (again for an Atmel chip). I thought I would give the Atmel PLD another try as I already have a development board which allows me to route any signal anywhere easily. As usual though, a few lines of code are not playing well with the CUPL compiler :( It should only take seconds to try this out, but I will have to contact Atmel support again and see if they can explain why my code compiles fine but doesn't generate a jed file! Creating an SR latch with a couple of gates shouldn't be this much trouble. I'm still thinking about changing to Altera and moving away from coding to pure logic circuit designs. But as I am not very familiar with Altera's IDE yet, it seemed more logical to use my Atmel based board as it's already wired up and ready to go.. This all gets very tiresome :(

So currently I am waiting for Atmel to get back to me to hopefully fix these issues. If they can't be fixed, I will wire the whole thing up on breadboard using dedicated TTL chips and just hard-hack it all to the CPU lines to try it out. What *should* happen then is that the CPU can run at at least 16MHz from the system clock. As the CPU is at 16MHz, ROM and internal instructions will likely run faster, as per my V1.x series boosters. The point being that the CPU should also be able to run at 8MHz without anything slowing down, going too fast, exploding or something.

It's also likely that some other signals such as LDS & UDS will need to be tweaked. I can probably just wire them to the /AS equations so LDS & UDS only go active when the ST_AS line is active. It's also possible the BG lines may need some tweaking, though one problem at a time :) If the SR idea doesn't work then I am not really sure how to solve the speed issues. Likely I will shelve the project and work on boosters using the 020, which can be run at 16MHz all the time along with the ST-RAM. So hopefully I will have some luck with these issues for once..

 

December 30, 2016

So finally got a reply from Atmel and got my code to compile. Though the ST doesn't boot :( So either my code sux or the IC or my idea..

What I will do is order some logic chips and hardwire some stuff in. I will not break any CPU lines, just run my circuit in parallel to drive my scope and see what's going on. What a great way to end the year :(

 

January 1, 2017

I tried some things, and mostly it seems my SR latch output doesn't match my simulation. In fact I use /Q in my simulation, but only Q works on the PLD. Odd. In any case it now boots at 8MHz. So watching for /DTACK going high seems to work OK, at least so far. Running at 16MHz there is still no boot. I later realised that while ST_AS (the ST side address strobe) is held off, the CPU itself still sets /AS low and would still read DTACK anyway. So the circuit became a little pointless. However, the easy fix for that was to merge CPU_DTACK with ST_AS. As ST_AS is held off and has the timings I want, I can use the same signal to set CPU_DTACK high when CPU_AS goes HI.

So the functional logic is: ST_AS is delayed (source is CPU_AS) until ST_DTACK is seen going HI. That will stop the CPU from starting the next bus cycle too early. Also CPU_DTACK is "chopped" with ST_AS, so basically the CPU sees DTACK as HI when the CPU sets /AS high. This way it does not see DTACK even though the ST side DTACK is still low. This stops the CPU from reading DTACK multiple times.
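The "chopping" part can be written as a single gate. A minimal Python sketch, assuming both signals are active low (0 = asserted) and that a plain OR is all that is needed; the function name is made up and the real PLD equation may well differ:

```python
def cpu_dtack_n(st_dtack_n, st_as_n):
    """DTACK as seen by the CPU (0 = asserted).

    The CPU only sees DTACK asserted when the ST side DTACK is asserted
    AND the (held off) ST side address strobe is asserted as well.
    """
    return st_dtack_n | st_as_n


# Print the full truth table: ST_DTACK, ST_AS -> CPU_DTACK
for st_dtack in (0, 1):
    for st_as in (0, 1):
        print(st_dtack, st_as, "->", cpu_dtack_n(st_dtack, st_as))
```

The row that matters is ST_DTACK = 0 with ST_AS = 1: the ST chips are still holding DTACK low from the previous cycle, but the CPU sees it as high.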

The logic above still only works at 8MHz :( The next problem is likely ROM access, though I'm not totally sure. The CPU would read ROM twice as fast (at 16MHz), so the ROM decoding would need to be a lot faster for when the CPU reads DTACK. Considering I can run 45ns ROMs on the STE at 32MHz with the STE doing the decoding, the STFM should work at 16MHz anyway. But at this point it's a small assumption.

What I am trying to do now is delay DTACK to the CPU by one 8MHz clock cycle, though only when it's reading the ROM. I'm not totally sure, but I am assuming 16bit access is to ROM only. As the GLUE logic asserts DTACK for ROM too early, the CPU could read the bus when it isn't ready. Though as my V1.5 series booster runs the CPU at 16MHz on ROM access anyway, I think it probably isn't related to ROM access.

As the CPU can run at 16MHz all the time when the MMU does, it must be something the MMU isn't happy about. As the ST doesn't boot, it must be a timing problem with RAM access, but I have no clue what. As far as I am concerned, if the CPU is setting /AS at the right time, isn't starting the bus cycle early, and isn't double reading DTACK, then all should be well, but something is still wrong.

What I am trying to do now is add delays to DTACK based on LDS and UDS. Like always, coding this seems to be a nightmare. According to documents online, a D-FF can be disabled so it acts like a straight wire (i.e. no delay through the FF). The idea there is to purposely bypass the FF, so the circuit should work as normal, like a stock machine. But it's not booting and I have no idea why. So another email to Atmel about this. The idea being that if I can selectively enable and disable delays on DTACK, I can try 8bit and 16bit delays and see if it gives some clues as to what the MMU is doing DTACK wise.

In reality, delaying DTACK shouldn't matter. I've done it before and ended up with a machine running at 85% speed. So it's a mystery (again) as to why delaying DTACK by a clock cycle isn't working.

 

January 2, 2017

Today I worked out that for some reason the asynchronous reset of the FF doesn't seem to be doing its job. I simply added ST_DTACK as the CPU_DTACK "off" condition and now the ST boots to desktop. The AR of the FF should do the same thing, but for some reason it doesn't. Argh!

So now I have a single FF delay clocked at 8MHz. Benchmarking gave all stock speeds... So far this means DTACK is likely generated too early, so the delay doesn't do anything.. Next is to add a second delay.. and basically keep going until something breaks :)

2 delays still gave 100% scores.. 3 delays.. 100%... Starting to wonder if the delays are doing anything at all now... so 4 delays... 100%.. 5 delays.. 100%... (WTF?!) 6 delays.. 100%... 7 delays... 100%.. 8 delays.. 100%.. 9 delays... 100%.. OK, so clearly there is something wrong here as the delay isn't working. I've had issues like this with FFs before, where they don't seem to delay when they should.. So I need to look into that some more..

UPDATE:

Right, disabled AR and AP and "reset" the FF just by clocking in CPU_AS along with DTACK , kinda does the same job. Now with 3 delays I just get a black screen white border, 2 delays it tries to read the floppy drive but basically fails.. 1 delay.. still failing to access the floppy drive.. bugger..

 

January 3, 2017

Finally figured out delaying in my Atmel PLD. Did lots of tests delaying AS & DTACK for 8bit and 16bit modes separately.. results below..

16bit slow down tests
1 FF - pauses on floppy access , goes to desktop without floppy loading..
2 FF - bombs
3 FF - crash on power up
4 FF - bombs
5 FF - 528ns - boots to desktop ?! 99% speeds!

8bit slow down tests
1 FF - about 50ns - seems OK - probably just syncs to the rising edge, which is close to normal anyway
2 FF - about 152ns - seems OK
3 FF - 256ns - bombs or doesn't boot
4 FF - 388ns - crazy ass video on boot up lol
5 FF - 500ns - boots, but DTACK seems to vary and could be glitching
6 FF - boots - DTACK seems to be 99% low all the time

AS DELAY TESTS
1 FF - 252ns - reset - floppy comes on - resets
2 FF - 380ns - reset loop
3 FF - 500ns - reset - floppy comes on - resets
4 FF - 624ns - stuck in reset loop
5 FF - reset loop
6 FF - dead
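
As a quick sanity check on those figures: if each extra FF is clocked at 8MHz, each one should add roughly one 125ns clock period. The small Python sketch below just subtracts neighbouring measurements from the 8bit and AS lists above. The variable and function names are mine, and the idea that each step should be exactly one CLK8 period is an assumption.

```python
CLK8_PERIOD_NS = 1000 / 8  # one 8MHz clock period = 125.0 ns

# Measured delays copied from the lists above: {number of FFs: delay in ns}
as_delay_ns = {1: 252, 2: 380, 3: 500, 4: 624}
eight_bit_delay_ns = {1: 50, 2: 152, 3: 256, 4: 388, 5: 500}


def step_sizes(measured):
    """Return the extra delay gained by each additional flip-flop."""
    counts = sorted(measured)
    return [measured[b] - measured[a] for a, b in zip(counts, counts[1:])]


print(step_sizes(as_delay_ns))         # [128, 120, 124]
print(step_sizes(eight_bit_delay_ns))  # [102, 104, 132, 112]
```

The AS steps sit within a few ns of 125ns, so those FFs look like they really are adding one 8MHz clock each. The 8bit DTACK steps wander a lot more.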

Later I tried running the CPU at 16MHz off the system clock. The best I could get was with 1 FF delay on AS & DTACK. This is what the LUCAS document does, but I just got a row of bombs on the screen. I tried for hours with various delays on DTACK and AS but couldn't get it any better.

I later thought about RW line as it could conflict with the CPU and bus, but when I only enabled it when ST_AS was low (or 1 FF delay) I then got a row of smaller bombs.. strange. I also tried adding in BGACK just in case the CPU was conflicting with something else on the bus, but long story short, nothing worked.

Some other code I did several months ago allowed me to get to desktop, but crashed on floppy drive access. All I did in that code was to delay DTACK by one 32MHz cycle first and then 3 clock cycles. I think there was some odd issue where the first FF had to be clocked from the CPU clock, then the FFs after it clocked from the 8MHz clock.. can't remember why I did that previously. For ST_AS I simply used 2 8MHz clocks. I may revisit that older code and see if I can learn why it got further than my current code.

 

January 4, 2017

Tried my original code from months ago and it crashes when trying to access the floppy. I have now fixed that though, and found that LDS & UDS needed to be tri-stated. I don't really get why. I had used the same delay as ST_AS so they were in sync, but I simply moved the delay to the tri-state control instead and now I can load GB6! It passes all RAM tests as well. So all seems good so far.

I had 3 FF delays on DTACK overall. This never made any sense, as only 2 clock delays should have been needed. In any case, I added in my SR code so the CPU ignores DTACK until ST_DTACK triggers the SR latch to allow the CPU to monitor ST_DTACK again. Now the DTACK delay works with just 2 FF delays! So using the SR latch seems to work really well. I tried 1 FF delay and it crashed just as it got to desktop, though I'm not surprised there.

So I managed to load GB6 and it gave 2 or 3 bombs doing tests :-( It was odd, as it would pass all RAM tests, ROM obviously worked as it booted up, and the floppy drive was loading programs fine. I later found that the GEM WINDOW test was the only test which would run. The test actually ran at 111%, so my DTACK generation was a fraction faster than it should be, though that can be fixed another time. Looking at my GB6 code, that test doesn't turn off some MFP stuff. So that got me thinking..

Turns out that the MFP DTACK generator is actually done correctly and matches the 68000 timing spec exactly. So delaying DTACK actually made the MFP DTACK 2 clocks too slow, which likely just resulted in some corrupt data somewhere. I saw that LDS was driving the DS line on the MFP, so I tried not delaying DTACK when LDS was low.. but the machine didn't boot up at all that way :-( So now I have a bit of a problem, as it's difficult to tell what is actually generating DTACK in order to work out what delays it needs.

I suppose there could be 2 methods: bypass the DTACK delay on an MFP address, or wire a cable from the CS line of the MFP to the PLD so the CS line becomes a DTACK delay bypass line. I may try both methods just to actually confirm that the MFP is the problem.

It also begs the question about other devices which may use the bus like the blitter. There is no way to know when DTACK will arrive. So I may need to re-think my DTACK code to something which is more adaptive to when DTACK arrives. More fun to be had...

I tried a 20MHz oscillator just for the hell of it. It gets to desktop and loads desktop.inf with a bit of a grumble from the floppy drive. It failed to load GB6, just giving disc error messages. As the system is running at least 10% faster than it should anyway, then likely (at least I hope) the problem will be resolved when the CPU is in proper sync with the 8MHz bus.

 

January 5, 2017

I tried adding in the MFP address range from $FFFA00 but this did not help :-( With this, the DTACK delay is AND'ed with NOT that address range. So if it finds an MFP address, the AND function breaks and the delay on DTACK is not added in. I tried no delay, then 1 delay; 2 delays was the default anyway; 3 delays caused 3 bombs and screen corruption; 4 delays just 2 bombs again. So I'm not sure the MFP is to blame here.. Either way I have another problem to figure out :(
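In software terms that bypass looks something like the Python sketch below. The base address $FFFA00 is the one used above; the size of the decoded window (0x40 bytes) and the function names are my assumptions for illustration, and the real PLD equation works on address lines rather than integers.

```python
MFP_BASE = 0xFFFA00
MFP_SIZE = 0x40  # assumption: how many bytes of address space are decoded


def is_mfp(addr):
    """True when a 24-bit 68000 address falls inside the MFP window."""
    addr &= 0xFFFFFF
    return MFP_BASE <= addr < MFP_BASE + MFP_SIZE


def dtack_delay_enabled(addr):
    """Delay is AND'ed with NOT(MFP address), so MFP accesses skip it."""
    return not is_mfp(addr)


print(dtack_delay_enabled(0xFFFA01))  # False: MFP access, no delay added
print(dtack_delay_enabled(0x000100))  # True: anything else gets the delay
```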

I tried $000100 range also and just bombed on power up, but likely those values are in STRAM so explains the crash.

After some more investigation and talking to d.m.l (who coded the ASM routines for GB6) we worked out that the acia_off function was the reason for the crash. This isn't a software issue though, it's a hardware one. As the keyboard worked fine, I am not sure what's going on there. Possibly a write to the ACIA is failing and causing the crash. I tried slowing down the E-clock by 50% but the keyboard simply stopped working altogether. The keyboard works fine with the 8/32MHz booster, though as the CPU switches to 8MHz during bus transfers, it's likely the ACIA could still be running at 8MHz when being accessed (not sure on that though).

Looking at the 68000 manual, it mentions VMA (output) and VPA (input) in conjunction with the 6800 range of devices. I already had a 1 clock delay on VMA, so I increased that to 2 clocks.. and the keyboard is still not working.

So I tried GB4 out as that doesn't do fancy tricks...

 

We can see Integer Division is 192%, a clear indication we are running at 16MHz. Really, when the CPU is running with 8/16MHz switching (V1 booster style) there should be about a 25% speed increase on all tests. So likely my DTACK control is a little too slow. As I am syncing to the 8MHz clock, I will probably have to switch that to the 16MHz clock and double the delay chains. At least then I can trim the timings in half 8MHz cycles and recover the "lost" speed. That isn't important right now though.

I still need to figure out that ACIA crash problem. I have been talking to d.m.l a lot today and we have found it's ACIA writes which seem to be causing the issue. It only seems to crash when turning off the keyboard and mouse though.. The turn on command, which writes to the same registers, doesn't seem to crash it. So it's a mystery why one write fails and another write works...

 

January 6, 2017

Decided to take a look into the 16MHz low scores. I stopped using the 8MHz clock to sync, and used the 16MHz one. In effect to speed up some DTACK access. I couldn't alter the delays much without it resetting just as it got to the desktop. Though the speeds did go up a fraction. Display tests got a 6% speed boost overall.

I'm starting to think now that my past 16MHz boosters which used 8/16 switching probably started the next CPU cycle while DTACK was still low. So it got a head start on the next bus cycle.

 

So I thought to remove the delay on /AS to gain more speed if possible there.. But still was the same.

Now I thought to remove my SR code because that waits for DTACK to go HI before the CPU is allowed to start another cycle. Technically Waiting for DTACK to go high is the right thing to do. But I want to know why the V1 booster was faster than my current tests.. So SR was removed.. and no /AS delay used.. Oddly the machine booted and gave the same results as above.

I think it's clear that ROM access isn't getting the speed boost it should. I have a fast ROM installed, so a smaller 16bit delay should work.. but it does not.. It's the only reason I can think of for the slow down. So I lowered the 16MHz access delay line to see what happened now with all the other mods in the code... and just got a row of bombs on the screen. I tried adding in the /AS sync and delay but it didn't help. I did a lot of code tidying up as it was getting too confusing.

I think the problem is this: the ROM logic issues DTACK right after /AS, so on the V1 booster the CPU can read DTACK right away and complete the cycle. But as I am forced to delay 16bit access by 3 cycles, ROM access is delayed by the same 3 cycles and ROM runs slower. This doesn't add up in my mind though, as when the CPU isn't accessing the bus, the scores should be higher, including ROM. So I'm not sure I understand what's going on exactly.

In any case, I mashed in the ROM decoding lines to bypass DTACK to see what happens... Unfortunately the ROM wouldn't decode any faster. Still stuck on 3 cycle delays. This makes no sense :-(

 

 

January 7, 2017

The above are tests run with GB4 on my V1.5 booster. The left image is with ROM CE enabled to switch to 16MHz mode. The right image is just CPU boost only to 16MHz. ROM speeds are not accurate in GB4 and should show 170% speed or there abouts with fast-ROM enabled. I uploaded these images so I had a reference point as to what my current tests should do.

I experimented with various DTACK delays and the only thing which seemed to boost speed was using LDS and UDS in the ROM decoding equation. This allowed me to use one FF less in the ROM delay and it gave a jump in speed. I did not take an image as it was like 2am when I did the test, and testing again this morning it didn't boot, so I had to revert back to 3 FF delays :-(

Really the DTACK delays still do not make much sense. ROM should work without any delays like it does on the V1.5 booster, so it doesn't make sense that I need delays there. Then I remembered that during some SR tests last week I noticed a glitch in the DTACK hold off logic, so maybe the glitch was the cause of the issues. It would seem unlikely the CPU would latch the data during the glitch time, but it's the only lead I currently have to go on.

Looking at DTACK it seems to be glitch free, however there seem to be a lot of odd looking pulses. It's hard to work out if the pulses are valid or not, though I have to assume they are glitches in the ST's DTACK generator.

Looking at my SR code I decided to add back in CPU & DTACK HI before allowing the SR to reset. With 2 FF delays on ROM this now booted right up and I was able to take a screenshot.

Looking at DTACK now I see a small glitch in the waveform again..

Later I had this..

I overlaid the 8MHz clock and saw DTACK low for 2 cycles. That would be 4 cycles at 16MHz, so I assume the timings are correct. I can also see the DTACK hold off logic is doing its job.

I tried with UDS LDS in the ROM equation and crashed again. I then tried UDS and LDS back but with 1 FF delay this time and bombs again.

My next thought was to allow /AS to go low as soon as the CPU wants, rather than delaying or trying to sync it to the 8MHz clock. The idea here is that on the V1 booster the CPU would actually run a single 16MHz clock cycle faster, so /AS reaches the bus a fraction sooner than normal, then the CPU enters 8MHz mode so timings don't get messed up. So my next idea was to let /AS go low as soon as it can, and just make sure DTACK to the CPU isn't allowed to go low until the SR latch has reset on DTACK going high.

My first attempt failed and it just didn't seem to like /AS going low sooner than it should. So the AS sync code was added back in, which didn't help either. I put the SR code back in and now I get bombs again. Overall, if I reduce the delay on the /AS end I have to increase the DTACK delay, so I think it's clear the timing just can't be any faster. So what is causing my V1 booster to run so much faster??? I have started to think "outside the box" a little, wondering whether the delays, at least on /AS, might not be needed if the CPU does 2 consecutive ROM reads. But it's impossible to know what instruction is coming next, and even if I did, the logic would delay it all anyway.

One clue might be that in my V1.5 booster tests I can get 196% on the int-div test, but my current CPU tests are only showing 192%. This suggests the CPU isn't running at 16MHz when it should. When /AS goes high, the CPU runs through all its internal work, then sets /AS low to dump the result somewhere. In both circuits the CPU should be running the same number of 16MHz cycles, so I can only assume that there is some very small delay on RAM access. Looking at the GB4 results, I do see 101% for RAM speed on my V1.5 tests, but 100% on my current tests. It could be that a 1% drop in RAM speed translates to about a 5% speed drop overall on all tests. It's actually interesting, as my delay on DTACK is 1 clock higher when accessing the 16bit bus (excluding ROM access), so this could suggest 16bit RAM access is a fraction slower than it should be. This again gets back to me trying to run the delay clocks at 32MHz to gain better control over the delays, but this isn't easy with the STFM as the 32MHz line really sux. At least on the STE it is buffered. I'm sure at some point in the past I found that a small delay in DTACK could give a fair drop in speed, so even just a few ns too slow could result in a speed drop. So this starts to become a huge problem.

Besides assuming RAM access is 1% too slow, during ROM access I should be seeing a huge boost as well. I can't run ROM any faster either: the delay is 2x 16MHz clocks after DTACK goes low, which in the case of ROM access is about 40ns after /AS goes low. With the CPU running at 16MHz, 2 clocks is all it needs, then it will read DTACK anyway. The ROM is fast enough to run at 16MHz on the V1.5 booster, so access to ROM should be faster.

As DTACK for ROM access stays (for argument's sake) low all the time, I wonder if there is some odd situation where the CPU can read ROM and then access RAM, all within 4x 8MHz clock cycles. I don't think there would be 2 ROM reads one after another, as there must be some RAM access somewhere in between. Trying to think how the V1.5 booster would work: it would run at 16MHz when /AS is high, then the CPU would want a ROM address, so the CPU would switch down to 8MHz, but shortly after GLUE would set ROM CS low and the booster would switch to 16MHz again. Likely between the 2 bus cycles there is only one 16MHz clock delay, and the CPU would start the next cycle early. DTACK would be pretty much low all the time in this case, but maybe my logic to delay /AS to sync it actually causes the slow down in some cases. The problem then is that I could simply reduce the DTACK delay on ROM access, but I already tried that and it didn't work :(

According to the datasheet for the 68000, the address is valid until about S1, just before /AS goes low again. So maybe on ROM access I could bypass my /AS delay logic.. but this would assume 2 consecutive ROM reads, which I don't think can happen.. but let's try it anyway.. so on a valid ROM address the delays on /AS are bypassed... and no great surprise, it just bombed out again. I also tried the /AS bypass for MFP access and other stuff without any luck.

So I am back to thinking that somehow the CPU is managing to run 2 instructions in a single 8MHz bus cycle. As ROM seems to be running at half the speed it should, I can only assume that at some points the CPU reads ROM and does a RAM access right after, all within the normal 4x 8MHz clocks. The CPU can't really do 2 ROM reads, other than maybe in benchmarking, but even so instructions still have to run from RAM to run the loop. It's also possible the CPU isn't running instructions as fast as it should. I'm not sure exactly how instructions which take several clock cycles work, so maybe this is confusing my DTACK control logic somewhere along the line. It's hard to think of reasons for the slow downs.

What I am currently assuming is that for the int-div test, the CPU doesn't have /AS low during that time. So it must get the instruction from RAM; if it fits in a single bus cycle, like a simple ROM read, then it completes all within a single bus cycle. If it's a long division, then after fetching the instruction the CPU must set /AS HI, do the internal work, and when it's finished set /AS low again. So some instructions could in effect take 2 bus cycles to complete. But as the CPU sets /AS HI, we likely end up with a few /AS delays which wouldn't be there with the V1.5 booster.

If an instruction takes 8 clock cycles (assuming 4 clocks for a simple instruction, or 1 bus cycle) and the CPU is running double speed, then that instruction would complete in just 1 normal bus cycle (though technically the CPU actually did 2 cycles). In any case the instruction runs faster. There must be some speed up of instructions since the benchmarks show higher speeds on nearly all tests.

So back to thinking about the V1.5 booster. All I do there is when the GLUE accesses ROM, I switch the CPU into 16MHz speeds. This means the CPU will read DTACK early and complete the cycle 50% faster. But I never thought much about what actually happens after that. So clearly there is more to be learned from my previous V1.X series boosters.

So next up I wired back in my V1.5 booster, running with CPU boost only and delaying /AS by 1x 16MHz clock cycle. Results were interesting as now they give the same results as my current circuit design. So next up is to delay another cycle to see what happens... OK so 2 delays and I just get a white screen.. Adding 1 delay back in and enabling ROM CS and I get 147% on GEM WINDOW and 192% on int-div, which should be 150% & 196%. So the slight delay in /AS causes the slight CPU slow down I am seeing, but doesn't explain the huge slow down with ROM access. Which then makes me think that delaying /AS based on DTACK might cause ROM access to slow down, which was why I wrote in some bypass logic before. Time to re-visit those tests..

So going back to the fastest code.. it doesn't work now. So had to slow down ROM access to 3 delays again. Really don't understand how the code works one moment then a couple hours later fails to work.. anyway.. First test was to bypass the DTACK sync on ROM access, but keep the /AS sync in place.

So now with ROM CS bypassing /AS delay, results were identical before and after :-( I also tried MFP bypass with no luck either. I now wonder if the DTACK delay chain itself isn't resetting as it should, as to clear it I currently clock CPU_AS through it. At 16MHz it could cause a delay, but not sure if it would really matter.. so time to try that...ASYNC reset just caused a row of bombs on power up, so clearly something doesn't like the DTACK delays being forced to reset. Adding SR into the CPU_DTACK worked, but then it just really moves the DTACK delay from one point to another in the code. I tried reducing the ROM delay back to 2 clocks but still get a row of bombs.

Next up I removed the /AS delays totally so it just followed CPU_AS instead. Without the SR delays and AReset on the FF the code worked but speed was unaffected. I tried reducing the ROM delay to 2 clocks and again crashes.. I tried 1 clock and also bombs, also tried removing the slow down on ROM altogether.. and still bombs.. Tried increasing other DTACK delays.. still bombs.. tried speeding up DTACK.. still bombs.. Pretty much the entire code now is just 3x 16MHz clock delays on DTACK. I also tried removing ST_DTACK for when there is ROM access. Normally it is CPU_AS = DLY OR ST_DTACK. Though I AND'ed ST_DTACK with ROM_CS so the CPU should see DTACK low for longer on ROM access. Oddly, it booted, but no difference in speed gain. I reduced ROM DTACK delay again just to be sure.. still bombs.. So god knows what's up with it... I am still yet to solve the crash issue on turning off the keyboard as well.

At this point I am really starting to wonder if running the CPU at full speed all the time is worth the trouble. The idea was to be able to sync the CPU at higher speeds to the bus easily. Though if doing that is going to cause slow downs, then it would need a higher speed just to compensate, which makes higher speeds (if possible) seem unlikely. The STE booster can run 8/32MHz and can even run with an out-of-sync clock of 38MHz. Every MHz gives a small speed boost. So I am thinking that would be a better technology to stick to.

 

January 9, 2017

I bounced some thoughts off Rodolphe and had another think about it all. I thought about the V1.5 booster that in actual fact, the CPU would spend a cycle or 2 at 8MHz during ROM access and wouldn't actually manage 2x16MHz bus cycles in 1x8MHz time frame. This then made me think that maybe ROM isn't actually running faster at all, but the instructions from ROM are. If an instruction took, for argument's sake, 4 clocks at 8MHz, then that is a normal bus cycle. If an instruction took 2 extra clocks (6 clocks) then it would likely need 2 full bus cycles worth of time to complete. Though if the CPU was running at 16MHz, then ROM access would still take the same amount of time, BUT it completes the 6x16MHz clocks well within the 4x8MHz clock time frame. So it saves a full bus access, which results in a 50% speed boost on ROM access.
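To put rough numbers on that idea, here is a minimal sketch in Python. It assumes a simplified model where every instruction occupies whole 500ns bus slots (4 clocks at 8MHz); the function name and the model itself are only illustrative, not measured behaviour.

```python
import math

BUS_CLK_NS = 1000 / 8          # 8MHz system clock period: 125ns
CPU_CLK_NS = 1000 / 16         # 16MHz boosted CPU clock period: 62.5ns
BUS_SLOT_NS = 4 * BUS_CLK_NS   # one normal bus cycle: 4 clocks at 8MHz = 500ns


def bus_slots_needed(instruction_clocks, cpu_clk_ns):
    """Number of 500ns bus slots an instruction occupies (simplified model)."""
    return math.ceil(instruction_clocks * cpu_clk_ns / BUS_SLOT_NS)


# Compare 4, 6 and 8 clock instructions at 8MHz and at 16MHz.
for clocks in (4, 6, 8):
    at_8 = bus_slots_needed(clocks, BUS_CLK_NS)
    at_16 = bus_slots_needed(clocks, CPU_CLK_NS)
    print(f"{clocks} clocks: {at_8} slot(s) at 8MHz, {at_16} slot(s) at 16MHz")
```

In this model the 6 clock instruction takes 375ns at 16MHz, so it fits in one bus slot instead of two.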

Anyway, I took some scope images of the V1.5 booster and was actually surprised at the results...

 

What I see on the first image is that the CPU clock is actually in 16MHz all throughout ROM access. I thought originally there would be some 8MHz cycle time there, but seems not.

I also see just 32ns total decoding time for ROM access with the GLUE, I'm amazed! With the ROM chip speed being 55ns, then the total ROM access speed would be 32ns + 55ns = 87ns! If we assume the CPU does 2x16MHz cycles to read DTACK then the CPU wouldn't read the bus until about 120ns. So the ROM is stable and completes the transition so the CPU can read it at the full 16MHz speeds.

Looking at /DTACK on ROM access, /DTACK seems to follow ROM CE directly. If GLUE takes 32ns to do something, then it can be seen that when /AS goes high, 32ns later so do /DTACK & ROM CE.

On my V2 booster, ROM decoding is done via GAL logic and is only a few ns faster. As there are 3 GALs doing decoding and likely 7ns each, then 7ns x 3 = 21ns. So only about 10ns faster than GLUE decoding, which is probably why there isn't much speed gain with GAL decoding ROM.
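The ROM timing figures above can be checked with a quick calculation. This is just a sketch using the numbers quoted in the text (32ns GLUE decode, 55ns ROM, 3 GALs at an assumed 7ns each); the variable names are made up for illustration. Two 16MHz clocks work out at 125ns, close to the rough 120ns figure mentioned earlier.

```python
GLUE_DECODE_NS = 32      # GLUE ROM decode time seen on the scope
ROM_ACCESS_NS = 55       # ROM chip speed
GAL_DELAY_NS = 7         # assumed delay per GAL
GAL_STAGES = 3           # GALs in the V2 decode path
CPU_CLK_NS = 1000 / 16   # 16MHz CPU clock period: 62.5ns

glue_path_ns = GLUE_DECODE_NS + ROM_ACCESS_NS   # decode + ROM via GLUE
gal_decode_ns = GAL_DELAY_NS * GAL_STAGES       # decode via the GAL chain
gal_path_ns = gal_decode_ns + ROM_ACCESS_NS     # decode + ROM via GALs
sample_point_ns = 2 * CPU_CLK_NS                # CPU reads the bus after ~2 clocks

print("GLUE path:", glue_path_ns, "ns, margin:", sample_point_ns - glue_path_ns, "ns")
print("GAL path: ", gal_path_ns, "ns, margin:", sample_point_ns - gal_path_ns, "ns")
print("GAL decode is", GLUE_DECODE_NS - gal_decode_ns, "ns faster than GLUE decode")
```

Either way the data is on the bus well before the CPU looks at it, so the faster GAL decode doesn't buy much.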

So now I still have a mystery to solve. Why does running the CPU at 16MHz all the time fail to give the speed boost on ROM access? My PLD decoding and delays are less than the V2 booster, so it should be running at least at the same speeds.. but it's not :-(

At this point I started to program a state machine for the timings, though for some reason it isn't behaving. It should be cycle accurate for the normal 8MHz bus timings, but doesn't even try to boot. So at some point I need to spend more time on that.

Currently I am confused over the ROM access speeds on my dev-board. I have 3x 16MHz delays there, but really they shouldn't be needed as proven the V1.5 booster runs perfectly happy at 16MHz without delays.

I tried various ideas to speed up ROM but nothing worked. So I moved to using the 32MHz clock instead of the 16MHz clock for the FF delay chain. I doubled the number of FFs so the delay should be the same as 16MHz and just got a black screen. I then tried an extra 32MHz delay and got some crazy funky colours going on. Then I tried a second delay and got a black screen again. I tried using one FF less just for the hell of it, and still no joy. The 32MHz clock generated by the STFM isn't too great. It seems to not spend much time low, which could be the reason why the 32MHz clock isn't working.

So I played around with the 32MHz resistor which goes to my dev-board. 100R didn't work, but 50R made it boot and now I am running at 82% speed! At least that's something. So next up is to lower the delay values again.. ROM access I had 3x16MHz clocks before as best, now the best is 3x32MHz clocks. Next to try general 16bit access delays.. Similar with that, it needs 3x32MHz delays, was 3x16MHz delays before. So now 8bit access.. Was 2x16MHz before, now has to be 3x32MHz delays to work. Then testing and GEM WINDOW is still at 111%, so no change running 32MHz FF clocks :-(

It does seem that ROM access should be faster since now it's 3x32MHz clocks (94ns), down from 3x16MHz (188ns). It's odd since 2x16MHz (125ns) crashed, when it's actually in between the 94ns and 188ns times.
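The delay figures above are just the number of flip-flops multiplied by the clock period. A small Python sketch (the helper name is only for illustration):

```python
def delay_ns(flip_flops, clock_mhz):
    """Total delay of a flip-flop chain clocked at clock_mhz, in nanoseconds."""
    return flip_flops * 1000 / clock_mhz


# The three settings discussed above.
for ffs, mhz in ((3, 32), (2, 16), (3, 16)):
    print(f"{ffs} x {mhz}MHz clocks = {delay_ns(ffs, mhz):.2f}ns")
```

This gives 93.75ns, 125ns and 187.5ns, so the crashing 2x16MHz setting really does sit between the two working ones.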

Looking back through my code I see I inverted the CPU clock, so I took the inversion out and it's now running slower than before at 107%. So I re-visited the delay tests. I managed to lower the ROM access to just 2x32MHz cycles, 16bit access to 2x32MHz cycles, and 8bit access to 1x32MHz delay. GEM WINDOW is still at 111% despite much lower delays now.

So I decided to bypass GLUE decoding and use the PLD decoding only. Basically the end result was I had to double the ROM delay from 2x32MHz clocks to 4x32MHz clocks. As the PLD is faster at decoding than the GLUE then a couple of 32MHz extra cycles would make sense. However I have not been able to gain anymore speed out of it overall.

 

January 10, 2017

I had a bit more of a look into the 6800 (ACIA) side of things. I eventually tried holding off ROM cycles until VMA and VPA were high. At first it seemed to help and I could run ROM a clock faster and got 122% on GEM WINDOW. But it was short lived and seemed very unstable :-(

I had an idea to put ST_DTACK into the ROM select code, basically just making sure ST_DTACK isn't low before allowing the next ROM cycle. Now I got a 2 cycle delay on ROM DTACK and GEM WINDOW is at 129%. It seems to crash at random during some tests, but at least something is happening now!

So now 1x32MHz delay, still somewhat unstable but now GEM WINDOW is 146%!! Odd thing is it bombs on power up, but pressing the reset button makes it boot fine.. strange.. I tried increasing the other DTACK delays to 3x32MHz clocks instead of the 2 I had before to see if that made any difference and it actually made things worse! So I reduced 16bit delays to 1x32MHz clock to see what would happen, and that made things worse. So I tried dropping 8bit access to 1 cycle and that made things worse again. So 2 clocks seems to be the best. That is basically 1x16MHz cycle, so I changed the code to use the 16MHz clock and 1 delay. It's possible the 32MHz clock itself is unstable so I was trying to avoid using it. Anyway, 1x16MHz clock didn't work, let's try 2x16MHz clocks.. It got to desktop and crashed loading GB4 :-\ So let's try 3 clocks.. nope, no better.. So back to 32MHz clocks until I can figure out what's going on better.. OK so now that's not working either. arrgghhh!

I added back in some delay on ST_AS to see if that would help.. nope.. I seem to have to press reset loads of times until it decides to boot.. odd.. I cleaned up the /AS code so it simply clocks through an 8MHz FF as the way I had it before was a little messy. Then I put it back to how it was as it no longer booted.

I lowered the 32MHz clock resistor to 33R, that made things worse (was 50R ) so upped to 68R to see if I could gain some stability. After pressing reset loads of times I managed to boot GB4 and finally got a screenshot..

Likely I will have to try and get the circuit working without the 32MHz clock, or build a small clock buffer board to beef the clock up a bit..

 

The left image is the 8/16 switching (V1.5 booster) and on the right is the constant 16MHz test. Still falling short by a few % though :-(

Granted, now it's unstable on boot. I tried slowing down general 16bit access by 1 clock and GEM WINDOW gave 144%. I found that interesting as 2 delays gave 146%. On that basis, if I could run with zero delays it might just give me back my few missing %, but so far it just refuses to run any faster.

Then I tried slowing 8bit access to see what would happen.. No change.. so I upped it again.. no change.. odd.. So then I lowered 16bit again and crashed as it got to desktop.

Next up was to replace the 32MHz clock with the 8MHz one to see if I could get it synced to that one instead. 1x8MHz delay crashed on desktop, so I tried 2 clocks and that worked. So out with the 32MHz line finally! It still doesn't seem to be very stable though :-( I next tried syncing to the 8MHz falling edge to see if that changed anything. I managed to get all delays down to 1 clock, but likely now it simply waits the same time as it did before with a higher delay time on the rising edge of the clock.

I finally un-forgot to try GB6 and it still crashes, so clearly still something amiss yet to figure out there! :-(

I have seen some random bit errors on boot when it crashes on power up. So decided to run RAM test on it to see if that could find anything. Its possible there could be a bad connection somewhere on my board causing the issues. So thought I would leave it on RAM test for a while to see what happens.. and all seemed good.

 

 

January 19, 2017

I went back to my "state machine" code and decided to run the CPU at the stock 8MHz to make it easier to check the timings are right. My second attempt was involving just holding off DTACK until CPU enters CPU state 4. Also the bus cycle isn't allowed to start until DTACK is seen high.

The problem with waiting for DTACK high is normally the next bus cycle starts while DTACK is actually low still. DTACK can arrive anywhere between S0 and S2, and of course the CPU starts on S0. At 8MHz this isn't a problem, though the pause needs to be there when running the CPU faster.
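As a rough illustration of the two rules described above (hold DTACK back until the CPU reaches state 4, and don't let a new bus cycle start until DTACK has gone high), here is a small Python model. It is only a sketch of the idea, not my PLD code; the function names are made up and signals are modelled as active low (0 = asserted).

```python
def cpu_dtack(cpu_state, st_dtack):
    """DTACK as presented to the CPU (active low).

    The ST's DTACK is only passed through once the CPU has reached
    state 4 of its bus cycle; before that the CPU sees DTACK high.
    """
    if cpu_state >= 4 and st_dtack == 0:
        return 0
    return 1


def may_start_cycle(st_dtack):
    """A new bus cycle is only allowed once the ST's DTACK is high again."""
    return st_dtack == 1


# DTACK still low from the previous cycle: the next cycle has to wait.
print(may_start_cycle(st_dtack=0))          # False
# DTACK arrives early (state 2): it is hidden from the CPU for now.
print(cpu_dtack(cpu_state=2, st_dtack=0))   # 1
# From state 4 onwards the CPU gets to see it.
print(cpu_dtack(cpu_state=4, st_dtack=0))   # 0
```

The wait in may_start_cycle is the pause that costs speed, since on a stock machine the next cycle would normally start while DTACK is still low.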

Using GB4, I got overall 79% speeds. Int-div gave 98%, ROM access 77%, RAM access 69%. So there is a lot of "bleed through" on DTACK when the CPU starts state 0. Unfortunately this isn't so easy to solve. I can replace DTACK with onboard ROM decoding, but of course I cannot replace the MMU's DTACK signal. The MMU likely does not release DTACK until S1 or S2. When running at higher CPU speeds, we simply have no choice but to wait until DTACK goes high.

As mentioned before with 8/16MHz switching, the CPU starts the next cycle a fraction early, but the CPU doesn't look for DTACK until much later, so there is no issue there. So the logic for running the CPU at 16MHz all the time is actually going to be slower than 8/16MHz switching. As already noted, currently my previous code gave a 5% drop in speed. So next up is to run at 16MHz again and see if my "state machine" copes or not...

 

 

January 20, 2017

Trying 16MHz resulted in a black screen, so something failed again :-( It doesn't make any sense since the code works at 8MHz and shows it must be delaying DTACK as the GB4 results are slower. So there shouldn't be any reason now why 16MHz shouldn't work since all my previous code did was to delay DTACK by 2 clock cycles anyway. So my current code should be doing the same.

I reduced the DTACK delay to state 3 from state 4 to see if I could recover the speed loss at 8MHz and to see if that gave some clues. It could be that as I have to wait for DTACK to go high, it's already starting 2 states too late, so state 4 would actually be state 6. Interestingly, RAM access is not 100% but ROM access is 99% along with most tests. At this point I wonder if /AS is going low too fast and not allowing the GLUE chip to release DTACK at the end of a ROM access cycle. I have seen DTACK go high when /AS goes high (or about 40ns or so later). Though if the CPU sets /AS low again too soon, then I guess it's possible DTACK will never be "seen" going high and GLUE will basically keep DTACK low constantly.. So a single 8MHz clock cycle on /AS might be a good idea for the moment...

Now ST_AS is clocked at 8MHz. This should give the GLUE time to set DTACK high to use as a reference for the start of the next bus cycle. The state start also now uses ST_AS, so the CPU cycle cannot start until ST_AS is set low. RAM access slowed to 53% and ROM access slowed to 63%. So while this is bad, it shows the delaying isn't causing the machine to become unstable. Trying it at 16MHz again with the /AS delay and still no joy :-( I do see some DTACK activity on power up, but just get a black screen.

So my previous code, which booted eventually but gave a row of bombs on reset, works but fails on GB6 due to what seems to be ACIA write issues. My assumption there was that while DTACK is delayed, the CPU might start a cycle too early and that is the reason for the odd behaviour. It was my thought to hold off allowing a bus cycle to start until DTACK goes high, but obviously either my code doesn't work as expected, or there is some other odd problem I am yet to find. If anything, my current code should at least equal my previous code.. So it looks like there's a mystery to solve now with 2 branches of code :-(

As mentioned before, I think running the CPU at a constant speed has no advantages in a 68000 system. Likely the only possible benefit would be with a CPU like the 030 with caches, where the constant speed may yield higher speed boosts. Though according to the datasheets, cache operations are done when /AS is high anyway, so I am not even sure at this point if there would be any advantage in that. 32MHz on the STFM likely fails due to DTACK being read twice, so I still have to solve this DTACK timing problem regardless of whether the CPU runs at a constant speed or not :-(

 

January 23, 2017

I had another re-think about this problem and thought there could be an issue with VMA/VPA completing the cycle faster. Not so much the ACIA not being able to double clock the data, but the fact that the CPU would actually start an "out of sync" bus cycle. Basically meaning the CPU will start a bus cycle 50% sooner than it should, and likely clashes with some other MMU related access.

So I slowed down VMA by 2x8mhz clocks and the bombs on startup vanished! Was shocked as couldn't work that out for ages! GB6 still failed to work. So just for the hell of it, I slowed down VPA by 2x8mhz clocks and OMG time as it for the first time ran GB6 without crashing!! FINALLY!!!!

So now I know where things are tripping up. Currently the ACIA are being clocked twice as fast, but they seem to be able to cope with the higher speeds. Possible they could run higher but at least the system is now stable and I know the reason why the thing has been acting up for ages!

 

Above are the current GB6 results. I will have to benchmark the V2 booster with the new build of GB6 since the one listed on my V2.2 booster page was with an older build. Though in any case, I think it's going to be pretty darn close, but also as mentioned before, I doubt this type of booster design can reach the 8/16 switching benchmarks, but hell, it's only about 5% anyway. This doesn't matter anyway since now the CPU can run at higher speeds, and just 1MHz extra will bump up the results.

I tried a 16MHz oscillator and thankfully that worked! All my code syncs to the 8MHz clock, so the CPU speed should be irrelevant now. It can now run on its own clock which is out of sync to the rest of the system, and finally after 2 years of faffing about it has for the first time seen light!

I tried a 20MHz osc and it booted to desktop but the keyboard went a little nuts. So I was unable to run GB6. Though this doesn't matter, it just means the ACIA are maxed out when running at 16MHz. What I need to do now is work out a proper bus emulation of the ACIA bus cycles. Once that is sorted out, then the CPU should be free to overclock like crazy!

I am pretty sure the 68000 can run about 38MHz, it does on the STE booster. Though it becomes unstable after that. It will be interesting to see what it maxes out at on this dev-board. I might hook up my clock gen board at some point and just see if it can get to desktop at least... Actually I found a 25MHz osc and it gets to desktop but with corruption. So something trips up. It could need the delays looking at but thats a problem for another day :)

Next up I need to work on the 6850 access.. Then I can look into overclocking more. 32MHz should be possible (it works on the STE) .. Its possible once the 6850 code is done, it may behave better anyway.. time for some sleep.... ZZZZZZZZZZZZZzzzzzzzzzzzzz...........

 

January 24, 2017

Been trying all day to get the E-clock generator to work without much luck. I had an email from Rodolphe explaining how to emulate E-clock, though my attempts, even using his code example, have all failed :-(

It seems when the CPU sees VPA it automatically terminates the bus cycle after an E-clock cycle. So it makes it impossible to transfer data to or from the ACIA. The "fix" for this comes from the 020 CPU code, as the 020 does not even have the VPA and VMA pins so they have to be emulated. The idea in the ST's 68000 case is that VPA marks the start of the cycle, but VPA never reaches the CPU (it's isolated and held high). VMA basically goes low a couple of clocks later and the E-clock is emulated with a decade counter. From the ACIA's point of view the bus cycle should appear as normal. The only change now is we need to generate DTACK to terminate the CPU cycle. Overall this obviously works with the 020 CPU, so I am stumped as to why it's failing to work on my stuff :-(
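For reference, the decade counter idea can be modelled in a few lines of Python. This is only a sketch and not the code from Rodolphe or from my PLD. It relies on the 68000's E-clock being the input clock divided by 10, low for 6 clocks and high for 4; the function name is made up for illustration.

```python
def e_clock(n_clocks):
    """Emulate the E-clock with a decade counter.

    One E period is 10 input clocks: low for counts 0-5, high for counts 6-9.
    Returns one E level (0 or 1) per input clock.
    """
    levels = []
    count = 0
    for _ in range(n_clocks):
        levels.append(1 if count >= 6 else 0)
        count = (count + 1) % 10   # decade counter wraps after 9
    return levels


# Two full E periods: 6 low + 4 high, repeated.
print(e_clock(20))
```

Clocked from the 8MHz system clock this gives the stock E timing regardless of how fast the CPU itself is running.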

The 6850's in my ST are the original ones it seems. The 68B50 should work double the speed, though I suspect they will still fail around 30MHz anyway. So emulation of the E-clock is needed to ensure the speed to the ACIA remains one it can cope with.

Mostly the E-clock is generated from the 8MHz system clock. Though as even the older 6850's can run at 16MHz CPU speeds, then I may ultimately use a 16MHz clock to speed up ACIA access. It may not give any speed increases, but halving the ACIA access time can only be a good thing I think.

 

 

January 25, 2017

Today I gave up with the E-clock emulation code and re-wrote the thing with counters to match the timings in the 6850 datasheet. This time I seemed to get stuck in some loop after reset where there were no bombs, just a white screen. Though I saw constant activity on the VPA and DTACK lines and the E-clock timings looked good as well. So no clue why that's not working.

Later I went back to my original code which had been working fine with GB6, but a few days later and now it bombs when tests start with GB6 WTF ???? I ran the code at 8MHz and it worked. I tried changing various parts of the code to no avail. I can only think that the 6850's were borderline working the other day and 16MHz clock speeds was just a fluke that it worked. Really I should change them for the faster 68B50 but can't be bothered at the moment.

Really my emulated ACIA access code should be working. Though I am wondering if the 6850 datasheet isn't explaining very well how data is latched. It looks like E-clock is low to program IC registers, but data isn't latched until E-clock goes high, which is what I do. Though there is a bunch of timings right at the end where E-clock goes low again, so I'm actually now confused about how it's supposed to be working. I see on my scope the timings are within the basic 6850 timings. So not sure why I am stuck in some odd loop.

Looking at the /AS and VPA timings, I notice /AS is going low again while VPA is still rising. VPA actually takes about 400ns to rise, so wondering if the CPU is starting the next cycle before VPA has fully gone high and its causing some odd conflict with the GLUE. I added 1K pullup on VPA to speed it up. Looks about 2 volts while /AS is HI so should be OK, though didn't seem to help.

I changed the E-clock timings to 1,000ns HI then 1,000ns low. According to the datasheet 500ns should be fine, but it didn't seem to help.

At the end of the cycle I also see DTACK issued which terminates the CPU cycle, so as far as I can see, everything is as it should be, assuming I understood the timings correctly that is.

Considering my previous code which was working before (but now isn't), I had to delay VPA and VMA by 2 clock cycles. I tried 1-3 delays on both in various combinations but it did not help. I was thinking the CPU would be starting the next bus cycle too early and doing an out-of-sync bus access. Basically like delaying /AS by 2 clock cycles, which doesn't work. So I was thinking that once the ACIA access was done, it would be done 50% faster and the next bus cycle would actually start 2 clocks later, and cause the crash. So my next thought was to delay /AS by 2 clock cycles after a VPA access, which should basically bring it back to where it's supposed to be. Though doing that didn't help either :-(

So I went back to my previous code which booted up fine, just crashed on GB6 tests, and did the same code changes to see what happens. Odd thing is that now it bombs out without booting.. Interesting.. I took out the code changes to make sure I hadn't broken anything as well.. And it booted as expected. Trying the previous code, it seems to make the floppy drive reset but now 4 bombs... So I then delayed 4 clocks to see what happens.. Which basically gave the same behaviour :-(

I checked the CPU E-clock and it was about 200ns high and about 400ns low. The 6850 datasheet says cycles must be at least 450ns so it could just be the ACIA (6850) is tripping up at the higher input clock speeds. So I guess I need to go hunting for some 68B50's and change them to see if that solves the problem with that code or not :-\ Problem being while I bought some 68B50's, it was about 2 years ago and I've no idea where I put them :-\ and of course the UK prices are stupid money and the cheaper source is China which could take weeks to get here :-( So I found some 14MHz oscillators on a site online and bought a couple of those. I'm thinking an extra few ns of delay with the slightly slower source clock might be enough to trip it back into working again.. So will see what happens when they arrive..
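Those scope readings are in the right area for a CPU generated E-clock. A quick Python sketch, assuming the 68000's E-clock is the CPU clock divided by 10 (high for 4 clocks, low for 6), with the 450ns figure taken straight from the text above rather than re-checked against the datasheet:

```python
MIN_E_NS = 450   # limit quoted above for the plain 6850 (not re-verified here)


def e_times_ns(cpu_mhz):
    """Return (high_ns, low_ns) of the CPU generated E-clock."""
    clk_ns = 1000 / cpu_mhz
    return 4 * clk_ns, 6 * clk_ns


for mhz in (8, 14, 16):
    high, low = e_times_ns(mhz)
    verdict = "ok" if high >= MIN_E_NS else "too short"
    print(f"{mhz}MHz CPU: E high {high:.0f}ns, low {low:.0f}ns -> high time {verdict}")
```

At 16MHz this gives 250ns high and 375ns low, near the roughly 200ns and 400ns seen on the scope. A 14MHz oscillator only stretches the high time to about 286ns, so if the 450ns figure applies to the high time it would still be well short.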

I had a thought to remove my VPA & VMA delays and try delaying /AS instead. My first test I just removed VPA delay and now GB6 is running ?????? again WTF! I had to add the delay as it didn't work originally, now it doesn't work again, and I remove it, and it now works again.

I decided to try 25MHz and it crashed when getting to desktop. So I increased the DTACK delays and it worked in 25MHz, but then halved RAM access speeds. I did a lot of tests and there is a very fine line between being unstable and slowing down the system.

I had !CLK8 and CLK8 as my delay timer, but this gave 97% RAM access. I could run with simply !CLK8 in 16MHz mode. So half a CLK8 cycle is about 62ns. At that point things start to slow down. Without the 62ns delay, things are not stable at 25MHz (note I was unable to run any program as the keyboard isn't yet working over 16MHz speeds). So the midpoint between 0 and 62ns is about 30ns. Really I needed a 16MHz clock to do that, but as it's no longer connected I just used the CPU_CLK. A 25MHz cycle is 40ns.. So close enough. 25MHz booted to desktop, so that is good. Back to 16MHz with those delays and that 40ns would increase to 62ns again. So I expected a slow down still... Though thankfully it still gave 100% RAM speed. So all is good in the world.

The problem with using the CPU_CLK as the delay is that at some point when the CPU is running faster, maybe 50MHz, the delay becomes 20ns. While the CPU_CLK is good enough for testing currently, it's possible in the future the 16MHz system clock will have to be used so the extra delay is constant at about 30ns. Will worry about that another day though :) I will see if I can order a 32MHz osc and see if I can still get to desktop with that.
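The delay values mentioned above all follow from the clock periods. A short Python sketch, with made-up names, to show how the CPU_CLK based delay shrinks as the CPU gets faster while a delay taken from the fixed system clocks stays put:

```python
def period_ns(mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000 / mhz


half_clk8_ns = period_ns(8) / 2     # half a CLK8 cycle (the !CLK8 delay)
half_clk16_ns = period_ns(16) / 2   # half a 16MHz system clock cycle

print(f"half CLK8 cycle:  {half_clk8_ns}ns")
print(f"half CLK16 cycle: {half_clk16_ns}ns")

# One CPU_CLK cycle used as the delay at various CPU speeds.
for cpu_mhz in (16, 25, 32, 50):
    print(f"CPU_CLK delay at {cpu_mhz}MHz: {period_ns(cpu_mhz)}ns")
```

Half a 16MHz cycle is 31.25ns, which is the "about 30ns" constant delay the system clock would give.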

 

January 26, 2017

I've been talking to Rodolphe about the timings again and he's explained some things, but still I am unable to figure it out and he is as confused as I am.

I have noticed though that the GLUE has some screwy timing issues. I found GLUE was setting VPA, but only keeping it low for about 500ns. Then it did another 500ns cycle, then another. So basically ended up with a square wave on VPA, which is bad. After a whole day of debugging, I found that if VMA isn't issued fast enough, then the GLUE seems to get confused and seems to re-start the cycle. The code I am using to emulate the E-clock and VMA timings clearly has a bug somewhere. Though what I did was just to document working timings, and just add a chain of flipflops to delay VMA and set it where it should be. As far as I can see now, everything is identical to proper timings, and yet, it still doesn't boot :-(

I've tried so many things it would take ages to list them all. Though I suspect there is some timing error between bus cycles for some reason. I did try delaying ST_AS a few clocks after a VPA cycle to delay the next cycle start, but this didn't help. I was also thinking that really, this is just the keyboard data going on here, so even if there was bad data coming through, it shouldn't prevent the machine from booting.

I have noticed some sharp pulses on the ROM chip, likely as the bus settles the ROM chips might be activated when they shouldn't be. So I cleaned that up by making the ROM_CS line go via a flipflop. It shouldn't slow the ROM down, but currently it's hard to tell as the machine doesn't boot anyway. But for now I think it's a good thing to add. This cleaned up the stock 8MHz timings nicely, though on my buggy code, it failed to boot at all :-( So I just used the delay on the ROM_CS pin directly (ROM_CS was also used in some other places in the code) and it seemed to clean up the ROM glitches, but the machine still bombed on boot.

One thing I don't really get is why ROM would be read directly after an ACIA read. That doesn't make any sense. With a stock setup, there is a bus cycle, but it's not a ROM cycle. So I assume once it reads the ACIA it would normally write the data into RAM. Though for some odd reason it doesn't happen on a faster setup.

I went back to my code which just overclocks the ACIA (E-clock) and took out the VPA and VMA delays and now it booted and ran GB6 just fine. So the delays are not needed now, when before they were ?! I really don't understand why one day something works and the next day it doesn't. Currently the CPU is running double speed with the ACIA's. Works OK at 16MHz. But that doesn't shed any light on why the emulated ACIA access is bombing out :-(

 

January 27, 2017

Not much progress today either. I took a look at the PAK030 code and saw in its E-clock generation it has FC0 and FC1 in there, not really sure why, or if its important, but I never routed those pins to my PLD, so have to hope they are not actually needed for some reason. I did see VMA is cycled to the 8mhz clock, again, possible it could be important, but with so many code changes altering the timings of the cycle, I don't think that is important either. Looking at the booster020 code, similar thing, doesn't seem to be anything special going on there either.

It's a little odd though, as sometimes when the bombs come up (about 20 of them) they are drawn line by line very slowly and have black lines under the bombs to the bottom of the screen. Once the bombs are drawn, a few seconds later the machine resets and the cycle starts again. Going back to the VPA cycle being the issue, that in itself is odd as I can see lots of bus access after the VPA cycle, like the ROM reads. So it doesn't even crash after the VPA cycle, there's a lot of other stuff going on before it does crash. Seems a bit of an impossible mystery :-(

 

 

January 29, 2017

I had a thought to try the diagnostic cartridge to see what that made of things and half the time it came up with this..

I7 bus error. It seems to come up with that on a working setup as well, so not sure what's going on there. But mostly I get the E9 error, sometimes the EB error.

Christian Zietz was kind enough to debug the address in the diagnostic cartridge for me and he mentions..

The code around the failing program counter is:

ROM:00FA2C02 move.b ($FFFFFC00).w,d0
ROM:00FA2C06 move.b ($FFFFFC02).w,d0

These are read accesses to the keyboard ACIA, which BTW uses the E-clock. However, note that not only the program counter but also the failing access address is 0xFA2C08, so the bus error occurs during an instruction fetch. I.e. the CPU fails to read code from the cartridge right after an ACIA access.
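To see where 0xFA2C08 sits relative to those two instructions, here is a small Python sketch. It assumes each move.b (xxx).w,d0 is one opcode word followed by one 16-bit extension word holding the sign-extended short address, and that the ST's cartridge ROM window is $FA0000-$FBFFFF; the helper names are only illustrative.

```python
CART_START, CART_END = 0xFA0000, 0xFBFFFF   # ST cartridge ROM window
FAILING_ADDRESS = 0xFA2C08

# (address of opcode word, 16-bit absolute short operand)
INSTRUCTIONS = [(0xFA2C02, 0xFC00), (0xFA2C06, 0xFC02)]


def sign_extend16(word):
    """Sign-extend a 16-bit value, as absolute short addressing does."""
    return word - 0x10000 if word & 0x8000 else word


def bus_address(ext_word):
    """Address seen on the 68000's 24-bit address bus."""
    return sign_extend16(ext_word) & 0xFFFFFF


for opcode_addr, ext_word in INSTRUCTIONS:
    ext_addr = opcode_addr + 2   # extension word follows the opcode word
    note = "  <-- failing fetch" if ext_addr == FAILING_ADDRESS else ""
    print(f"opcode @ {opcode_addr:06X}, ext word @ {ext_addr:06X}, "
          f"reads ACIA @ {bus_address(ext_word):06X}{note}")

print("in cartridge space:", CART_START <= FAILING_ADDRESS <= CART_END)
```

So the failing fetch is the extension word of the second move.b, which sits in cartridge ROM, and the data access before it goes to the ACIA at $FFFC00.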

So the CPU fails to read ROM after an ACIA read.. Considering it all works fine with the CPU E-clock, it can't be that the ROM decoding fails. I see DTACK being issued at the end of the ACIA cycle and even more odd is that I do not actually see the bus error pin on the CPU going low. So it's a "catch-22" situation. There is a bus error, but everything seems to look correct timings wise and there isn't a bus error on the CPU.. how can it fail and cause a bus error ?!

Worst case is the CPU reads random bus data if the timings are wrong. The ACIA cycle terminates with DTACK so everything should be happy. I tried various DTACK timings run from the CPU clock (16MHz) or the 8MHz clock, basically from around 50-200ns of DTACK low time. My PLD design prevents the next bus cycle from happening while DTACK is still low from the previous cycle anyway, so DTACK low time is basically irrelevant. The CPU has plenty of time there.

What I do see is 3 ROM reads directly after an ACIA cycle on my failing code. On working code, though, there is some other cycle first (not a ROM access), likely a RAM access cycle, and then the ROM read cycles. So it's like I am missing a bus cycle somewhere.

I also tried having DTACK follow the VPA signal, just to complete the cycle without actually reading the ACIA, to see what that does. That didn't work (just random garbage on the screen), so I added in VMA to follow VPA as well and the diagnostic cartridge booted, without the EB error this time. But still the E9 error.

I am half thinking (again) that maybe there should be some delay after an ACIA read, basically to wait for the ACIA to finish with the bus, as there could be a bus conflict going on. I made some code changes and made ST_VMA high all the time. I would then assume the ACIA would never get activated (enabled), but same result. E-clock might also be causing issues, so I changed it to low all the time (E is active high for some odd reason on the 6850s ?!). That should doubly make sure the ACIA is never selected, and still the same results :-(

So at this point the ACIA never gets a chance to access the bus. The cycle would simply read random data from the bus but should still complete and run fine, but it does not.

I thought I would keep DTACK low for a lot longer to see what happens. 4 & 8 clock lows gave the same results. I did not check the code was working on my scope as there was no time tonight to double check. I did also try adding in 4 and 8 clock delays on the next bus cycle after an ACIA access, which also didn't help. Though that test only held off ST bus cycles, not the ROM cycles.

So I tried delaying ROM DTACK by 8 clock cycles, and it wouldn't boot at all again. I then just added a 4 clock delay on ROM after VPA, but I'm not sure if the code delays only after VPA or all the time. But for a quick test, good enough.. Still no boot.

Next I thought I would try a step back and let GLUE decode the ROM and issue DTACK (technically the PLD decodes ROM, but GLUE doesn't know about that so won't care). It got back to the errors on cartridge boot but this time the I7 error is missing and I am just left with E9 bad instruction fetch. OK, so stuff ROM decoding and let GLUE also enable the ROM...and still the same E9 error. AARRGGHHH!

So GLUE is decoding ROM as normal, the ACIAs are disabled.. The ACIA cycle should complete (even though it will read garbage) and yet still E9 error :-(

I decided to go right back to the start with stock code and simply bypass the ACIA cycle to see what happens. Basically I let VMA=VPA and issue DTACK on VPA also. This gave the same E9 error.. Strange to say the least!

I tried disabling the ACIA chips totally by holding VMA high and E low. E9 again... I set E-clock HI for the hell of it, E9... I tried VMA low all the time, E9, tried HI & LO on E-clock and.. E9..

I'm basically thinking that while the 020 works with emulated VPA cycles, and likely the 68SEC000 does as well (it's also missing the VPA pins like the 020 CPU), those CPUs are happy to run that way, but the stock 68000 isn't happy for some reason.

As to why the T25 booster works at 25MHz with the keyboard, that is a huge unknown. My T25 died some time ago so I can't test, but I do know its E-clock is slowed right down. I wonder if they cheated and used the HALT pin on the CPU to stall the CPU during VPA cycles so they just complete as normal. Currently it's my only idea to try next.

 

February 2, 2017

Still trying with E-clock low and VMA HI. Basically just disables the ACIA totally. The only thing I do is complete the bus cycle by issuing DTACK.

Using the E-clock from the CPU, if I disable VMA, then the diagnostic cartridge says "keyboard not responding" which is correct. Though I never see that with my code. I just see bad instruction. So I think the CPU never even completes the ACIA cycle. Even though I clearly see VPA and DTACK being issued.

So I tried a chain of 20 FFs clocked at 8MHz to see if there was some timing problem on the bus. I first tried to issue DTACK on the rising edge of the clock, then the falling edge, but neither helped. With 20 FFs in play I have basically tried delays from 0ns to 2,500ns, to emulate the slow ACIA access speeds. None of it helped.
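As a quick sanity check on those numbers, each flip-flop stage clocked at a nominal 8MHz adds one clock period (125ns) of delay. A minimal Python sketch of the arithmetic follows. The function name is made up for illustration and this is not the PLD code itself.

```python
def ff_chain_delays_ns(stages, clock_hz):
    """Return the delay (in ns) available at each tap of a flip-flop chain.

    Tap 0 is the undelayed signal; tap n has passed through n flip-flops,
    each adding one full clock period.
    """
    period_ns = 1e9 / clock_hz
    return [n * period_ns for n in range(stages + 1)]


if __name__ == "__main__":
    delays = ff_chain_delays_ns(20, 8_000_000)
    # First tap, one-stage tap and the final tap of the 20-stage chain
    print(delays[0], delays[1], delays[-1])
```

Using the opposite clock edge for the final stage would shift each tap by half a period (62.5ns), which gives the in-between steps.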

It would seem there is some conflict on the bus, but with the ACIA disabled, and BGACK always high (at least mostly) there doesn't seem to be anything else using the bus even. So I do not understand where such a bad instruction can come from. I will have to have a think about this, but I have already tried everything I can think of to the extreme. So no clue where to progress from here on.

 

February 16, 2017

I thought I would use the E-clock code from the simple "jano" 020 booster, as I had converted the code over to CUPL. It worked on the 020 board, so the code conversion was good. I patched it to work on the 68000 and still no luck :-( Just the same row of bombs. The code otherwise was all stock 8MHz stuff. It just seems like this "idea" just doesn't work with the 68000.

As mentioned before, the T25 booster works, but as mine died a while ago I can't check on this. My only thought is that converting the VPA cycle to a normal bus cycle doesn't work on the 68000. The only possible workaround I can think of is to emulate the correct E-clock timings and complete the ST-side cycle as normal, but delay VPA from reaching the CPU until the emulated cycle is completed. Then the CPU is allowed to see VPA, reads the bus super quick and completes the VPA cycle as normal. This will also mean CPU_VMA would have to be isolated from the ST bus so it doesn't try to double read.
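For reference, the stock 68000 generates E by dividing its clock by 10, low for 6 clocks and high for 4. A minimal Python model of that pattern, which is roughly what an emulated E-clock would have to reproduce, is sketched here. The function names are illustrative only, and this is a behavioural model rather than CUPL.

```python
def e_clock(num_clocks):
    """Return the E level for each input clock: 6 clocks low, then 4 high."""
    return [0 if (n % 10) < 6 else 1 for n in range(num_clocks)]


def e_frequency_hz(clock_hz):
    """E runs at one tenth of whichever clock it is divided from."""
    return clock_hz / 10


if __name__ == "__main__":
    # Two full E periods as a string of levels
    print("".join(str(level) for level in e_clock(20)))
    # E rate when divided from 8MHz versus from 16MHz
    print(e_frequency_hz(8_000_000), e_frequency_hz(16_000_000))
```

Dividing a 16MHz CPU clock by 10 would give 1.6MHz, twice the stock 800kHz, so an emulated E presumably has to be derived from the 8MHz side to keep the ACIAs at their normal rate.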

 

 

February 20, 2017

I contacted Holger Zimmermann (PAK creator) about this issue and he was kind enough to reply. He said:

Did you consider that VPA is used for autovector interrupt as well? In this case you need to forward the VPA signal to the CPU, there is no other way start an autovector cycle on the 68000! (68020/30 processors are using a separate input /AVEC for this purpose).
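To illustrate Holger's point: on the 68000 the function code pins FC2-FC0 all read high during an interrupt acknowledge cycle, and asserting VPA during that cycle is what requests an autovector. Asserting VPA during any other cycle starts the slow 6800-style peripheral access instead. A rough Python sketch of that distinction is below. The function name and return strings are made up for illustration.

```python
def classify_cycle(fc2, fc1, fc0, vpa_n):
    """Classify a 68000 bus cycle from the function codes and /VPA.

    vpa_n is active low: 0 means VPA is asserted.
    """
    if vpa_n != 0:
        # VPA not asserted: ordinary cycle, ended by DTACK (or BERR)
        return "normal"
    if (fc2, fc1, fc0) == (1, 1, 1):
        # Interrupt acknowledge cycle with VPA: CPU uses an autovector
        return "autovector"
    # VPA on any other cycle: 6800-style peripheral access synced to E
    return "peripheral"


if __name__ == "__main__":
    print(classify_cycle(1, 1, 1, 0))  # interrupt acknowledge with VPA
    print(classify_cycle(1, 0, 1, 0))  # supervisor data access with VPA
    print(classify_cycle(1, 1, 0, 1))  # supervisor program fetch, no VPA
```

So any logic that hides VPA from the CPU for ACIA accesses would still have to pass it through when the function codes are all high.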

At this point, while the T25 might have done a workaround for this issue, it's time to move on from this failed idea. The good news is there is the 68SEC000 CPU, which is wired up similarly to the 020 in that VPA cycles and E-clock are emulated. This is all tried and tested code for the 020. So my original VPA emulation code should work fine on the SEC CPU with some small changes.

According to what I read on the Amiga threads, they claim 50-100MHz is possible with the SEC CPU. If that is true or not, no idea. Though I think the SEC is likely a later production run over the 68HC000 so it should overclock better. We shall see.

I need to look at ROMs again as I am using 55ns ROMs, which are about as fast as I could find in 5V, but they struggle to keep up with 32MHz speeds. So if I can find a faster ROM, then higher ROM speeds can be done. Otherwise, regardless of CPU speed, the effective ROM speed might be maxed out at about 40MHz. I'd rather not get into adding wait states for stuff, but this depends on what parts are obtainable.
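To get a feel for how tight a 55ns part is, this small Python sketch works out the clock period at a few CPU speeds and how many clock periods the ROM access time spans. It is only arithmetic on the datasheet access time. Decode delay through the PLD, buffer delays and CPU data setup time are ignored, so the real margin is smaller than these figures suggest.

```python
def clock_period_ns(mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / mhz


def rom_clocks(mhz, access_ns=55):
    """Number of clock periods that the ROM access time spans."""
    return access_ns / clock_period_ns(mhz)


if __name__ == "__main__":
    for mhz in (8, 16, 32, 40):
        print(f"{mhz}MHz: period {clock_period_ns(mhz):.2f}ns, "
              f"55ns ROM spans {rom_clocks(mhz):.2f} clocks")
```

At 8MHz the ROM responds in under half a clock, while at 32MHz the same part already takes most of two clocks.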

Currently I am 95% finished creating a schematic for a new development board to use the SEC CPU. This will be similar to my original board (as shown at the top of the page), except that the CPU will be the SEC type and an SMT part. The PLD needs some tweaking to route some more signals, but I hope the board won't take too long to finish totally.

 

 

February 27, 2017

After trying to route the new board on 4 layers I gave up. It would need to be on 6 layers like the V2 booster, but that is some huge expense just to get one board made. The board is pretty large as well which doesn't help with PCB costs. So what I have decided to do is carry on using my current dev-board, but building a PLCC to SMT adapter like below.

The pinout is almost the same so routing wasn't too bad. Of course the AVEC pin and FC pins have to be routed to my PLD as those are not wired up. I had some free pins, or at least pins reserved for SRAM expansion, but those pins will have to be routed to the SEC CPU. This way I can still test out the max speed of the CPU before committing to a new PCB design.

As said before, the maximum I got the HC CPU to was 38MHz, but to make the SEC worthwhile, it really needs to get up to 64MHz or more. If this CPU is only going to give a few MHz more than the HC, then I think it wouldn't really be worth breaking past 32MHz speeds.

 

 

March 21, 2017

Adapter PCBs came. Here the SMT 68SEC000 will fit nicely in the PLCC socket. AVEC and FC0, FC1 & FC2 will be routed to the PLD. This board will fit my dev-board (image right at top of page).

I don't know when I will have time to solder this up and start coding. Could be several weeks as I've just got too much work on lately :(

 

May 25, 2017

Decided to do a V1 STE booster which is plug-in. This is the prototype board (yet to be built). This one has a small area on the bottom left to re-investigate overclocking.

I have also started routing SRAM onto the next prototype.

Here holds 32MHz Fast-Ram.

 

June 15, 2017

Not much to report again, other than still a lot of things WIP. Mostly I have been toying with various PLDs. I have currently designed a GAL to Altera adapter board to play with Altera PLDs a little more.

I have been having a lot of thoughts lately about various design issues with the STE booster. For the first part, a few months back I designed a GAL to Atmel adapter board. It never got built (just not had time). While the Atmel is just a huge GAL anyway (basically loads more IO ports), I was designing the next gen STE booster to have fast-ram, and the best SRAM to use for this project was 3.3V. So I was adding in IO translators to adapt 3.3V <> 5V. Nothing really wrong there. Though it does add more cost in PCB space, ICs, soldering, testing etc, and I just didn't want to do that.

I know Xilinx is popular and a lot of people use it. It has inbuilt translators which make it ideal. So I installed the free webpack (all 20GB of it!!) and fiddled with the diagram editor. It was pretty straightforward. It actually looked like the Altera software I am using. Though I couldn't for the life of me work out how to assign pins in the diagram to the actual IC.

After watching god knows how many videos and looking at various webpages, I basically gave up. Turns out it was hidden in a bunch of places, and when I finally got the pin editor up, I couldn't work out how to assign pins. Later it turned out they are drag and drop. It actually seems like nobody has even used the diagram part of the IDE, so finding help to use it seems impossible. That is pretty much the problem with coding on the Atmel chip too, just impossible to figure out easily how to do anything. In the end, I basically gave up with the Xilinx IDE. I just find it very annoying and frustrating to use :(

So I decided to go with the Altera MAX7000 series for the moment. I already designed the IDE prototype with it. Though I would also like to try to get the STE booster powered by an Altera PLD. This means my Atmel code on all my previous boosters becomes useless as I will be working with logic chips in the Altera software. So I am having to basically re-invent my own stuff again.

Why would I do that ? Well, as said in a blog post somewhere, I am tired of trying to code stuff where I don't know if what I am doing is right, or if my "design" simply doesn't work. I mean don't get me wrong, I really like WinCupl and the Atmel PLDs. Though anything past basic IO stuff becomes a nightmare to figure out. Atmel have been helpful in answering my questions. Though often it's waiting a day or 2 for a reply. Then the "conversation" can go on for months.. It's just not practical to continue like that. I also have the same feeling with the Xilinx stuff.

The Altera software, though, I just started using and got along with just fine. I mean, I wanted to assign my pins, so I selected pin assignments from the menu, what could be easier ?! Saved me 2+ hours of looking on various sites for answers on how to do basic things.

People may think I am mad, though I just prefer to draw circuits than code stuff. I mean it's like trying to write a book in ancient Egyptian when you don't have any sort of reference book on it at all. It's just madness to try.

Of course, I never used Altera before, but I could just start designing easily and generally just get on with it. I can also design the circuits in my logic simulator software, which I have been using for years. Design and debug, then just copy the circuit over to Altera and program the chip. I mean it's a lot easier than fighting with code and the syntax of it all. I just feel the Altera is the right move for me to make. So this new PCB is the first step in that direction.

I should be able to translate my previous booster code back over to a schematic in the Altera software. Then see how it behaves in the STE booster board. Once I am happy, I will continue work on the fast-ram side of things as I can route everything through the Altera chip :)

 

 

September 19, 2017

A lot of projects are basically being put on hold due to lack of time. But I have started working on the STFM 32MHz booster and finally made a little progress.

I actually wasn't planning to produce the V2 booster again. Though a chap said he is interested in them to mod machines with. Though as they are "out of stock" now it caused a bit of a problem. Firstly, a batch of 25-50 PCBs of 6 layers isn't cheap. The production run would be well over £1,000. So it just wasn't viable to produce them considering sales are generally slow with boosters.

I had already spent some time tweaking the V2 design to use just 2 GALs instead of 3. Though in the end, I found it just couldn't be done. Just not enough IO pins :(

All 3 GALs are basically chained in series for the logic operations, which doesn't help with trying to push into 32MHz speeds. In theory, it should run at 32MHz, but the 32MHz drive on the STFM is so bad it can only just drive the shifter, never mind an extra gate in a PLD.

The STE had a buffer to do that job. So I made 2 provisions with this design: buffering the 32MHz clock by the shifter, or just using a 32MHz oscillator. The fallback is just using 16MHz.

Now while the STE booster works at 32MHz, the STFM has more screwy timings to contend with. This makes 32MHz more of a problem. I stopped work on this project for a long time as I just didn't have time (or really any interest at the moment) to finish the beta board off. Though now it has just gone off to fab.

This new board uses a large Atmel PLD. I have used it before on my dev-system board, though even that I have not been able to run at 32MHz. I made some more provisions with this new V2 design (dubbed V2.5) to run /DTACK via the PLD so I can stall the CPU a little, and hopefully this will get 32MHz working. As time is a factor, I don't know when I will get time to write the new code or try it out. In theory, getting it running at 16MHz shouldn't take too long, assuming I've not screwed up anything in the mad rush to complete the design.

The boards can be made in batches of 3 now. So while I don't have to spend £1,000+ on a new batch, the price per PCB is a lot higher. This means the booster end price will be a fair bit higher. Though I am not even sure if I will sell these myself for my store or not yet.

There were also some provisions to adapt the booster logic for software switching. I did some draft designs several months ago and started to route the PLD logic. Though that is as far as it got. I can't even remember how I was planning to go about it now. But it's possible firmware updates for the V2.5 could be done to add features or "fixes" for 32MHz speeds in the future (if solvable).

As for the software switching, I think there were plans to have my own "exxos register" somewhere to allow the booster to be turned on or off. Really this is just a bypass for the toggle switch, where that bit is mapped to a register in the memory map somewhere. A latch in the PLD and some logic to route to the switching code shouldn't really be a problem. I actually routed 8 data bits to the PLD for maybe other uses such as ROM switching.

One thing to bear in mind is that settings are not stored anywhere. So once the machine is powered down, the defaults would be selected again. I do not have any plans to do another V2 design, as the STE booster should have similar logic and the settings will be stored in the RTC NVRAM. Though it is likely a long way off before that project is completed.
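The software switch idea can be modelled in a few lines of Python. This is only a behavioural sketch of a one-bit latch behind an address decode. The register address and the class and method names are placeholders invented for this example, not anything from the actual V2.5 design.

```python
# Placeholder address for the "exxos register" (assumption, not the real map)
HYPOTHETICAL_REG_ADDR = 0xFFFE00


class BoosterLatch:
    """One-bit latch in the PLD that replaces the manual toggle switch."""

    def __init__(self, default_on=False):
        self.default_on = default_on
        self.enabled = default_on

    def write(self, addr, data):
        """Latch bit 0 of the data bus when the register address is decoded."""
        if addr == HYPOTHETICAL_REG_ADDR:
            self.enabled = bool(data & 0x01)

    def power_cycle(self):
        """Nothing is stored, so power-up always returns to the default."""
        self.enabled = self.default_on


if __name__ == "__main__":
    latch = BoosterLatch()
    latch.write(HYPOTHETICAL_REG_ADDR, 0x01)  # software turns the booster on
    print(latch.enabled)
    latch.power_cycle()                       # setting is lost at power down
    print(latch.enabled)
```

With 8 data bits routed to the PLD, the remaining 7 bits of the same register could be latched the same way for things like ROM switching.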

Overall, this V2.5 booster is a clone of the V2.2 booster, only using a more advanced PLD with a lot more IO power for various "possible" features at a later date.

More information can be found on my forum link below. I had a lot of issues with the 8MHz clock line, but they now seem to be almost resolved.

https://www.exxoshost.co.uk/forum/viewtopic.php?f=7&t=45

 

 

 
