Flaky Serial Link With V4

 
#1

DJ, I hope you can help. I am experiencing major problems with the reliability of the serial link, which concerns me a lot. As it stands, the problem is bad enough to undermine all my work (over the last year) on utilizing the v4 for the ALTAIR robots.

The new ALTAIR heads have multiple PICs that all report back to a Master PIC, which itself communicates with the v4. On testing I keep noticing that the v4 drops out randomly and also breaks all serial comms at random times. In an attempt to discover the problem, I shut down all the interrupts buzzing around the PIC network, so that the Master was simply communicating with the v4 with no external interrupts. I then ran some simple code that just sends 3 bytes to the v4 (every 100 ms), then increments the bytes, and so on. At the v4 end it just looks for the 3 bytes in the buffer, reads them and increments a $times counter. What happens is that the v4 just breaks serial comms or has byte read errors at random times.

When the fault occurs, serial comms stop completely and it is necessary to stop the script and start it again. I have found this happens when I put a static "3" in the UartAvailable line (for example if UartAvailable(0,0) = 3), but this should not be a problem, as the v4 is only ever going to get an isolated packet of 3 bytes. To get around this, if I change the static "3" to ">= 2" (for example if UartAvailable(0,0) >= 2) then the comms do not stop or error (so much), but of course this means that a packet misread has occurred and the buffer has accepted a 2-byte packet, which means we have lost a byte.

From my tests, in general it looks like the bytes available error occurs once in roughly 100 packet sends and a byte error (data byte read incorrectly) occurs around once every 250 packet sends. Below is a screen dump from one test which shows the errors from just over 1600 packet sends.

[Image: screen dump of the test results]

Unreliable serial is a major setback for me, as all my robots use serial-linked subsystem PIC modes, so these dropouts, where serial links can also just stop, are a complete disaster for my robot designs!

Tony

#2

Share your code - is there a delay in the loop? Sounds like the data channel is flooded

#3

DJ, there is a 100/10 ms delay in the loop, so there should be no channel flooding. I did another long test run and got 220 available errors in 16000 packet sends - this time there was no disconnection. I have just done another test and got 11 available errors in 536 packet sends, then the v4 disconnected.

I hope it is me doing something wrong, as I badly want this all to work! Here is my code; it's a bit messy as I have been trying a lot of things to make all this work. I can try some CRC and error checking routines in the master PIC, but I would really prefer for it to work efficiently without the need for these.

Tony

Code:


:MAIN_LOOP

# bytes in from master PIC
# only process data if the correct packet size is available
$available = (UartAvailable(0,0))
#if (UartAvailable(0,0)>= 2)
if ($available = 0)
  goto (MAIN_LOOP)
endif

if ($available != 3)
  $error++
  print ("available error")
  print ($available)
  print ($error)
endif

if ($available >= 2)
  # a valid packet is ready - read the packet
  UARTReadBinary(0,0,3,$inputdata)
  $byte1_in = $inputdata[0]
  $byte2_in = $inputdata[1]
  $byte3_in = $inputdata[2] # 8 thermal pixel data
  $times++
  $new_data = 0

  print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)
  if ($last_byte1 != $byte1_in or $last_byte2 != $byte2_in or $last_byte3 != $byte3_in)
    # latest data packet is different to last
    $new_data = 1
    print("new data")
    $last_byte1 = $byte1_in
    $last_byte2 = $byte2_in
    $last_byte3 = $byte3_in
  endif

else
  $times2++
  sleep(100)
endif

#print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)
sleep (10)
GOTO(MAIN_LOOP)

#4

I did an extensive test of the serial connection to the v4 with the EZ:1 head PIC network switched back on (sending live data packets). I ran it for a few hours and got 157 available errors in 18500 packet sends. The good news is there was no disconnection this time. Thanks in advance, DJ, for anything you can advise here.

With the EZ:1 head open, it feels like I am doing a bit of brain surgery on the poor bot!

[Image]

Tony

#5

Hi Jeremie, thanks for looking into this for me - to me it's starting to look like it's the old latency issue again. Here is an earlier thread on this.

www.ez-robot.com/Community/Forum/Thread?threadId=8067

It looks like if you need a rock-steady reading of, say, an external digital line, it needs a 500 ms delay, and continuous reading of a serial input seems to need a huge 1000 ms (or higher) delay between packet reads to start being a reliable link. Shorter delays will work, but read errors will increase.

The following are the delays added (between packet reads) to the main loop of the master PIC communicating with the v4, and the errors seen with each:

100 ms delay in main loop gives 140 available errors in 500 sends
500 ms delay in main loop gives 40 available errors in 500 sends
1000 ms delay in main loop gives 2 available errors in 500 sends
2000 ms delay in main loop gives 0 (zero) available errors in 500 sends
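
As a quick check of the figures above, the counts can be turned into error rates. This is only a small Python sketch of the arithmetic (it is not EZ-Script and does not talk to any hardware); the names `results` and `error_rate_percent` are made up for the illustration.

```python
# Error counts per 500 sends, copied from the list above.
results = {100: 140, 500: 40, 1000: 2, 2000: 0}  # delay in ms -> available errors
SENDS = 500


def error_rate_percent(errors, sends=SENDS):
    """Return the share of sends that produced an available error, in percent."""
    return 100.0 * errors / sends


rates = {delay: error_rate_percent(errors) for delay, errors in results.items()}
for delay, rate in rates.items():
    print(f"{delay} ms delay: {rate:.1f}% of sends had an available error")
```

On these numbers the rate falls from 28% at 100 ms to 8% at 500 ms and 0.4% at 1000 ms.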

After longer tests, even 2-second delays get a small number of available errors. Most of the available errors are "2" or "6" from a 3-byte packet sent, although I have also seen the v4 report "9" bytes available quite a few times.
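
One possible reading of those counts, offered as an assumption and not something confirmed in this thread: 6 and 9 are whole multiples of the 3-byte packet size, which would fit packets queuing up in the buffer between polls, and 2 would fit a poll that lands before the last byte of a packet has arrived. If that is what is happening, a reader that takes every whole packet and leaves any partial one for the next poll would not lose data. The sketch below models that idea in Python on a plain byte buffer; `drain_packets` is a hypothetical helper, not an EZ-Builder function.

```python
PACKET_SIZE = 3  # the master PIC sends 3-byte packets


def drain_packets(buffer):
    """Split buffered bytes into whole packets plus any leftover partial packet.

    Returns (packets, remainder); remainder holds the bytes of a packet
    that has not fully arrived yet.
    """
    whole = len(buffer) - len(buffer) % PACKET_SIZE
    packets = [bytes(buffer[i:i + PACKET_SIZE]) for i in range(0, whole, PACKET_SIZE)]
    return packets, bytes(buffer[whole:])


# 6 bytes available: two complete packets queued up between polls.
packets, rest = drain_packets(bytes([1, 2, 3, 2, 3, 4]))

# 2 bytes available: a packet is still arriving, so nothing is processed yet.
partial_packets, partial_rest = drain_packets(bytes([5, 6]))
```

The design choice here is to treat the available count as "at least one packet" instead of "exactly one packet", so a late or early poll does not count as an error by itself.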

I also keep getting random disconnects, which only seem to happen in the UART read mode.

Some further info - the PIC to v4 baud rate is 19200, so the 3 byte packet takes under 2 milliseconds to transmit.
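
The transmit time can be worked out directly. This Python sketch assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per byte); the thread does not state the framing, so that figure is an assumption, and `packet_time_ms` is a name made up for the example.

```python
def packet_time_ms(n_bytes, baud, bits_per_byte=10):
    """Time on the wire for n_bytes, in milliseconds.

    bits_per_byte=10 assumes 8N1 framing (start + 8 data + stop).
    """
    return 1000.0 * n_bytes * bits_per_byte / baud


t_19200 = packet_time_ms(3, 19200)  # 3-byte packet at 19200 baud
t_9600 = packet_time_ms(3, 9600)    # same packet at 9600 baud
print(f"19200 baud: {t_19200} ms, 9600 baud: {t_9600} ms")
```

Under that assumption the packet takes about 1.56 ms at 19200 baud, which is tiny next to the 100 ms to 2000 ms loop delays discussed above.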

Fortunately for me the head PIC network does most of the work (over 90%) and only sends processed data to the v4, so I can try to live with these huge time delays.

For me though, this is a major flaw in using the v4, but I am going to try some CRC algorithms so I do not lose data - this of course will slow comms down even more.
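
As an illustration of what a CRC check on these packets could look like, here is a minimal Python sketch that appends a CRC-8 byte to the 3-byte payload and verifies it on receipt. The choice of CRC-8 with polynomial 0x07 and a zero initial value is an assumption made for the example; the thread does not say which CRC is planned, and `crc8`, `build_frame` and `check_frame` are hypothetical names.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_frame(payload):
    """Return the payload followed by its CRC-8 byte."""
    return bytes(payload) + bytes([crc8(payload)])


def check_frame(frame):
    """True if the last byte matches the CRC-8 of the bytes before it."""
    return len(frame) >= 2 and crc8(frame[:-1]) == frame[-1]


frame = build_frame(bytes([10, 20, 30]))           # 3 data bytes + 1 CRC byte
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in the first byte
```

The cost is one extra byte per packet, which at 19200 baud adds well under a millisecond on the wire.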

My expertise is not in wifi, so I cannot be sure if this is a problem, but the wifi channel was not busy when these tests were carried out, and I do not have any wifi issues outside of EZ-Builder.

Tony

#6

I was wondering, what happens at lower baud rates? Usually when data corruption is the issue, the best solution is to go to a lower rate. Increasing time between readings to a second or more seems like going too far to get reliable communications. Baud rate shifting often happens automatically in the handshaking process between units that use serial communications. Using better quality cabling between the units involved in the communication process also helps. This is especially acute at lower voltages and with equipment that has inadequate shielding, or setups with too much inherent capacitance between the transmission ends. Capacitance (or inductance) in the connecting wires can be a signal killer. Have you actually looked at the transmission signal itself to see what quality of pulses you are sending/receiving? The better choice is shielded cable, which often has a characteristic impedance (capacitance-inductance-resistance combination). Of course your interconnection wiring here is short, but it is not to be simply dismissed when dealing with higher baud rates. In these days of gigahertz device operation we often take high data rates for granted and overlook the potential problems, but they can still crop up, as may be the case here. Even at "only" 19200 baud.

Going with a lower baud rate would certainly get much more data reliably across the channel, even at very low baud rates, than waiting upwards of seconds between data transmissions. If the PIC is unable to go lower, I would suggest a different interface device or a custom unit that can go to lower rates.

#7

@WBS00001, I tried lower baud rates and it makes no difference - in fact 9600 seemed to get more errors, which does not make any sense. I scoped out the transmission and it looks clean and sharp. The main problem is for the PIC to handshake with the v4 - I tried a hardware RTS (request to send) line from the PIC to the v4 to initiate a (synchronised) handshake, but because of the v4 port latency problems (see my other thread) the strobe pulse has to be over 500 ms long before it is reliably accepted. DJ confirms this problem in the other thread.

What I am now doing is sending a dummy byte from the v4 and making the PIC wait for this before sending any data - it's a crude handshake, clunky and not very efficient, but I have been running the test code for a couple of hours with no errors at all. This is OK as I can make the PIC sync with the v4, but a device like the B5T Omron sensor will not have this facility, so some of the continuous data sent from it will get lost if it is directly connected to the v4 UART.
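
The dummy-byte handshake described above can be modelled roughly as follows. This is a Python simulation only, not the actual EZ-Script or PIC firmware: `FakePic` stands in for the master PIC, the `READY` value 0x55 is a made-up placeholder (the thread does not give the byte used), and the packet contents are invented test data.

```python
from collections import deque

PACKET_SIZE = 3
READY = 0x55  # hypothetical value for the dummy byte; the thread does not give one


class FakePic:
    """Stand-in for the master PIC: sends one packet only after a READY byte arrives."""

    def __init__(self):
        self.counter = 0
        self.tx = deque()  # bytes on their way to the host

    def receive(self, byte):
        if byte == READY:
            packet = [(self.counter + i) % 256 for i in range(PACKET_SIZE)]
            self.tx.extend(packet)
            self.counter += 1


def poll_once(pic):
    """Host side: request one packet, then read exactly PACKET_SIZE bytes."""
    pic.receive(READY)
    if len(pic.tx) < PACKET_SIZE:
        return None  # nothing complete yet; try again on the next poll
    return [pic.tx.popleft() for _ in range(PACKET_SIZE)]


pic = FakePic()
received = [poll_once(pic) for _ in range(4)]
```

Because the sender only transmits when asked, the receive buffer in this model never holds more than one packet, which is the property the handshake relies on.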

Tony

#8

Hey Tony,

Very interesting problem, one which I'm sure others will be interested in and might like to help with. When I examine the issue I break it down like this:

1. Custom PIC board sends serial data out
2. Data is sent to the physical EZb4 via the UART port
3. EZb4 sends data wirelessly via Wi-Fi to the network or PC
4. PC receives Wi-Fi or network traffic from EZb4
5. EZ-Builder software running on the PC receives the data
6. Scripts in EZ-Builder interpret and display data sent from the custom PIC

In my mind, there are 5 points of failure for the communication drops. I have a few guesses as to what might cause it (we probably all do) but maybe some different troubleshooting and experimenting would narrow down the scope of the issue.

The biggest question might be: is this an issue with the serial protocol on the PIC side or the EZb side? My guess is you have a higher level of knowledge in this area than I do, but over the years working with serial devices professionally and for personal use I've witnessed a wide variety of issues related to latency, timing, baud and handshake protocol. It seems like serial is serial, but that has not been my experience over the years.

I'm curious if you have the ability to do some other experimenting?

Experiment 1: If you were to take a popular, known microcontroller like an Arduino, connect it to the UART and run similar PING-style tests, would packets also be lost?

Experiment 2: Connect the PIC serial to the PC and run a similar test to see if packets are also lost.
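
For either experiment, the receiving side needs some way to spot lost or misread packets. One simple option, sketched here in Python, is to check that each packet follows on from the previous one. It assumes every byte goes up by 1 (wrapping at 256) from one packet to the next; the thread only says the bytes are incremented, so that pattern is an assumption, and `count_sequence_errors` is a hypothetical helper.

```python
def count_sequence_errors(packets):
    """Count packets that do not follow the previous one by +1 (mod 256) in every byte."""
    errors = 0
    for prev, cur in zip(packets, packets[1:]):
        expected = tuple((b + 1) % 256 for b in prev)
        if tuple(cur) != expected:
            errors += 1
    return errors


# Synthetic test data: 300 packets whose bytes all increment by 1 each time.
good = [(i % 256, (i + 1) % 256, (i + 2) % 256) for i in range(300)]
dropped = good[:100] + good[101:]               # one packet lost
corrupt = good[:50] + [(0, 0, 0)] + good[51:]   # one packet misread

print(count_sequence_errors(good), count_sequence_errors(dropped), count_sequence_errors(corrupt))
```

Note that a single misread packet shows up as two mismatches with this check (one going into the bad packet and one coming out), while a single lost packet shows up as one.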

#9

Hi Justin, firstly thanks for your and @WBS00001's input on this, it is appreciated.

There is no trouble with the PIC comms side - I have been writing PIC code since 1995 so I am very proficient in PIC serial comms. In fact the ALTAIR head has 3 networked PICs all working in unison (using quite complex low/high priority interrupts) and it all works fine. The master_PIC connects to the v4 and this is where it all goes wrong - I have checked the master_PIC to v4 serial link on a custom PIC terminal that I use for testing and that is working as expected. I have never used an Arduino in my life, I just like coding PICs which I am now pretty efficient at!

The thing with the PING/Arduino style tests is that you will probably never notice the missing data/packets as most will get through - it is only that I am counting/analyzing every data transaction that I am seeing the drop outs and errors. Of course with something like the B5T sensor, missing packets can throw the whole thing out.

Using the dummy send byte method from the v4 (to form a crude hand shake) with the master_PIC is working - the test has been sending live data from the master PIC and head network for 4 hours now with no errors. This is not the best option, but at least the master_PIC now has a reliable comms connection with the v4.

Tony

#10

I am now continually getting "nagged" to post that my forum thread "Flaky serial link with v4" has been resolved when it has not. Some things are probably not resolvable, and the fact that DJ and Jeremie have never come back with a solution maybe confirms that the serial link can be flaky?

I think it's pretty poor to have to pretend that someone has resolved my help request just to stop (what in my opinion are) these unnecessary emails. Because of this, this will be the last time I post for assistance.

EZ-Robot, can I suggest that you allow for the situation where a help request cannot be resolved, so that these continuously annoying emails stop being sent in such cases? Thanks

Tony