Robust I2C Error Handling

Mehdi Rahman

Hello all,

I am trying to ensure my I2C communication is as robust as possible. In particular, I am now looking at a case where a Master MSP430 communicates with multiple slave devices. I am considering the case when one or more slaves holds the SDA line low, causing the I2C line to become stuck.

Let's first assume there is only one slave device. If I am using a USCI_B interrupt-driven system, I believe I can do the following: I can set UCNACKIE. In the case that the SDA line is held low, UCNACKIFG will be set, and the code will enter the corresponding ISR. Here, if the MSP430 sends a stop sequence, this should free up the SDA line. Is this true, or is there a more elegant solution? After this, I could start another I2C transmission with the slave device, since the previous transaction was corrupted.

Let's complicate things a bit further, and say that we are reading a series of registers from the slave. If this is the case, the I2C transmission will be the following:
1) Send address byte with write bit
2) Send register byte (indicating the register to start reading from on the slave device)
3) Send address byte with read bit
4) Keep reading bytes that come from slave until the Master MSP430 sends a NACK (or equivalently, when the MSP430 sends a stop)
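The four steps above can be written out as an explicit list of bus events, which makes it easier to say where in the transfer an error occurred. This is only a minimal host-side sketch: the names `bus_event`, `addr_byte` and `plan_combined_read` are made up for illustration and are not part of any MSP430 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bus events of a combined-format register read, in transmission order. */
typedef enum { EV_START, EV_TX_BYTE, EV_RX_ACK, EV_RX_NACK, EV_STOP } ev_kind;

typedef struct {
    ev_kind kind;
    uint8_t byte;   /* only meaningful for EV_TX_BYTE */
} bus_event;

/* 7-bit address plus R/W bit: 0 = write, 1 = read. */
static uint8_t addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}

/*
 * Fill `out` with the event sequence for reading `n` bytes starting at
 * register `reg`. Returns the number of events, or 0 if `out` is too small.
 */
size_t plan_combined_read(uint8_t addr7, uint8_t reg, size_t n,
                          bus_event *out, size_t cap)
{
    assert(out != NULL && n > 0);
    size_t need = 6 + n;   /* S, addr+W, reg, Sr, addr+R, n reads, P */
    if (cap < need) return 0;

    size_t i = 0;
    out[i++] = (bus_event){ EV_START, 0 };
    out[i++] = (bus_event){ EV_TX_BYTE, addr_byte(addr7, 0) };
    out[i++] = (bus_event){ EV_TX_BYTE, reg };
    out[i++] = (bus_event){ EV_START, 0 };               /* repeated start */
    out[i++] = (bus_event){ EV_TX_BYTE, addr_byte(addr7, 1) };
    for (size_t k = 0; k + 1 < n; k++)
        out[i++] = (bus_event){ EV_RX_ACK, 0 };          /* master ACKs */
    out[i++] = (bus_event){ EV_RX_NACK, 0 };             /* last byte: NACK */
    out[i++] = (bus_event){ EV_STOP, 0 };
    return i;
}
```

In the first three transmitted bytes the slave is the one that ACKs, so a failure there shows up as a NACK to the master. During the read phase the master does the ACKing, so the slave has no way to signal a problem.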

Now the question comes, how much of a difference would it make if an error happened in one place versus another -- that is, for example if an error happened before the slave sends the data to the Master MSP430, versus during the transfer of this data? Would the fix be different? Am I correct to assume that this situation could happen, or is it much less likely in one case than in the other?

Sincerely,

Mehdi

over 11 years ago

0 Jens-Michael Gross over 11 years ago

Guru 227245 points

Mehdi Rahman said:
I can set UCNACKIE. In the case that the SDA line is held low, UCNACKIFG will be set, and the code will enter the corresponding ISR.

Not quite true. UCNACKIFG is set when, after the master has sent a slave address or a data byte, SDA is not pulled low by the slave during the 9th (ACK) clock cycle.

SDA being held low when the master wants to send a high is considered an arbitration loss.

SCL being held low is considered clock stretching and is a valid condition (even though it is indicated by the SCLLOW bit). However, if this happens erroneously or for an unlimited time, there is nothing you can do.

In case of NACKIFG, the USCI master will automatically generate a stop condition (preceded by a dummy low bit, since a stop condition requires SDA going from low to high while SCL is high, and after a NACK, SDA is already high).

Getting a NACK isn't necessarily a corrupted transfer. It may be the expected behaviour if you sent the maximum amount of data allowed for an operation. Note that when you read from a slave, the slave cannot send you a NACK at all, as only the receiving side can ACK/NACK.
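The three conditions distinguished in this reply (missing ACK, SDA low when the master sent high, SCL held low) can be told apart in software from a snapshot of the status bits. The following is a rough sketch only: the `ST_*` masks are placeholders standing in for the device header's UCNACKIFG, UCALIFG and UCSCLLOW definitions, and the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder status masks for this sketch. On real hardware, use the
 * masks from the device header (UCNACKIFG, UCALIFG, UCSCLLOW) instead. */
#define ST_ARB_LOST 0x01u
#define ST_NACK     0x08u
#define ST_SCL_LOW  0x40u

typedef enum { BUS_OK, BUS_NACK, BUS_ARB_LOST, BUS_STRETCHED } bus_status;

/* Map a status snapshot to the condition it indicates. */
bus_status classify_status(uint8_t stat)
{
    if (stat & ST_ARB_LOST) return BUS_ARB_LOST;  /* SDA low while master sent high */
    if (stat & ST_NACK)     return BUS_NACK;      /* no ACK in the 9th clock */
    if (stat & ST_SCL_LOW)  return BUS_STRETCHED; /* valid unless it never ends */
    return BUS_OK;
}

/* Short name for logging. */
const char *bus_status_name(bus_status s)
{
    static const char *const names[] = {
        "ok", "nack", "arbitration lost", "clock stretched"
    };
    assert((unsigned)s < sizeof names / sizeof names[0]);
    return names[s];
}
```

Note that `BUS_NACK` is deliberately not called an error here, since a NACK may be the expected end of a write.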

Mehdi Rahman said:
Now the question comes, how much of a difference would it make if an error happened in one place versus another

It highly depends on the high-level protocol and is slave specific.

A slave may not react to a start condition if it has nothing to send or is still booting. Then the master should simply retry later. Or a slave doesn't respond at all when it should. The master could try to reset the slave by a controlled power cycle or by pulling the slave's reset line (if the hardware layout allows it). Or it must assume a defective/absent slave and cope with it one way or another.

If SCL is held low indefinitely before start or during transfer, there's nothing the I2C master can do. However, if the hardware allows it, a power cycle of all slaves or resetting them with their reset lines (if available) might remove this error condition.

Same for permanently pulled-down SDA. SDA being low when the master expects it to be high is either a sign of a different I2C master talking, or a bus error, in which case the cure is the same as for permanently low SCL.

0 Derek over 11 years ago

Mastermind 9721 points

If you are interested in very robust I2C I recommend that you also control power to all your I2C peripherals (via a FET or equivalent). That way if the I2C bus becomes locked up (usually because some peripheral gets confused and is holding it down) then you can 'reboot' the bus.

--Derek
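A power-controlled bus 'reboot' along these lines might be sequenced as in the sketch below. Everything here is an assumption for illustration: the `power_ops` callback interface, the function names, and the delays are not from any TI library, and whether the pull-ups should be switched together with the slave rail depends on the hardware.

```c
#include <assert.h>
#include <string.h>

/* Board-specific hooks supplied by the application (hypothetical). */
typedef struct {
    void (*bus_release)(void *ctx);           /* stop driving SDA/SCL */
    void (*power)(void *ctx, int on);         /* FET gate control */
    void (*delay_ms)(void *ctx, unsigned ms);
    void (*bus_init)(void *ctx);              /* re-init master and slave registers */
    void *ctx;
} power_ops;

/* Power-cycle the slaves, then bring the bus back up. */
void i2c_power_cycle(const power_ops *o, unsigned off_ms, unsigned settle_ms)
{
    assert(o != NULL);
    o->bus_release(o->ctx);
    o->power(o->ctx, 0);
    o->delay_ms(o->ctx, off_ms);      /* let the slave rail discharge */
    o->power(o->ctx, 1);
    o->delay_ms(o->ctx, settle_ms);   /* slave start-up time */
    o->bus_init(o->ctx);
}

/* --- Host-side recorder of the call order (for testing only) --- */
typedef struct { char log[32]; } recorder;

static void rec(void *c, const char *s)
{
    recorder *r = c;
    strncat(r->log, s, sizeof r->log - strlen(r->log) - 1);
}
static void r_release(void *c)            { rec(c, "R"); }
static void r_power(void *c, int on)      { rec(c, on ? "1" : "0"); }
static void r_delay(void *c, unsigned ms) { (void)ms; rec(c, "d"); }
static void r_init(void *c)               { rec(c, "I"); }

power_ops recorder_ops(recorder *r)
{
    power_ops o = { r_release, r_power, r_delay, r_init, r };
    return o;
}
```

The off and settle times have to come from the slave datasheets; the sketch only fixes the order of the steps.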

0 Mehdi Rahman over 11 years ago in reply to Jens-Michael Gross

Intellectual 555 points

Jens-Michael Gross said:
SCL being held low is considered clock stretching and is a valid condition (even though it is indicated by the SCLLOW bit). However, if this happens erroneously or for an unlimited time, there is nothing you can do.

Jens-Michael Gross said:
If SCL is held low infinitely before start or during transfer, there's nothing the I2C master can do.

I understand that it may vary, but what would cause these cases to happen?

0 Jens-Michael Gross over 11 years ago in reply to Mehdi Rahman

Guru 227245 points

Mehdi Rahman said:
I understand that it may vary, but what would cause these cases to happen?

Missing or insufficient pull-ups, a crashed slave, leakage currents, shorted traces on the PCB, defective components (shorted blocking capacitors), dirt/flux residue...

0 Mehdi Rahman over 11 years ago in reply to Jens-Michael Gross

Intellectual 555 points

Jens-Michael Gross said:

Missing or insufficient pull-ups, a crashed slave, leakage currents, shorted traces on the PCB, defective components (shorted blocking capacitors), dirt/flux residue...

So it sounds like most of this is hardware design related and can be avoided with careful precautions, except for the "crashed slave" case. It seems like when these events happen, which should be rare, the only real solution is to reset power to the slave device. Is this true?

0 Mehdi Rahman over 11 years ago in reply to Mehdi Rahman

Intellectual 555 points

Hello,

I believe it is best for me to rephrase my question. Let's assume I have a setup with a single master I2C device, and two slave devices. In this case, the master is an MSP430, with USCI hardware. The I2C bus will operate in fast mode -- i.e. clock is 400 kHz.

The master MSP430 communicates with one slave as a master-transmitter -- i.e. this slave device simply accepts commands and sends an ACK.

On the other hand, the master MSP430 communicates with the other slave device in combined format. That is, the master MSP430 sends the device address with a write bit, the register address, then the device address with a read bit, and then begins receiving a stream of bytes, until the master MSP430 sends a NACK, in this case by sending a stop condition.

The question comes, well what kind of errors can potentially come up on the bus and how can we take care of them? In the following, I reference the I2C Protocol document: http://www.nxp.com/documents/user_manual/UM10204.pdf

------------------------------------------------------------------------
Since there is only a single master here, I believe we can rule out clock synchronization and arbitration. I believe we can break them down into the following.

1) SCL held low:
If a slave device for some reason cannot respond in time, it can hold the clock line low to give it more time to respond.

Ideally, when the slave is ready, it should release the clock line. However, in some cases it may get stuck. The MSP430 can check this by looking at the UCSCLLOW bit. If SCL stays low too long, then the only thing to do is reset power to the slave.
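Deciding what "too long" means needs a time base. One possible way, sketched here under the assumption that the application samples the SCL-low state periodically and has a free-running tick counter, is a small tracker like the following. The names `scl_watch` and `scl_watch_sample` are invented for this example, and the limit is an application choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      low;        /* 1 while SCL has been seen low continuously */
    uint32_t low_since;  /* tick at which the current low period began */
} scl_watch;

/*
 * Feed one sample of the SCL-low state. Returns 1 once SCL has been low
 * for more than `limit` ticks without a high sample in between.
 * Unsigned subtraction keeps the comparison valid across tick wrap-around.
 */
int scl_watch_sample(scl_watch *w, int scl_is_low, uint32_t now, uint32_t limit)
{
    assert(w != NULL);
    if (!scl_is_low) {
        w->low = 0;
        return 0;
    }
    if (!w->low) {
        w->low = 1;
        w->low_since = now;
    }
    return (uint32_t)(now - w->low_since) > limit;
}
```

A return value of 1 would be the point at which the master gives up waiting and falls back to a slave reset or power cycle.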

2) Slave does not send ACK:

According to the I2C protocol, this would happen if:

1. No receiver is present on the bus with the transmitted address so there is no device to respond with an acknowledge.
2. The receiver is unable to receive or transmit because it is performing some real-time function and is not ready to start communication with the master.
3. During the transfer, the receiver gets data or commands that it does not understand.
4. During the transfer, the receiver cannot receive any more data bytes.
5. A master-receiver must signal the end of the transfer to the slave transmitter.

Ideally 1 can be avoided by keeping track of correct slave addresses.

As long as the designer ensures the proper commands are sent to the slave, and that the transmission lines are clean and not very noisy, then 3 can be avoided as well.

In the application I described, with the master MSP430 acting in a combined format to receive data, 5 should not be an error as long as the master MSP430 correctly keeps track of the number of bytes to be received and sends a NACK at the proper time.

If case 2 happens, the master MSP430 could use the NACK interrupt to see that no ACK was sent by the receiver. The only thing the master MSP430 can do is re-transmit the data.

I'm not exactly sure what case 4 is, but I believe it is related to holding the SDA line low. Is this true? I will address that case below.

3) SDA line is held low:

I'm not exactly sure what will trigger this, but I believe it is due to the slave being busy and holding the line low. Is this correct?

I believe the intervention in this case would be to send 9 clock cycles through the SCL line. On the MSP430, this could simply be done by sending a dummy stop sequence. Or would it be better to temporarily convert SCL to GPIO and toggle the line until SDA is released?

Would there be a case such that the master would have to reset power to the slave, or will one of the above interventions suffice?
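The GPIO variant of this idea corresponds to the bus clear procedure in the I2C specification: clock SCL up to nine times until SDA is released, and fall back to a hardware reset or power cycle if it is not. Below is a minimal sketch of that procedure. The `i2c_pins` callback interface and the `fake_slave` stand-in are assumptions made for this example so that it can run on a host; on the MSP430 the callbacks would toggle the port pins with the USCI module disconnected.

```c
#include <assert.h>
#include <stddef.h>

/* Pin access supplied by the application (hypothetical interface). */
typedef struct {
    void (*scl_write)(void *ctx, int level);  /* 0 = drive low, 1 = release */
    void (*sda_write)(void *ctx, int level);
    int  (*sda_read)(void *ctx);
    void (*half_period)(void *ctx);           /* delay of half an SCL period */
    void *ctx;
} i2c_pins;

/*
 * Clock SCL up to 9 times until the slave releases SDA, then issue a STOP.
 * Returns the number of pulses used, or -1 if SDA is still low afterwards.
 */
int i2c_bus_clear(const i2c_pins *p)
{
    assert(p != NULL);
    int pulses = 0;
    p->sda_write(p->ctx, 1);                  /* master must not drive SDA */
    while (p->sda_read(p->ctx) == 0 && pulses < 9) {
        p->scl_write(p->ctx, 0);
        p->half_period(p->ctx);
        p->scl_write(p->ctx, 1);
        p->half_period(p->ctx);
        pulses++;
    }
    if (p->sda_read(p->ctx) == 0) return -1;  /* escalate: reset or power cycle */

    /* STOP: SDA goes low -> high while SCL is high. */
    p->scl_write(p->ctx, 0); p->half_period(p->ctx);
    p->sda_write(p->ctx, 0); p->half_period(p->ctx);
    p->scl_write(p->ctx, 1); p->half_period(p->ctx);
    p->sda_write(p->ctx, 1); p->half_period(p->ctx);
    return pulses;
}

/* --- Crude host-side model of a slave stuck mid-byte (testing only) --- */
typedef struct { int bits_left; int scl; int master_sda; } fake_slave;

static void fs_scl(void *c, int level)
{
    fake_slave *s = c;
    if (s->scl == 0 && level == 1 && s->bits_left > 0) s->bits_left--;
    s->scl = level;
}
static void fs_sda(void *c, int level) { ((fake_slave *)c)->master_sda = level; }
static int fs_sda_read(void *c)
{
    fake_slave *s = c;
    return (s->bits_left > 0 || s->master_sda == 0) ? 0 : 1;  /* wired-AND */
}
static void fs_delay(void *c) { (void)c; }

i2c_pins fake_pins(fake_slave *s)
{
    i2c_pins p = { fs_scl, fs_sda, fs_sda_read, fs_delay, s };
    return p;
}
```

A return value of -1 is the case where clocking alone did not help and the master would have to reset or power-cycle the slave. This procedure does nothing for a slave that holds SCL low, since the master cannot generate clock pulses in that state.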
------------------------------------------------------------------------

Finally, I noticed that the I2C protocol document says:
If the power supply to a Fast-mode device is switched off, the SDA and SCL I/O pins must be floating so that they do not obstruct the bus lines.

What does this mean exactly, and how do we take this into account when the master MSP430 resets power to the slave devices?

I would appreciate any feedback.

Sincerely,

Mehdi

0 Mehdi Rahman over 11 years ago in reply to Mehdi Rahman

Intellectual 555 points

Also, is there a particular part number you can refer for the FET? Due to my design constraints, I would prefer the smallest package available.

0 Mehdi Rahman over 11 years ago in reply to Mehdi Rahman

Intellectual 555 points

Hello,

I did the following to detect/correct problems on the I2C bus. In order to detect unsuccessful I2C transmissions, I utilize a timer, an error flag, and a timeout flag. Before starting the I2C transmission, I set an error flag and initialize the timer. If I2C transmission goes as expected, then all flags get reset in the I2C ISR. However, if a certain amount of time passes and the flag is still set, then the code enters the timer routine, where the flag is reset, a stop byte is sent to the current I2C slave address, and a timeout counter is incremented. If the timeout counter reaches 5 (which means the code entered the timer ISR 5 times in a row without entering the I2C ISR), then a Reset_I2C_Flag is set, and the CPU wakes up. In the main routine, code executes to reset power to I2C devices, re-initialize the I2C bus, and re-initialize registers on the I2C slave devices.

Additionally, I also utilize the NACK interrupt. If the system sees 5 NACKs in a row, then Reset_I2C_Flag is set, which causes the same code (as mentioned above) in main() to be executed.
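The bookkeeping described in the two paragraphs above could look roughly like the sketch below. It is a reconstruction from the description, not the actual code: the struct, the function names and the way the counters are cleared are assumptions. On the MSP430 the fields shared with ISRs would also need to be `volatile`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONSECUTIVE_FAILURES 5   /* threshold mentioned in the post */

typedef struct {
    unsigned timeouts;      /* timer expiries since the last good transfer */
    unsigned nacks;         /* NACKs since the last good transfer */
    int      pending;       /* error flag: set at start, cleared on completion */
    int      reset_needed;  /* plays the role of Reset_I2C_Flag */
} i2c_health;

/* Call right before starting a transfer. */
void health_on_start(i2c_health *h)
{
    assert(h != NULL);
    h->pending = 1;
}

/* Call from the I2C ISR when a transfer completes normally. */
void health_on_success(i2c_health *h)
{
    assert(h != NULL);
    h->pending = 0;
    h->timeouts = 0;
    h->nacks = 0;
}

/* Call from the timer ISR. Returns 1 if this expiry counted as a timeout. */
int health_on_timer(i2c_health *h)
{
    assert(h != NULL);
    if (!h->pending) return 0;
    h->pending = 0;
    if (++h->timeouts >= MAX_CONSECUTIVE_FAILURES) h->reset_needed = 1;
    return 1;
}

/* Call from the NACK interrupt. */
void health_on_nack(i2c_health *h)
{
    assert(h != NULL);
    h->pending = 0;
    if (++h->nacks >= MAX_CONSECUTIVE_FAILURES) h->reset_needed = 1;
}
```

The main loop would check `reset_needed`, power-cycle the slaves, re-initialize the bus and the slave registers, and then clear the structure.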

Overall this solution seems to be working, but I want to make sure my approach is correct and takes all possible errors into account.

Sincerely,

Mehdi

0 Jens-Michael Gross over 11 years ago in reply to Mehdi Rahman

Guru 227245 points

Mehdi Rahman said:
a stop byte is sent to the current I2C slave address,

There is no stop byte. A stop condition is simply a transition of SDA from low to high while SCL is high. It is a general bus event and not sent to any particular peer. It marks the bus as free and ends any current transfer. However, to send a stop condition, 1) SDA must be low and 2) SCL must not be held low by the slave.
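To make the definition concrete, here is a small sketch that recognizes start and stop conditions from two consecutive samples of the bus lines, plus a check of the two preconditions for a stop named above. The function names are invented for this example, and the levels are assumed to be already normalized to 0 or 1.

```c
#include <assert.h>

typedef enum { EDGE_NONE, EDGE_START, EDGE_STOP } bus_edge;

static int is_level(int v) { return v == 0 || v == 1; }

/* Compare two consecutive samples of the lines (1 = high, 0 = low). */
bus_edge detect_edge(int scl_prev, int sda_prev, int scl_now, int sda_now)
{
    assert(is_level(scl_prev) && is_level(sda_prev));
    assert(is_level(scl_now) && is_level(sda_now));
    if (scl_prev && scl_now) {                        /* SCL high throughout */
        if (sda_prev && !sda_now) return EDGE_START;  /* SDA falls */
        if (!sda_prev && sda_now) return EDGE_STOP;   /* SDA rises */
    }
    return EDGE_NONE;
}

/* A release of SDA only counts as a STOP if SCL is high and SDA is low. */
int ready_for_stop(int scl_now, int sda_now)
{
    assert(is_level(scl_now) && is_level(sda_now));
    return scl_now == 1 && sda_now == 0;
}
```

If a slave holds SCL low, `ready_for_stop` can never become true, which is why a stalled clock cannot be ended by the master with a stop condition.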

Having a timeout on an I2C transfer is a good way to detect whether something is wrong. However, if a transfer is stalled due to a low clock, you cannot interrupt it (see above) and you cannot start a new transfer. A stalled bus is a stalled bus, and until the condition is removed (by resetting the slaves through a power cycle or reset signal, if supported, or simply by waiting until the slave is ready again or has an internal timeout and resets) you cannot do anything on the bus.
Your code apparently does that, so it should work. I just don't know why you are counting the timeout counter up. Just use a delay five times as long (well, if possible given the timer tick speed) and stop/reset the timer in the I2C ISR. No need to call the timer ISR five times when once is enough.

In most cases, it is sufficient to flag a failure if you get one NACK from the slave (meaning the slave isn't there). But some slaves may NACK (ignore the bus) if busy and answer on the second try. In this case, of course, retries are useful.

The thing that's perhaps still missing is the case where a slave NACKs while the master still wants to send. This too gives a NACK interrupt, but this time it means the slave was there but doesn't want the data, probably after getting some. Whether you can simply retry depends on the slave. And your NACK handler must separate the two cases.
What you cannot check is when a slave ACKs initially, but then doesn't have more data for you. In this case, the master will pull dummy 0xff bytes from the bus while the slave has already gone to sleep. You can't check for this at all.
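Separating the two NACK cases could be done along the lines of the sketch below. It assumes the driver records whether the address byte was acknowledged; how that is tracked on a given USCI module is hardware specific and not shown. The type and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    NACK_RETRY,         /* address not ACKed, retries left: slave may be busy */
    NACK_ABSENT,        /* address not ACKed, retries used up */
    NACK_DATA_REFUSED   /* slave ACKed its address, then refused a data byte */
} nack_verdict;

typedef struct {
    int      address_acked;     /* set once the slave ACKed its address */
    unsigned data_bytes_acked;  /* data bytes accepted before the NACK */
    unsigned retries_left;
} tx_context;

/* Decide what a NACK means for the transfer described by `t`. */
nack_verdict classify_nack(tx_context *t)
{
    assert(t != NULL);
    if (t->address_acked)
        return NACK_DATA_REFUSED;   /* retrying is slave specific */
    if (t->retries_left > 0) {
        t->retries_left--;
        return NACK_RETRY;
    }
    return NACK_ABSENT;
}
```

For `NACK_DATA_REFUSED`, `data_bytes_acked` tells the caller how much the slave took, which may matter when deciding whether a blind retry is safe.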

0 Darren Beckwith over 11 years ago in reply to Mehdi Rahman

Expert 1010 points

Mehdi,

Mehdi Rahman said:

3) SDA line is held low:

I'm not exactly sure what will trigger this, but I believe it is due to the slave being busy and holding the line low. Is this correct?

I believe the intervention in this case would be to send 9 clock cycles through the SCL line. On the MSP430, this could simply be done by sending a dummy stop sequence. Or would it be better to temporarily convert SCL to GPIO and toggle the line until SDA is released?

Would there be a case such that the master would have to reset power to the slave, or will one of the above interventions suffice?

In one of our products this happens a lot and it turned out to be a reset (watchdog or ESD pulse) occurring during an I2C read when the slave is in control of the bus. We found the general call reset (clocking out 9 clocks) to not be 100% reliable so we switched to controlling the power of the I2C devices. Our boot loader resets all I2C devices on power up.
