Question about AB ControlLogix 5000 unusual issue

cipitio (Member; joined Aug 2009; Virginia; 9 posts)

Hi:

We have 2 ControlLogix 5000 systems working on different parts of a large conveyor system. They are interconnected via redundant CNBR cards. They share some info, but work by themselves for the most part. Both systems are redundant, each using SRM modules. The processor is in a separate rack from the Ethernet module in both systems. Processor and rack with Ethernet module are connected via CNBR cards.

A few weeks ago, one of those systems had a problem where one module -'apparently' the Ethernet module- that talks with the sort computer was not working, so the system could not sort parcels; a Cisco switch is used for comms between the PLC and Sort PC (I say 'apparently' because ultimately the problem was something else). This issue took the system out of operation. Fortunately we can divert everything from one system to the other for this kind of emergency, and that's what we did.

I went ahead and troubleshot the system (let's call it Sys1) doing what was a sensible technique in my opinion: First try a different Ethernet cable, then try using a different port in the switch, and finally, try a different Ethernet module. Nothing fixed this issue. I did my troubleshooting while both systems were down for the day, in the wee hours.

What I did that night was to test a different Ethernet module, i.e., I would take out the one I thought was bad, put in a new one, and see if Ethernet comms happened. In the end I did not even get that far, or leave the new module installed, because I had to configure the IP address on that Ethernet module and I was not prepared to do it that night, so I simply put the original Ethernet module back and left...

The next day the other system (the good one, Sys2) would not work. We hit a brick wall because the Ethernet module was not working, so we could not talk to the system, and I did not have a PCC card to use one of the available ControlNet ports.

AB was called in, and what the tech found was a CNBR card that had to be reseated in Sys2 (nothing to do with Sys1, where the suspect Ethernet module is installed). That card provided comms for the backplane of the racks.

According to the tech, the Ethernet module in Sys1 was fine, but comms could not get any farther than the rack backplane because after that, the CNBR was frozen, and did not allow comms in the ControlNet network that links the processors among themselves; after reseating it in the rack, things started working, including the Ethernet module. I thought it was a sensible explanation, and quite frankly was surprised the problem turned out to be a CNBR card in the other system, and not the Ethernet module in the system with the symptoms...

Now my question (finally! Sorry for the long preamble): The boss (not a controls guy) is insisting that the Ethernet module I tried out the previous night broke the ControlNet module that had to be reseated. I tell him that is not possible. They are in different systems and they talk different languages (CIP vs TCP/IP).

Even more, he says that somehow firmware in the ControlNet module on Sys2 was corrupted by the Ethernet module I tested out on Sys1 the previous night. I have explained that 'flashing' firmware into a module is an elaborate process, and that it is impossible that the Ethernet module could flash anything by itself. As I mentioned, I could not even configure the IP address that night, so that module did not talk to anything whatsoever.

Even worse, another tech is telling him that the Ethernet module had to be 'flashed' before being installed (?). I said that 'flashing' the module with updated firmware is not a given, it may be needed if the current firmware is not compatible with the rack or processor, but in this case it was fine. The only thing that needs to be done to get that card working is configure the right IP settings (IP Address, Network Mask, etc.), and that is it.
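As an illustration of what "the right IP settings" amounts to, here is a minimal Python sketch (not anything that runs on the module itself) that checks whether two hosts fall in the same subnet for a given network mask. The addresses and the function name `same_subnet` are hypothetical, chosen only for the example; the real addresses of the Ethernet module and the Sort PC are not given in this thread.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if ip_b lies in the subnet defined by ip_a and netmask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Hypothetical addresses, for illustration only.
print(same_subnet("192.168.1.10", "192.168.1.20", "255.255.255.0"))  # True
print(same_subnet("192.168.1.10", "192.168.2.20", "255.255.255.0"))  # False
```

If the module and the PC it talks to are not in the same subnet (and no gateway is configured), they will not exchange traffic, which is a settings matter and has nothing to do with firmware.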

I would appreciate opinions from you guys. This is a weird problem, and would appreciate any thoughts, or similar situations if you ever had them.

Thanks!

Cipitio.
 
I cannot comment on the plc problem, only the human one you're involved in.

I agree with you. There are many steps to flashing a card. In fact, some cards, even Ethernet ones, are not flashable.

And many cards are usable out of the box, even if they can be flashed to a higher level.

If the Ethernet module 'broke' the ControlNet module, how did reseating it 'fix' it?


It really jerks my chain when someone (PHB) who admits no working knowledge comes out with hypotheses that are so far off base.
 
Hi Dave2:

Thanks for your response. My main problem is that since I worked on the system before it went down, they are saying it went down because I tried out the Ethernet card the night before.

I don't know how reseating the ControlNet card fixed the problem. My best technical opinion is that it was not making good contact due to any number of reasons.

However, immediately after the AB guy reseated it, boom! The system started working.

I am afraid that finding the technical reason why the system went down, and how it came back up, doesn't really matter now; what matters is finding someone to put the blame on, and since I touched the system last... I am the lucky one! Even though there are no technical reasons that make sense.

'Funny' thing is that the fault was on the system that had been apparently working fine. The one I was troubleshooting turned out to be an innocent bystander.
 
Hi Oakley:

The modules we have in the SRM racks are: 1756-CNBR / D D05.31.

Out in the field we have about 15 racks with different Revs.: D05_38_40, D05.27, D05.28.

Regards,

Jorge M.
 
Assuming that at some stage the power was cycled on the rack or module concerned, then reseating the module broke the high-resistance joint between the backplane and the module.

Now you must have been the lowest person in the company food chain that tried to fix the problem, so that means it's your turn to get eaten.

Be angry, be glad it's fixed, then let it roll off your back, as everyone else who touched it is protecting their tails and doesn't care how you feel.
 
Hi Gil47:

As a matter of fact, power was cycled several times to no avail before calling in the AB guy.

That is the practice that some guy who advises the boss told us to do: Cycle power to the PLC's until the system comes back up.

I disagree with that method because I prefer to do it step by step to try to get to the root cause and learn something new, so we can really fix it, rather than cycle power until we get lucky and get it to work not knowing what did the trick.

Thanks a lot for your feedback. I'll look into it.

Cipitio.
 
You may have run up against the 497 day bug.

See RA note : 53307. This was sent out as a Service Advisory.
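For context on where a figure like 497 days tends to come from: in other well-known "497-day" bugs, the cause is a 32-bit unsigned counter that ticks every 10 ms and then rolls over. Whether the CNBR issue works exactly this way is an assumption here; the RA note is the authority on that. Under that assumption, the arithmetic can be sketched in Python as:

```python
# Assumption (not confirmed by this thread): a 32-bit unsigned counter
# that increments once every 10 ms and rolls over when it overflows.
COUNTER_BITS = 32
TICK_SECONDS = 0.01
SECONDS_PER_DAY = 86_400


def rollover_days(counter_bits: int = COUNTER_BITS,
                  tick_seconds: float = TICK_SECONDS) -> float:
    """Days of continuous uptime before the counter wraps around."""
    total_ticks = 2 ** counter_bits
    return total_ticks * tick_seconds / SECONDS_PER_DAY


print(f"{rollover_days():.1f} days")  # 497.1 days
```

That is roughly 16 months of continuous uptime, so a module that is rarely power-cycled can run fine for well over a year before showing the symptom.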

You might also review the redundancy bundle of firmware. Only specific firmware revisions have been tested and released for redundancy (note I am not speaking of ControlNet redundancy here).
 
Hi Oakley:

Thanks for your suggestion. This is something definitely worth looking into. The firmware in all of our CNBR modules is ancient. This may definitely be a step in the right direction.

Cipitio.
 
I am afraid that finding the technical reason why the system went down, and how it came back up, doesn't really matter now; what matters is finding someone to put the blame on, and since I touched the system last... I am the lucky one! Even though there are no technical reasons that make sense.

In the auto manufacturing environment, this is very true. Last person that touches it owns it. If I work on a system during downtime, my customers (GM, Ford, Chrysler, and their suppliers) expect me to be there for the morning startup to ensure it's making parts. Been like this for the 25 years I've been in the business. Sorry to hear you got stung.
 
Hi Oakley:

Thanks for your suggestion. This is something definitely worth looking into. The firmware in all of our CNBR modules is ancient. This may definitely be a step in the right direction.

Cipitio.


All of the module firmware that you listed is identified as having the issue. RA's fix is to upgrade the firmware.

Did I understand correctly that you have two redundant sets of controllers on the same CNet network? If so, you may want to reconsider this ... I asked the question some time ago about combining two redundant controllers and they responded that it is not a good idea ... if one determines failure and switches, the other most likely will switch for no reason.
 
In the auto manufacturing environment, this is very true. Last person that touches it owns it. If I work on a system during downtime, my customers (GM, Ford, Chrysler, and their suppliers) expect me to be there for the morning startup to ensure it's making parts. Been like this for the 25 years I've been in the business. Sorry to hear you got stung.

Live and learn. This has taught me a lot.

Actually the system that I went to work on was broken already, and it was broken when I left (I was not happy about it, but I knew I would continue working on it early the next morning).

What caused this big brouhaha is that the other system, which was up and doing the job of both systems, mysteriously went down that morning.

In reality, I think it was probably going to go down anyway, but since I had been there, I owned that baby...

Thanks for your feedback!
 
All of the module firmware that you listed is identified as having the issue. RA's fix is to upgrade the firmware.

Did I understand correctly that you have two redundant sets of controllers on the same CNet network? If so, you may want to reconsider this ... I asked the question some time ago about combining two redundant controllers and they responded that it is not a good idea ... if one determines failure and switches, the other most likely will switch for no reason.

Actually, the rack with the processor in it has 3 CNBR modules, not 2, + the SRM module. Pretty interesting setup. One of the CNBR's connects the PLC's (Sys1 & Sys2). They exchange some info, but not much. The other one is dedicated to talk to a rack with HSC's coming from encoders, and the last one is dedicated to talk to the rest of the racks out in the field with all the discrete devices.

I think you are correct, but undoing that setup would be a pain. That's the way they designed it. The processor needs to receive/transmit some data from each CNBR. But thanks for the heads up.
 
What are the revisions of the processors?

I'd imagine that you have mixed firmware revisions that are not supported by the redundant system somewhere.

But by what you have been describing, it really is sounding more and more like you have the issue ... especially since you have identified the firmware revisions of the CNBRs that are suspect.

I had this happen once ... it just seemed to stop working. What happened was that the connections in the CNBR maxed out and it wouldn't allow any further communications ... it acted like it had lost the IO. Another engineer cycled the power with no resolution ... I actually pulled the modules and reseated them (I was going to replace them, but couldn't readily find the spares). Long story short, this reset the counter that trips up the CNBR. I rescheduled the network, and all took off again.

So ... when the RA engineer pulled the card and reseated it, the problem actually wasn't a bad physical connection, but rather that the module had no available connections left.
 
Hi:

I am not at the office today, but I think the firmware revs are the same for the CNBRs in the redundant systems. Elsewhere in the system the firmware for the CNBRs is a mixed bag.

What you mention about reseating the CNBR is consistent with what the AB guy did. I am sure you had the same thing happen in your system. I have tried to explain this to the boss and other techs, and they don't understand why cycling power did not fix the problem, but reseating the module did. I didn't get it either, but I saw him do it myself rather than hearing about it secondhand, so I know for a fact that is what fixed the problem. Now it makes more sense.

Also, now that you mention it, all the racks downstream from that CNBR appeared 'offline'. After reseating it, they all came back to life. Obviously they were up, it was that specific module that was 'frozen'.

Thanks!

Cipitio.
 
