SLC5/05 MSG errors - can't find cause

ryangriggs

Hello, I have a SLC5/05 that acts as an aggregator of data from other PLCs, so the SCADA only needs to connect to the one PLC. Some of the other PLCs dump data into this one via MSG instructions, and this PLC uses MSG instructions to read some other PLCs' data.


Everything had been working fine for many weeks. However, around 1am today the PLC stopped being able to read a remote PLC over the internet. It keeps throwing errors when attempting to read and write to the remote PLC via MultiHop. (config screenshot below)


The Read MSG error is: d8 - Connection was broken.

  • The read is attempted every 10 seconds.
  • I have the timeout set to 23 seconds.
  • The error happens about 1 second after enabling the MSG instruction.
  • I have verified that the remote firewall allows access from this PLC's public IP address. Nothing changed with the firewalls.
  • I can read the same remote PLC from my office using another SLC5/05 unit with no errors, after adding my office public IP address to the remote firewall's Allow list.
  • The remote firewall shows 0 packets coming from this PLC.
  • I can connect to the remote PLC from within the problem PLC's local network, using tools like Ethernet/IP Explorer, and it can read the PLC's serial number and memory.
  • Other PLCs are still talking to the problem PLC with no problem, and the PLC is also querying several others on the local network.
The only thing I can figure is that the PLC's routing or ARP tables are somehow messed up, so it can't route traffic to the internet. I confirmed that the default gateway is set correctly in the PLC's network configuration.


I have restarted the internet routers at both ends of the connection, in case a TCP connection was stuck open.



My only option at this point is to try power-cycling the PLC. However, this requires an onsite visit, which would be nice to avoid if possible.


Does anyone have any suggestions I could try in this case? Thanks for any ideas.



[Attached screenshots: 5ag7, 5agb, 5ag9, 5aga]
 
Do you have a Diagnostic file assigned to Ch1? If so, you can look at ethernet statistics (Channel Status) under Channel Configuration while online and that might yield useful info.

I highly recommend assigning a diagnostic file to all SLC ports, especially when there are more than the simplest of machine to machine communications going on.

If there isn't a diagnostic file assigned, you have to do it offline, then download.
 
Thanks @Okie, here's a screenshot of the diagnostic file for Channel 1 (Ethernet). Does anything stand out to you?
I am assuming the negative numbers should be treated as unsigned integers instead, and the two-byte values should be treated as unsigned longs?
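
For anyone reading the raw diagnostic file the same way, here is a minimal Python sketch of that reinterpretation. It assumes the file shows signed 16-bit integers and that a 32-bit counter is stored low word first; the word order is my assumption, so swap the arguments if the results look wrong. The function names are made up for illustration.

```python
def to_unsigned16(value: int) -> int:
    """Reinterpret a signed 16-bit integer (-32768..32767) as unsigned (0..65535)."""
    return value & 0xFFFF


def words_to_unsigned32(low_word: int, high_word: int) -> int:
    """Combine two 16-bit words into one unsigned 32-bit value.

    Assumption: the low word comes first in the file. Swap the
    arguments if the combined counters look implausible.
    """
    return (to_unsigned16(high_word) << 16) | to_unsigned16(low_word)


print(to_unsigned16(-1))            # 65535
print(words_to_unsigned32(-2, 1))   # 131070
```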


Quick update: the communication simply "started working" again after I left the office. I never got the chance to power-cycle the PLC. It looks like this happened once previously today also. It's now up and running.





5agk
 
If you double click on the Channel Status icon under Channel Configuration, and then click on Channel 1 tab, RSLogix 500 will show that data in a more readable structured format for you.

slc channel stats.png slc channel stats a.png
 
OK, so this is a fairly recently configured link that relies on an Internet connection.

The error code 0xd8 about a second after the MSG is triggered really does indicate just an ordinary TCP/IP connection failure.

Your description of the arrangement suggests that security consists of public Internet-facing IP address white-listing and port forwarding.

That adds a pucker factor. Are there VPNs or other encryption or tunneling methods involved ? Can the remote firewall tell the difference between packets coming from the local PLC, versus a test PLC or a computer on the same local LAN ?

I agree that both the affected local and remote PLCs should be examined for their TCP connection capacity and utilization. SLC-5/05's changed the number of things they could talk to simultaneously over the course of their firmware history and with the memory size. I used to know it by heart but off the top of my head it's 16 with the L551, 32 with the L552, and 64 with the L553.

All the other factors sure are puzzling.

This sort of thing is nearly impossible to troubleshoot remotely except with "well, have you power cycled it" ? Going to site with an Ethernet tap and Wireshark would be my approach, to figure out how far traffic is getting from the local PLC to the gateways to the remote PLC.
 
@Okie, I'm an idiot, here are screenshots of the stats pages. Thank you for pointing that out.
KWDN8.png KWDNI.png KWDNT.png KWDO5.png KWDOF.png
 
@Ken thanks I will do that if the problem reoccurs.

Do you know of a way to reset the ethernet interface in the SLC5/05 without power cycling? Something that would force it to rebuild ARP tables, re-read configuration, etc?


The remote site's firewall filters based on source IP address and destination TCP port. There are no VPNs/tunnels. Both ends have static public IP addresses.
The primary PLC sends TCP packets directly to the remote site's public-facing IP address on port 44818, which the router port-forwards to the PLC's private LAN address (if the source IP address is whitelisted).
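
A quick way to check that path from a PC on the same LAN as the primary PLC is a plain TCP connect test against port 44818. This is only a sketch: `tcp_port_open` is a helper name I made up, and a PC will only pass the remote firewall if it leaves through the same whitelisted public IP as the PLC.

```python
import socket


def tcp_port_open(host: str, port: int = 44818, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (203.0.113.10 is a documentation placeholder, not a real site):
# tcp_port_open("203.0.113.10")
```

A True result only shows that the TCP handshake completed through the port forward. It says nothing about whether the PLC itself can originate the same connection.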


After testing everything I could on both ends, I kept getting the feeling that either the TCP stack in the requesting PLC was full (blocking) or its ARP or routing table was corrupted, since LAN MSG requests continued to work fine.


Thanks for your help!
 
Thank you for posting those details !

I genuinely don't know if making a configuration change and applying it to the Ethernet daughtercard of an SLC-5/05 will cause a "reset" that dumps the stack and buffers and cache.

I am not aware of a programmatic way (like in the S: status table) or using a directed-toward-itself MSG or EEM instruction to reset the Ethernet daughtercard of an SLC-5/05.

Those screenshots do suggest (but do not prove) some physical-layer problems between the SLC-5/05 and the switch it uses to connect to the LAN.

On the General tab, the Alignment Errors and FCS Errors typically have a value of zero or a handful from each time a cable is unplugged and replugged. But we don't know how long this SLC has been running since the last reset, so those errors could be from months or years ago.

On the EtherNet/IP Replies tab, the Received with Error and Timed Out counters might have just been incrementing every time a MSG instruction fails, so I am less concerned about them. I would have to do some research before I could tell you if monitoring the MSG instruction /ER bits would be a useful comparison to those counters.

The simple diagnostic is to click the [Clear] button and monitor the incrementing of the counters over time. They should be available in the Diagnostic file for programmatic measurement as well as obviously looking at them in RSLogix 500.
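
If the counters are logged rather than cleared, the same comparison can be done offline by subtracting successive snapshots. A rough Python sketch follows; the counter names and values are purely illustrative, and the 32-bit counter width is an assumption, not something taken from the SLC documentation.

```python
def counter_deltas(previous: dict, current: dict, modulus: int = 2**32) -> dict:
    """Per-counter increase between two snapshots, tolerating wraparound.

    Assumption: counters are unsigned and wrap at `modulus`.
    """
    return {
        name: (current[name] - previous[name]) % modulus
        for name in current
        if name in previous
    }


# Illustrative values only, not taken from the screenshots.
before = {"alignment_errors": 12, "fcs_errors": 40, "timed_out": 4294967290}
after = {"alignment_errors": 12, "fcs_errors": 43, "timed_out": 4}
print(counter_deltas(before, after))
# {'alignment_errors': 0, 'fcs_errors': 3, 'timed_out': 10}
```

Counters that stay at zero between snapshots are probably historical. Ones that keep climbing while the MSG is failing are the ones worth chasing.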
 
Another comment: Since your comms are SLC-to-SLC, you have the option of using the old fashioned CSPv4 protocol (TCP Port 2222) or the modern EtherNet/IP protocol (TCP Port 44818).

ControlLogix and CompactLogix and PanelView Plus and MicroLogix 1100/1400 don't speak CSPv4, but SLC-5/05's and PLC-5E's do. So you have got an option of not selecting the "Multi-Hop" protocol if you wish. I cannot imagine it making a difference, but it's there.

I realize you have whitelisting going on, but using Internet-facing routers with port forwarding is the sort of thing that makes cyber-security folks twitchy. Your modems are definitely being probed, and even if that isn't the source of this problem, it could cause other problems. I admit I don't know precisely where in the data path that IP whitelisting does its filtering.
 
@Ken thank you for this info. I'll reset the counters and see what happens. I can definitely relate to the cybersecurity point of view.
However in this particular application, I feel IP filtering is sufficient security vs the added complexity and overhead of a VPN tunnel. The main attack vectors should be 1) spoofing the source IP (they'd have to know a whitelisted IP) to blindly inject packets into the remote PLC and 2) DoS attacks, since the firewall still has to process and reject packets to port 44818 based on whitelisted source IP.
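
To be explicit about what the filter does, here is a rough Python model of the rule. It is an illustration, not the actual router configuration, and the whitelisted address is a documentation placeholder.

```python
import ipaddress

# Placeholder whitelist entry; the real addresses are not shown here.
ALLOWED_SOURCES = [ipaddress.ip_network("198.51.100.7/32")]
ALLOWED_PORT = 44818  # EtherNet/IP TCP port


def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Accept only whitelisted source addresses on the forwarded port."""
    addr = ipaddress.ip_address(src_ip)
    return dst_port == ALLOWED_PORT and any(addr in net for net in ALLOWED_SOURCES)


print(is_allowed("198.51.100.7", 44818))  # True
print(is_allowed("198.51.100.8", 44818))  # False
```

Anything that fails this check is dropped before the port forward, but the router still has to inspect every packet, which is where the DoS exposure comes from.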
 
You're describing a system that uses weak cybersecurity and runs erratically after only a couple of weeks, so I don't think we can confirm an Internet-related issue or rule one out. But let's set it aside.

Is the Internet router used by the inter-site PLC network also used for other purposes, i.e. is it the main site connection to the Internet ?

Have you heard about any other glitches or interruptions in the network, maybe ones that someone waved off as "but we cycled power and everything's alright now" ?

I'm just thinking about the possibility of inadvertent "ARP Poisoning" where maybe there was an IP conflict between the router's LAN address and another device.

Do you have a switch connected to the SLC that can give you port mirroring or other diagnostic functionality ?
 
