Following a discussion I recently had with someone who thinks that upload protocol v1.2 is significantly less reliable than v1.1, I wanted to share some thoughts with you, and of course I'm curious to hear yours!
This post certainly isn't a one-liner, but I hope you'll enjoy reading it. And, spoiler: you'll find something interesting to try yourself at the end, too!
Before I can get into the fun stuff, a tiny bit of technical background. The TFT file header contains CRC32-based checksums for different areas of the file. One of them covers the resources, which means basically everything except your code (bootloader, pictures, fonts, …). When using upload protocol v1.2, the device checks whether the new resource checksum matches the one currently installed and, if so, sends a skip command so that the resources don't need to be re-uploaded.
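To make the point concrete, here is a minimal sketch of that skip decision. It is not Nextion's implementation: `zlib.crc32` stands in for Nextion's own (slightly modified) CRC32, the resource bytes are passed in directly instead of being parsed from a real TFT header, and `should_skip_resources` is a hypothetical name. What it shows is that the decision is a single equality check against one installed value.

```python
import zlib

# Hypothetical sketch: zlib.crc32 stands in for Nextion's modified CRC32,
# and resources are raw bytes rather than a parsed TFT resource section.
def should_skip_resources(new_resources: bytes, installed_resource_crc: int) -> bool:
    """Return True if the new resources' CRC equals the installed one."""
    return zlib.crc32(new_resources) == installed_resource_crc

installed = b"bootloader+pictures+fonts"  # placeholder resource blob
installed_crc = zlib.crc32(installed)

# Same resources: the device would send the skip command.
print(should_skip_resources(installed, installed_crc))
# Different resources: they get uploaded unless the CRCs happen to collide.
print(should_skip_resources(b"new pictures", installed_crc))
```

A false "skip" needs the new resources to hit exactly one specific 32-bit value, which is where the 1 in 2^32 figure below comes from.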
One of the core arguments against v1.2 was that CRC32 is not reliable enough for industrial use cases. The risk of a collision (meaning two different payloads having the same CRC32) would be too high, and such collisions had supposedly actually happened during uploads with v1.2. As a consequence, the upload would fail, causing serious trouble for the industrial customer.
As “proof” he sent me a crappy picture of a Nextion screen showing “Update Failed!check Error!” (you gotta love Nextion's messages…). A second piece of proof was that the two strings
plumless and buckeroo have the same (non-Nextion) CRC32 of 0x4DDB0C25. Thus collisions wouldn't be as rare as the 1 in 2^32 (0.000000023%) I had claimed.
I'm sure anyone who has used Nextion's serial upload more than a couple of times knows the error cited above. It means the file checksum doesn't match, i.e. part of the file got corrupted during the upload. So it didn't seem like a terribly strong argument for a CRC collision.
As for the second “proof”, it just made me laugh. CRC is not a cryptographic algorithm, so you can easily construct a string that has the same CRC as another one (tools like spoof do exactly this). The argument also ignores the fact that finding any pair of strings with the same hash is orders of magnitude easier than finding a string that has one particular hash. (I say “hash” because this applies to cryptographic hash functions, too; compare the generalized birthday problem, where any two people in a group share a birthday, with the odds that someone shares your birthday.)
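To see how cheap "any pair" is, here is a sketch of a plain birthday search. It uses the standard CRC32 from Python's `zlib` (not the Nextion variant), and `find_crc32_collision` is just an illustrative helper. It hashes random 8-character strings until two of them share a CRC32; on average that takes roughly sqrt(π/2 · 2^32), i.e. about 82,000 attempts, compared with around 2^32 attempts to hit one *given* CRC by chance.

```python
import random
import zlib

def find_crc32_collision(length: int = 8, seed: int = 0):
    """Birthday search: hash random strings until two share a CRC32.

    Expected effort is about sqrt(pi/2 * 2**32), roughly 82,000 strings,
    versus roughly 2**32 tries to match one particular CRC by chance.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    seen = {}  # CRC32 value -> first string that produced it
    attempts = 0
    while True:
        s = "".join(rng.choices(alphabet, k=length))
        attempts += 1
        crc = zlib.crc32(s.encode())
        other = seen.get(crc)
        if other is not None and other != s:
            return other, s, crc, attempts
        seen[crc] = s

a, b, crc, attempts = find_crc32_collision()
print(f"{a!r} and {b!r} share CRC32 0x{crc:08X} after {attempts} attempts")
```

This runs in about a second in CPython, which is why a pair of colliding strings says nothing about how likely it is to collide with one specific installed file.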
You probably know that you can address pages and components using their objname, but at the same time you can't read that name at runtime. Now guess why… Exactly! The objects are identified by the CRC of their name! Probably because on a microcontroller it's easier to compare integers than strings.
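A minimal sketch of what "identified by the CRC of their name" could look like, assuming a simple lookup table. Again `zlib.crc32` stands in for Nextion's modified CRC, and `build_object_table` is a hypothetical helper, not the editor's actual data layout. Only the integer keys are kept, which is why the name can't be read back at runtime, and a duplicate key has to be rejected at compile time.

```python
import zlib

def build_object_table(names):
    """Map CRC32(name) -> object index; the names themselves are not kept.

    Hypothetical sketch: zlib.crc32 stands in for Nextion's modified CRC32.
    """
    table = {}
    for index, name in enumerate(names):
        key = zlib.crc32(name.encode())
        if key in table:
            # Two objects with the same key could not be told apart at runtime.
            raise ValueError(f"CRC collision between {names[table[key]]!r} and {name!r}")
        table[key] = index
    return table

names = ["page0", "b0", "t0", "slider"]
table = build_object_table(names)
# A lookup by name becomes a CRC first; only integers are compared.
print(table[zlib.crc32(b"t0")])  # 2
```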
But wait… Didn't someone try to convince me that the very same CRC algorithm was wayyy too risky for a firmware upload? Yet every firmware easily relies on hundreds of these CRCs? Even worse: this time the birthday problem fully applies, because all page names must have different CRCs, and all components on a page must have different CRCs, too. For an upload, on the other hand, the new resources only need to differ from the one set that is currently installed. How high is the risk, you may ask? Well, assuming 250 components on one page, the risk of two of them having the same CRC32 is about 0.00072% (generalized birthday problem formula). That's still small, yet roughly 31,000 times more probable than a collision with the resources of the currently installed TFT file.
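The numbers above can be checked with the exact birthday formula, assuming CRC values behave like uniformly random 32-bit numbers. The sketch compares the probability that any two of 250 names collide with the probability that new resources hit the single installed CRC.

```python
import math

N = 2**32  # number of possible CRC32 values (assumed uniformly distributed)

def collision_probability(n: int, space: int = N) -> float:
    """Probability that at least two of n uniformly random values coincide."""
    # 1 - prod_{i=0}^{n-1} (1 - i/space), computed in log space for accuracy.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_no_collision)

p_page = collision_probability(250)  # 250 component names on one page
p_upload = 1 / N                     # new resources hit the one installed CRC
print(f"{p_page:.3e} vs {p_upload:.3e} (ratio {p_page / p_upload:,.0f})")
```

For small probabilities the ratio is essentially n(n − 1)/2 = 31,125 pairs of names, each of which is one chance of a collision.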
As I said, producing random collisions is easy enough. Using the Nextion CRC algorithm, I generated the following two strings, which have the same Nextion CRC (a slightly modified standard CRC32):
z5lkez2qtifrgh. There are surely prettier examples, but these do the job. Now try compiling an HMI file with two pages using these two strings as page names… Surprise: the people who designed the editor knew about the risk and actually caught that case! Well done; I certainly didn't expect that.
Question for you now: how often have you seen that error in your Nextion career? If my discussion partner's argument were true, it should be a well-known issue.
Finally, here's another interesting bit: CRC collisions propagate. What does this mean? If the resources of two files have colliding CRCs while the rest of the files is identical, the file checksums are identical, too. This is because $\mathrm{CRC}(a, b, c) = \mathrm{CRC}(a, b', c)$ if $\mathrm{CRC}(b) = \mathrm{CRC}(b')$ and $\mathrm{len}(b) = \mathrm{len}(b')$, where $a$, $b$, $c$ are consecutive parts of the data.
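Here is a quick way to convince yourself, using the standard CRC32 from `zlib` rather than the Nextion variant. The sketch first finds two distinct 8-byte blocks with the same CRC32 by birthday search (`find_collision` is an illustrative helper), then wraps both in the same made-up prefix and suffix. Because CRC32 is affine over GF(2), equal CRCs of equal-length blocks stay equal no matter what surrounds them.

```python
import random
import zlib

def find_collision(seed: int = 1):
    """Birthday search for two distinct 8-byte blocks with equal CRC32."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    seen = {}  # CRC32 value -> block that produced it
    while True:
        block = rng.getrandbits(64).to_bytes(8, "little")
        crc = zlib.crc32(block)
        other = seen.get(crc)
        if other is not None and other != block:
            return other, block
        seen[crc] = block

b1, b2 = find_collision()
a = b"illustrative header and code"  # same prefix in both "files"
c = b"illustrative rest of file"     # same suffix in both "files"

assert b1 != b2 and zlib.crc32(b1) == zlib.crc32(b2)
# The collision survives any common prefix and suffix:
print(zlib.crc32(a + b1 + c) == zlib.crc32(a + b2 + c))  # True
```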
That also means that a collision of the resources' CRC doesn't necessarily cause a data error, contrary to what my discussion partner tried to show me (by the way, he refused to provide any further details about that case). For fun, I generated two HMI files that look completely different (one dark, one light). Compile them for your Basic/T or Enhanced/K series display (this doesn't work for Intelligent/X series displays) and upload one, then the other, using the Nextion Editor (which uses the “unreliable” upload protocol v1.2). Of course, I also included a set of files that can be used with the TJC editor.
I’m sure you won’t anticipate the result!
Haven't had enough details? Okay, here are some more… You may have thought that it is highly unlikely for two resource sections to have exactly the same length. However, they're padded up to the next multiple of 0x10000 = 65536 bytes, which means that on average you need to change your resources by 32 kB before the section's size actually changes. More likely, you add or remove a picture, font, … and since their number is stored in the file header, the header changes. Thus, even if the resource sections had the same CRC (so the resources were skipped), the file CRC wouldn't match, because the file header changed. That would explain the data error screenshot I got as “proof”. How likely is it that this is what actually happened? You tell me…
I told him that, unlike normal upload errors, an error caused by a CRC collision would be 100% reproducible: anyone who has the two TFT files must be able to reproduce it. Otherwise it's not a CRC collision. The reply I got was something along the lines of whether I wanted to question the competence of the person who had faced the presumed CRC collision.
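A small sketch of the padding rule described above, assuming the only thing that matters is rounding the resource section up to the next multiple of 0x10000 (other details of the TFT layout are not modeled, and `padded_size` is an illustrative name). It shows how very different amounts of resource data end up with the same section length.

```python
SECTION_ALIGN = 0x10000  # 65536: resource section is padded to a multiple of this

def padded_size(content_size: int) -> int:
    """Round a resource section's size up to the next multiple of 0x10000."""
    return -(-content_size // SECTION_ALIGN) * SECTION_ALIGN  # ceiling division

for size in (100_000, 120_000, 131_072, 131_073):
    print(size, "->", padded_size(size))
# 100000 -> 131072
# 120000 -> 131072
# 131072 -> 131072
# 131073 -> 196608
```

So equal section lengths are the normal case, not a coincidence; what usually differs between two builds is the header.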
Let me know what you think!