Convert Nextion to Text

Max · August 1, 2020, 5:29am

Hello everyone,

It bothered me that you can’t see the changes to your Nextion UI in a git commit diff. Of course you can’t, because it’s a binary file. This also means that, although you might have an open source project, everyone needs the proprietary Nextion Editor to be able to read the source.
Edit: aaand you can’t get an overview of all the code that’s on a page. You can’t search through it, and you have to check every single component if you don’t know where the code you’re looking for is.

I didn’t like that so I wrote a Python script that converts a given HMI file to text files, one for each page. They contain the essential information about every component (type, local/global, …) and all the event code. Effectively it would be possible to understand how the UI works with these text files.

If this sounds interesting to you, have a look over here (Example included): https://github.com/MMMZZZZ/Nextion2Text

Kind regards,
Max

luma · August 2, 2020, 2:37pm

What a great tool, thanks @Max!

xwintk · August 3, 2020, 8:32pm

Wow! Gave it a shot, cool I must say.

Max · August 4, 2020, 11:09pm

@luma @xwintk Thank you!

Unfortunately I just discovered a problem. Components, and even entire pages, are stored multiple times in the .HMI file: the current version (of course…) and sometimes one or more previous versions, even though the editor has no “undo” option. When I created the tool it looked like the last version in the file was always the most recent one, but now I found out that it’s not. Since a few people here have played with the Nextion files, could someone tell me how to find out which version is the most recent one?

Thanks,
Max

DVEous · August 5, 2020, 1:18am

That explains why the HMI file sizes are all over the place from one version to another of the same HMI project.

This is a poor situation, especially when you have an HMI that is close to the memory limits.

I’m now thinking the best interim measure is to export the pages out of the current (and most likely bloated HMI) and import them into a fresh HMI project.

Are you reading this Patrick?
If you’re as clever as you think you are, fix the memory leaks.

Max · August 5, 2020, 8:53am

@DVEous remember, this is the uncompiled file. Since I never saw a wrong version in the Editor or on a Nextion device, the Editor seems to know pretty well which version it’s supposed to use, and compiles only this version. Have a look at the sizes in the compiler output. They scale pretty well with the number of components you place on each page, and no page is several times larger than others of similar complexity.

Also, I’m not saying that this is dumb or so. While the Editor doesn’t allow going back to past versions after a restart, this could be a useful feature they had foreseen when designing the HMI file.

Kind regards,
Max

fvanroie · August 5, 2020, 11:03am

HMI File Version 33 uses a 512 KB file header TWICE, so even an empty HMI project will start at 1 MB of pure 00’s… it’s utterly ridiculous.

fvanroie · August 5, 2020, 11:11am

There is a flag indicating whether the (duplicate) object is deleted or not:
An HMI starts with 4 bytes that hold the object count, followed by a struct for each object. This struct includes FileName, StartPosition, ObjectLength and a Deleted flag.
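
A rough sketch of how such an object table could be read. Only the 4-byte object count and the field order (FileName, StartPosition, ObjectLength, Deleted flag) come from the description above. The field widths, the 16-byte name field, the little-endian byte order and the demo file name are assumptions made up for illustration, not the real HMI layout.

```python
import struct
from typing import NamedTuple

# ASSUMED entry layout (hypothetical): 16-byte name, uint32 start,
# uint32 length, 1-byte deleted flag, all little-endian, no padding.
ENTRY = struct.Struct("<16sIIB")


class HmiObject(NamedTuple):
    name: str
    start: int
    length: int
    deleted: bool


def read_object_table(data: bytes) -> list[HmiObject]:
    """Read the object count, then one fixed-size entry per object."""
    (count,) = struct.unpack_from("<I", data, 0)
    objects = []
    offset = 4
    for _ in range(count):
        raw_name, start, length, deleted = ENTRY.unpack_from(data, offset)
        name = raw_name.rstrip(b"\x00").decode("ascii", errors="replace")
        objects.append(HmiObject(name, start, length, bool(deleted)))
        offset += ENTRY.size
    return objects


def live_objects(data: bytes) -> list[HmiObject]:
    """Keep only the entries whose deleted flag is not set."""
    return [obj for obj in read_object_table(data) if not obj.deleted]


# Synthetic table: one stale and one current copy of the same object.
demo = struct.pack("<I", 2)
demo += ENTRY.pack(b"page0.pa", 100, 40, 1)
demo += ENTRY.pack(b"page0.pa", 140, 48, 0)
```

With a table like this, picking the most recent copy of an object comes down to skipping every entry that is flagged as deleted.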

Doing a Save As… also clears unused objects from the HMI file.

Max · August 6, 2020, 1:37am

With the great help of @fvanroie I got a big step closer to actually knowing what I’m doing. At this point I tend to agree a bit more with DVEous; there’s some weird stuff going on in these files.

The script is mostly* fixed, and works as expected. Here’s an example of a commit diff: https://github.com/MMMZZZZ/Nextion2Text/commit/f973ae2f13539c2c6a4b75de33a59943dda9ab27
Surprise, you can actually read and understand what changes were made to the .HMI file.

*That means that the actual components are still parsed in a rather dodgy way. If, for example, you write a line that has more than 255 characters (even worse if it’s the last one of an event), something will go wrong.
Additionally, there’s still no support for some components and for some component properties. Wouldn’t be as much fun if it was perfect, right?

Kind regards,
Max

Max · November 23, 2020, 8:30pm

I updated the script. It now supports all component types, including those for intelligent series displays. The CLI also prints the number of code lines (without comments) and the number of unique code lines. The latter one makes sense since Nextion sources somehow tend to have lots of duplicated code…
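
The two line counts can be sketched as follows. This is not the script’s actual implementation; it assumes that a comment is a line starting with `//` and that lines are compared after stripping surrounding whitespace.

```python
def count_code_lines(source: str) -> tuple[int, int]:
    """Return (code lines, unique code lines), skipping blanks and comments."""
    code = []
    for line in source.splitlines():
        stripped = line.strip()
        # Assumption: a comment line starts with "//".
        if not stripped or stripped.startswith("//"):
            continue
        code.append(stripped)
    # A set drops duplicates, which gives the number of unique lines.
    return len(code), len(set(code))


example = """\
// update the label
t0.txt="on"
vis b0,1
t0.txt="on"
"""
total, unique = count_code_lines(example)
print(total, unique)  # prints: 3 2
```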

The script still doesn’t output any visual attributes like position, size, font, … This simply makes no sense for a text-only source and would make it much longer at the same time. The “important” attributes should all be there.

For those that are too lazy to scroll to the top, here’s the link to the repository https://github.com/MMMZZZZ/Nextion2Text

Kind regards,
Max

p.buehler · November 28, 2020, 7:29pm

Thank you @Max! Your converter is exactly what I was looking for. Version control is the only way to reach quality goals in a software project, so I was quite surprised that Nextion does not support this.
I’m using a basic Nextion module (NX3224T028_011).
I made some modifications:

- Timer component: event property name corrected (it’s different compared to Button components).
- I added the most important graphics properties because I need to see all relevant changes in the Git diff. (However, I did this only for text, button, number, hotspot, progress bar and slider.)
- Sometimes numerical properties are converted the wrong way. I observed this with the maxval property of a slider. I tried these values on the Nextion: 0, 10, 100, 1000, 2200, 2201, 2202, 10. In the text file I get: 0, 10, 100, 1000, 2780, 10530, 2401, 10.

… strange, somehow funny

Max · November 28, 2020, 7:57pm

@p.buehler: Thanks for the input! Glad you like it.

~~Seriously? I got the Timer event wrong? Surprises me because I was sure I tested that one…~~ Edit: well, looking at the source it is pretty obvious that it is wrong. Idiot. Anyways. Mind creating a pull-request?
Also, I do understand that the full list of attributes can be useful (a.k.a. pull request would be nice for this one, too). However, when adding it to the script I’d like to add an option to turn it on or off. Speaking of turning stuff on and off, this would be the right point to change to the much more flexible argparse library for CLI parsing.
The number stuff seems very odd to me. Mind sharing a proof of concept file?
As I said previously, some stuff is not parsed correctly. Now that someone actually uses this thing and makes enhancements to it, I might share some details about this. The overall structure parsing is pretty solid (I think, at least) and written in “the right way”. However, the parsing of the components themselves is actually pretty horrible. It was the very first thing I did, before having much understanding of the file structure, and I noticed that most component attributes were separated by [random byte][0x00][0x00][0x00]. Stupid me thought the triple 0x00 was something like the end-of-command thing in the Nextion serial protocol. Later I found out that the four bytes simply indicate the length of whatever follows. Hence: things that are longer than 0xff cause problems.
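
To illustrate why the [byte][0x00][0x00][0x00] pattern only holds for short fields, here is a minimal sketch of length-prefixed parsing. It assumes the four bytes are an unsigned little-endian integer (which the “one byte followed by three zeros” observation suggests) and that fields simply follow each other; the helper names are made up.

```python
import struct


def pack_field(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte little-endian integer."""
    return struct.pack("<I", len(payload)) + payload


def read_fields(data: bytes) -> list[bytes]:
    """Split a buffer into fields, each preceded by a 4-byte length."""
    fields = []
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        fields.append(data[offset:offset + length])
        offset += length
    return fields


short = pack_field(b"txt")      # prefix bytes: 03 00 00 00
long_ = pack_field(b"x" * 300)  # prefix bytes: 2c 01 00 00
print(short[:4].hex(), long_[:4].hex())
```

For the 300-byte field the second prefix byte is no longer zero, so anything that searches for three 0x00 bytes as a separator loses track of where the field ends.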
While the issue is obvious, the fix would take some time, which I don’t have. When touching on this, I’d also like to remove the rather messy text generation and store all data as a dict, which can be printed properly and automatically using json.dump/json.dumps. Soooo… if you’re willing to dig a bit deeper into my uncommented messy code, you’re more than welcome.
For further and more technical discussions, GitHub Issues are probably a better place.

Kind regards,
Max

p.buehler · November 28, 2020, 8:33pm

@Max. Yes, I’m willing to contribute, of course. It’s the first time I’m using Python, so I don’t understand all of your code and I’m far away from being able to do structural improvements (CLI parsing?). I would add an additional optional input argument somehow.
I just installed GitHub and do not know yet how to make a pull request… first I have to clone the repository… I’ll try to do my best.
Number stuff: the high byte of 2200 is 0x08. Maybe this 0x08 is parsed the wrong way somehow.
Best regards
p.buehler

Max · November 29, 2020, 11:26am

Great!

Well… this code is definitely no place to learn “best practices”. It was very much a “learning by doing” experience for myself; actually the first time I wrote something that parses binary data.
CLI = Command Line Input/Command Line Interface. The current library I use to read what arguments the user specifies is very basic, however the previously mentioned argparse library offers pretty much all the flexibility you can get from a CLI.
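
A small sketch of what argparse-based CLI parsing looks like. The argument and flag names below are made up for illustration and are not the script’s real interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert a Nextion HMI file to text (illustrative sketch)."
    )
    # Positional argument: the file to convert.
    parser.add_argument("hmi_file", help="path to the .HMI file")
    # Hypothetical on/off switch, not an actual option of the script.
    parser.add_argument(
        "--visual",
        action="store_true",
        help="also include visual properties such as position and size",
    )
    return parser


# Passing a list instead of reading sys.argv keeps the demo self-contained.
args = build_parser().parse_args(["demo.HMI", "--visual"])
print(args.hmi_file, args.visual)  # prints: demo.HMI True
```

A description and a help text (-h) come for free with argparse, and an optional switch like this defaults to off when it is not given.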

There are many good tutorials out there about GitHub and Git workflows. It really isn’t that hard.

Haven’t looked deeper into the number issue yet, and it still makes no sense to me, but I guess it’s related to the hacky parsing. We’ll see if the issue is still there once the parser’s been rewritten.

Kind regards,
Max

wspeth · December 2, 2020, 10:01am

Found Nextion2Text by google-fu. Something I have been longing for so much. I just registered to say THANK YOU MAX, really appreciate your work!

Max · December 2, 2020, 10:35pm

Good news!

@wspeth thank you for your kind reply! Glad it helped you!

Motivated by the input here I had a look at my parsing code and it was rather easy to fix. The really dodgy stuff is finally gone.

Most important: this (also) fixes the number bug! The script actually “misparsed” a fair number of values, which I hadn’t noticed until @p.buehler mentioned it.

Sorry for having such a severe bug unfixed for so long. If you find any other issues, feel free to tell me here or on GitHub.

Edit: Note that I also changed the command line input as promised above. It now includes a description and help (-h). The readme of the repository has of course been updated.

Kind regards,
Max

p.buehler · December 3, 2020, 4:26pm

Hi Max
Great, now I understand what you meant with “CLI parsing”. I will merge this to my Graphics_Properties branch.
Number Issue: I tested your new Master SHA-1: 0bc4582e910 with the example “Number_Bug.HMI” and didn’t get any difference in the text output. I still get the same wrong numbers. Your changes do not seem to have any effect on the output. Did you forget to push some changes?
Philipp

Max · December 3, 2020, 6:36pm

Interesting. To be honest I haven’t checked your specific example since I haven’t got a file from you. However, I did see that the new code changed a whole bunch of numbers in the example to actually meaningful values (see commit diff), so I assumed it worked as it should.

I can also confirm that I did not forget to push anything.

So I will have another look at it, preferably with your test file.

Kind regards,
Max

p.buehler · December 3, 2020, 8:23pm

Thanks for your effort. I’ll have a closer look at the code and try to understand it. However, I’m just leaving for a holiday. We’ll try to go skiing without getting ill.
I’m not sure if I did the pull request the right way, do you have access to

and

Kind regards
Philipp

Max · December 5, 2020, 11:07pm

@p.buehler yes, I can indeed see them. However, I didn’t get any notification about them. And ‘unfortunately’ they’re no longer necessary.

Big news. The last update was supposed to get the script working within its old limitations. Besides the fact that it still failed on some numbers according to @p.buehler, those limitations were pretty severe:

- Attributes had to be added for every component, creating many duplicates.
- Attribute interpretation (replacing dez = [0 or 1] by [‘horizontal’ or ‘vertical’]) was completely hardcoded and pretty much impossible to extend to all combinations of component types and attributes.
- There was no way to respect ‘invalid’ component attributes. If you change a page’s sta from picture background to solid color, the pic attribute disappears in the editor but it is still in the file, even though its value is meaningless. There was no realistic way to handle this in the old parser.
- Probably many other limitations I forgot.

So I made the only possible decision: write the component parser from scratch (almost). The result is a parser that has virtually no limitations anymore.
However, the old dictionary that listed every component with its attributes is gone. There is now a dictionary with ‘only’ the attributes. This dictionary is still far from complete, especially with regard to all the attributes for the Intelligent series. To make the script usable already, you can add command line flags to include/exclude properties that the script doesn’t understand yet.

Speaking of command line options: some news here, too!

- Optionally generate a stats file that includes the code line counts from the command line output.
- Optionally generate a JSON file for each parsed page.
- Optionally specify a custom dictionary for parsing that extends or replaces the built-in ones.

Another thing that was missing until now: the Program.s code is now parsed, too.

For now this exciting new stuff lives in its own branch on GitHub: https://github.com/MMMZZZZ/Nextion2Text/tree/final-parser

Finally, for those interested in the more technical details, here’s how the new parser works (and how to write your own dictionary for it):

Instead of having a dictionary of all component types with all their associated attributes (component page: id, vscope, sta, …), there’s now a dictionary that contains all the attributes ‘directly’. This allows much easier parsing since the ‘type’ attribute has no special role anymore.
Parsing is done in multiple levels:
- A first low-level parsing extracts all the raw attributes and their data.
- The second step is to check whether the parser knows the attribute structure (whether it’s a string or an integer). If so, it converts the byte array accordingly; otherwise it remains a byte array.
- The third step is interpretation. This is where the new parser really shines. By now the values of all known attributes have been converted, meaning they can be used to determine the meaning of other attributes. All those dependencies are included in the attributes dictionary (more about this below). Therefore the parser first resolves all the dependencies for this attribute and then, using the resulting dictionary, interprets the attribute value.
The structure of the attributes dictionary is not too complicated but very powerful.
- At the top level we have an entry for every known attribute (e.g. sta, val, …).
- Each attribute must have a "struct" entry that specifies whether it’s a string ("s") or an integer ("i").
- Any attribute can have any of the following optional entries:
  - "name": "description": a more descriptive expression than e.g. vvs1.
  - "mapping": dict: replace attribute values by those specified here. For example, the mapping for the dez property of a progress bar is "mapping": {0: "Horizontal", 1: "Vertical"}.
  - "vis": bool: whether it is a visual property or not.
  - "ignore": bool: whether to ignore (exclude/skip) this attribute during interpretation. If true, it will not appear in the JSON or text files.
- Now the fun part. Any of the optional entries can be wrapped into a dependency, which can be arbitrarily nested! How do you specify a dependency? Simply add an entry with the name of the attribute it depends on! Dependencies are resolved from the top level down: the entries of an attribute are scanned for dependency entries. If there are any, they get resolved. This gets repeated until there are no more such entries. So if one dependency introduces new ones, they get resolved, too. It also means that nested dependencies overwrite ‘parent’ dependencies, which makes sense because the nested ones are necessarily more specific.
- There is one additional dependency that is not an attribute but works just like the other ones: "model": "T" / "K" / "P". Some attributes are only available on some models or have model-dependent meanings. Note that TJC X3/X5 series and Nextion Intelligent/P series are all included in "model": "P".
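
The conversion and interpretation steps described above can be sketched like this. It is not the parser’s actual code: the two-entry dictionary, the "Direction" and "Text" names, the little-endian signed integers and the latin-1 text encoding are all assumptions for the sake of the example. Only the "struct", "name" and "mapping" entries and the dez mapping come from the description.

```python
# Hypothetical two-entry dictionary in the format described above.
ATTRIBUTES = {
    "txt": {"struct": "s", "name": "Text"},
    "dez": {
        "struct": "i",
        "name": "Direction",
        "mapping": {0: "Horizontal", 1: "Vertical"},
    },
}


def decode(name: str, raw: bytes):
    """Convert raw bytes using the attribute's "struct" entry, if known."""
    entry = ATTRIBUTES.get(name)
    if entry is None:
        return raw  # unknown attribute: stays a byte array
    if entry["struct"] == "s":
        # Assumption: text is stored in a single-byte encoding.
        return raw.decode("latin-1")
    # Assumption: integers are stored little-endian and signed.
    return int.from_bytes(raw, "little", signed=True)


def interpret(name: str, value):
    """Apply the optional "name" and "mapping" entries to a decoded value."""
    entry = ATTRIBUTES.get(name, {})
    label = entry.get("name", name)
    return label, entry.get("mapping", {}).get(value, value)


print(interpret("dez", decode("dez", b"\x01\x00\x00\x00")))
# prints: ('Direction', 'Vertical')
```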

Sounds complicated? Simple example: a variable object has the following attributes: type, id, vscope, sta, val, txt, txt_maxl. Depending on the value of sta it’s either a text variable or an integer variable. This means sta determines whether we want to parse val, or txt and txt_maxl. Only sta? No! (Btw here’s a cookie for you if you’re still reading. I really appreciate your interest!) Of course this rule only applies if the component is actually a variable (type=52). Why? Take a number object: it also has a sta attribute, but there it controls the number’s background, not its value. So let’s look at the dictionary entry for the val attribute:

"val": {
    "struct": "i",
    "name": "Value",
    "type": { # Dependency on type
        52: { # In case type equals 52, use these entries
            "sta": { # Dependency on sta
                0: { # In case sta equals 0, use these entries
                    "ignore": False,
                },
                1: { # In case sta equals 1, use these entries
                    "ignore": True,
                },
            },
        },
    },
}
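
How such an entry gets resolved can be sketched in a few lines. This is a simplified illustration of the described rule (scan for dependency entries, pick the branch matching the component’s value, repeat until none are left), not the parser’s actual code. The set of dependency names is a hypothetical subset; in practice it would contain every known attribute plus "model".

```python
VAL_ENTRY = {
    "struct": "i",
    "name": "Value",
    "type": {52: {"sta": {0: {"ignore": False}, 1: {"ignore": True}}}},
}

# Hypothetical subset of the names that count as dependencies.
DEPENDENCY_KEYS = {"type", "sta", "model"}


def resolve(entry: dict, component: dict) -> dict:
    """Flatten dependency entries against one component's attribute values."""
    resolved = dict(entry)
    while True:
        pending = [key for key in resolved if key in DEPENDENCY_KEYS]
        if not pending:
            return resolved
        for key in pending:
            branches = resolved.pop(key)
            # Pick the branch matching the component's value, if there is one.
            chosen = branches.get(component.get(key), {})
            # Nested entries override whatever the parent level had set.
            resolved.update(chosen)


print(resolve(VAL_ENTRY, {"type": 52, "sta": 1}))
# prints: {'struct': 'i', 'name': 'Value', 'ignore': True}
```

For a component of any other type, the "type" dependency matches no branch, so val keeps only its "struct" and "name" entries and the sta value plays no role.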

I hope this makes the basic concepts understandable. And if so, I’d really appreciate any contribution to the existing attributes dictionary!

Kind regards!
Max