1746   Support of non-western languages in enumerations

Created: 18 Jan 2021

Status: Not Applicable

Part: Part 6 (2019; Edition 2.1)

Links:

Page: 128-129

Clause: 9

Paragraph: 9.5.6

Issue

The question arises when someone tries to define a private enumeration.
The main purpose of enumerations is to provide a human-readable representation of integer values.
SCL files usually use UTF-8 encoding and may use other encodings, thus allowing strings in all possible languages.
In theory, an EnumVal can be in any language. However, it is artificially restricted to the Basic Latin + Latin-1 character sets. Latin-1 contains the umlauts and other accented characters used in German, French, Spanish etc. So speakers of English, French, German, Spanish etc. can define enumerations in their native languages, but other nationalities cannot.
In SCADA - IED communication only integers are transferred. I think it must be the responsibility of the SCADA system to read the appropriate string from the SCL file and make it visible to users. There is no need to restrict characters to Basic Latin + Latin-1.
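
The lookup described above can be sketched as follows. This is a minimal illustration, not part of the tissue: the enumeration PumpModeKind, its values and the helper load_enum are hypothetical, and only the EnumType/EnumVal structure and the ord attribute come from SCL itself.

```python
import xml.etree.ElementTree as ET

# SCL namespace as used by IEC 61850-6 files.
SCL_NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}

# Hypothetical private enumeration, invented for illustration only.
SCL_FRAGMENT = """\
<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <DataTypeTemplates>
    <EnumType id="PumpModeKind">
      <EnumVal ord="0">Stopped</EnumVal>
      <EnumVal ord="1">Running</EnumVal>
      <EnumVal ord="2">Maintenance</EnumVal>
    </EnumType>
  </DataTypeTemplates>
</SCL>
"""


def load_enum(scl_text, enum_id):
    """Return a dict mapping ord (int) to the EnumVal text of one EnumType."""
    root = ET.fromstring(scl_text)
    table = {}
    for enum_type in root.iterfind(".//scl:EnumType", SCL_NS):
        if enum_type.get("id") != enum_id:
            continue
        for val in enum_type.iterfind("scl:EnumVal", SCL_NS):
            table[int(val.get("ord"))] = val.text or ""
    return table


labels = load_enum(SCL_FRAGMENT, "PumpModeKind")
received = 1  # integer as it would arrive from the IED
print(labels.get(received, f"<unknown {received}>"))
```

Only the integer is needed on the wire; the string is resolved on the client side from the SCL file.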

Proposal

Remove the artificial and unnecessary restriction on EnumVal, i.e. remove
<xs:pattern value="[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"/>

Result:
<xs:simpleType name="tEnumStringValue">
<xs:restriction base="xs:normalizedString">
<xs:maxLength value="127"/>
</xs:restriction>
</xs:simpleType>
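
To make the effect of the pattern concrete, here is a rough sketch comparing the current restriction with the proposed one. It assumes that the XSD block escapes IsBasicLatin and IsLatin-1Supplement cover U+0000..U+007F and U+0080..U+00FF respectively, and it does not model the whitespace rules of xs:normalizedString. The function names are hypothetical.

```python
import re

# Approximation of [\p{IsBasicLatin}\p{IsLatin-1Supplement}]* in Python's re,
# which has no block escapes: both blocks together span U+0000..U+00FF.
_LATIN1_ONLY = re.compile(r"[\u0000-\u00FF]*")

MAX_LENGTH = 127  # from <xs:maxLength value="127"/>


def allowed_by_current_pattern(value):
    """Length limit plus the Basic Latin + Latin-1 Supplement pattern."""
    return len(value) <= MAX_LENGTH and _LATIN1_ONLY.fullmatch(value) is not None


def allowed_by_proposal(value):
    """Length limit only, as in the proposed tEnumStringValue."""
    return len(value) <= MAX_LENGTH


for text in ["Running", "Gr\u00fcn", "\u00b5s", "\u0412\u043a\u043b\u044e\u0447\u0435\u043d\u043e"]:
    print(repr(text), allowed_by_current_pattern(text), allowed_by_proposal(text))
```

Under these assumptions, a German string with an umlaut or a unit containing the micro sign passes both checks, while a Cyrillic string passes only the proposed one.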

Discussion Created Status
Moved to Not Applicable 03 Jun 21 Not Applicable
EnumStrings are not intended to be translatable strings. They serve exactly the same purpose as DO/DA names: they are understandable by humans but still machine processable. Latin-1 is allowed in addition to Basic Latin because of SIUnit, which contains some additional characters for specific units.

The ord value is the value transmitted over the communication.

When speaking about translation of the data model, there are two cases.

The first is the user translation, which gives a meaning suitable to the user in their context. This applies to enumerations, but also to LNs, DOs and DAs. It is an HMI issue: the HMI has the capability to use multiple languages, multiple character sets and other text orientations for the region in which the enumeration is to be displayed, together with a description.

The second case is translating the standard description of the data model. This is already supported by the NSD and the associated NSDoc, which localize the standard description for your country.

In conclusion, SCL is not intended to be translated, even though it is human readable, and translation has to be supported by a suitable process/tool.
22 Feb 21 Approval (N/A)
Based on the discussion entries, I propose to reject the tissue too.
Note that this is not a Part 6 definition: basic types are defined in 7-2.
If a change is to be made, this would be in 7-2.

Additionally, this is not an interoperability problem, but would be a new feature, and therefore cannot be addressed as a tissue.

11 Feb 21 Triage
I would reject the tissue because
1) it does not address an existing interop issue
2) I agree to keep the character set for enums as it is.
25 Jan 21 Triage
I would go even further and restrict enums to a very small subset of the Latin-1 Supplement (possibly even limit it to those characters already in use, such as micro, degree and the two superscript characters).
22 Jan 21 Triage
Agree with Bruce and Fred, this is not for translation.

I believe the supplement is needed for SI Units.
22 Jan 21 Triage
I disagree that EnumStrings are like program code or keywords because:
1. Program code and keywords are always Basic Latin characters. I have never seen Latin-1 characters in program code, keywords, identifiers etc. VisibleString255 doesn’t contain Latin-1 either. Adding Latin-1 means human readability. So why not add all possible languages?
2. I think that the key part of an enumeration is the integer value (“ord”). The text part is for humans.
22 Jan 21 Triage
These EnumStrings are more like program code or keywords of a programming language, which are also not localized for good reasons.
It was not even intended to translate these strings into languages where this would work with the Basic Latin + Latin-1 character sets. These strings are mainly for the implementers, not for the end users.
Strings that are foreseen to contain information for the end user, as the dU attributes of any CDC, are of type Unicode255.
It seems to me that the use of the Basic Latin + Latin-1 character sets for the Enum texts is already clearly defined in Part 6; further clarification might not be necessary.
22 Jan 21 Triage

 
