As the title says: does CM3 support Unicode? It seems not. In the WinAPI headers I found comments about "just forgeting Unicode for now", and the headers do indeed cover only the ANSI versions of the APIs. It has been a long time since these WinAPI headers were updated, but I don't think anything has changed.

0

> CM3 support Unicode

Yes, CM3 supports Unicode.

But...

P.S. To prepare a "21-bit Unicode cm3", you need to use the boot1 --> make --> boot2 methodology.

0

> I found on the WinAPI headers there are comments about "just forgeting Unicode for now"

This looks like it is about the "Windows UNICODE API" (?)

0

I even have examples...

P.S. I am going to publish them...

0

> I found on the WinAPI headers there are comments about "just forgeting Unicode for now"

Try UTF-8, please.

P.S. We have issue(s) or discussion(s) on this topic (UTF-8 vs. other UTF standards (?)).

0

Please look at Pull request https://github.com/modula3/cm3/pull/1087

0

> > I found on the WinAPI headers there are comments about "just forgeting Unicode for now"
>
> This looks like it is about the "Windows UNICODE API" (?)

No. Our WinAPI headers don't support the Unicode versions of the APIs, so I guessed that CM3 in general doesn't support Unicode at all. It turns out my guess was wrong.

0

> > > I found on the WinAPI headers there are comments about "just forgeting Unicode for now"
> >
> > This looks like it is about the "Windows UNICODE API" (?)
>
> No. Our WinAPI headers don't support the Unicode versions of the APIs, so I guessed that CM3 in general doesn't support Unicode at all. It turns out my guess was wrong.

I really don't understand what the problem is. Write a binding, then use it...

0

On 9/22/22 08:46, jpgpng wrote:

> > > I found on the WinAPI headers there are comments about "just forgeting Unicode for now"
> >
> > This looks like it is about the "Windows UNICODE API" (?)
>
> No. Our WinAPI headers don't support the Unicode versions of the APIs, so I guessed that CM3 in general doesn't, either. It turns out my guess was wrong.

Yes, CM3 does support Unicode. Type CHAR has only 256 values, and Modula-3 specifies that it is ISO-Latin-1. Many years ago, Critical Mass added type WIDECHAR, which was 16-bit, the prevailing bigger character size at the time, also adopted by Java.

A few years ago, I changed WIDECHAR to be Unicode code points, with ORD values of [0..16_10FFFF], with the compiler giving it 32 bits in memory. This means arrays of WIDECHAR are binary identical to UTF-32 in memory, and array subscripts are always one-to-one with code points. You really need this if you want to do efficient random access by character number. You can still put variable-length encodings in memory in arrays of whatever-sized elements, if you are masochistic.
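To make the random-access point concrete, here is a minimal sketch in Python (not Modula-3, and not CM3 code). The function names and the sample string are made up for illustration. With a fixed 4-byte layout, the n-th code point sits at byte offset 4*n, while UTF-8 has to be scanned from the start.

```python
def nth_code_point_utf32(buf: bytes, n: int) -> str:
    """Fixed width: code point n occupies bytes 4*n .. 4*n+3."""
    return buf[4 * n : 4 * n + 4].decode("utf-32-le")


def nth_code_point_utf8(buf: bytes, n: int) -> str:
    """Variable width: scan from the start, counting lead bytes."""
    count = -1
    start = None
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:  # not a continuation byte, so a new code point starts here
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return buf[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return buf[start:].decode("utf-8")  # n was the last code point


# Hypothetical sample: 1-, 2-, 3- and 4-byte UTF-8 sequences, then ASCII.
text = "a\u00e9\u20ac\U0001F600z"
utf32 = text.encode("utf-32-le")
utf8 = text.encode("utf-8")
```

Both functions return the same character for every index, but the UTF-32 lookup is a single slice, whereas the UTF-8 lookup costs time proportional to the position.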

There are various escape sequences for WIDECHAR literals.  There are octal (original M3) and hex escapes for both sizes.  So far, the compiler reads only ISO-Latin-1 in source files, so you have to use the escapes for higher-numbered code points in source files.  There has been talk of making the compiler accept alternatives, at least adding UTF-8, and maybe auto-detection, but AFAIK, no action.

TEXT values can be mixes of 16- and 32-bit code point representations internally, but this is all hidden by the abstractions in Text.i3, so you can ignore it if you don't get your fingers into the internal representations inside the inner Text* modules.  The escapes work in wide TEXT literals, but except for literal syntax, TEXT is just one type.

I made the size of WIDECHAR an option.  But in looking just now, it appears this has been disabled and WIDECHAR is always 16-bit. I will look into this more.  You do have to have everything, all linked-in code, compiled with the same WIDECHAR size.  This is checked at link time.  Otherwise chaos would ensue.

Then there is a lot of library code in m3-libs/libunicode.  There are readers and writers akin to Wr and Rd, but handling multiple encodings. There are also codecs converting among various encodings, both single code point at a time and whole readers/writers. Also with and without internal locking. Some understand all five Unicode end-of-line sequences.
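As a rough illustration of the two codec styles mentioned here (whole reader/writer versus one code point at a time), the following Python sketch uses the standard `codecs` module. It does not show the libunicode interfaces. The function names `transcode_stream` and `decode_one_at_a_time` are hypothetical.

```python
import codecs
import io


def transcode_stream(src, dst, src_enc, dst_enc, chunk=4096):
    """Whole-stream style: wrap a byte reader and a byte writer, copy text across."""
    reader = codecs.getreader(src_enc)(src)
    writer = codecs.getwriter(dst_enc)(dst)
    while True:
        piece = reader.read(chunk)
        if not piece:
            break
        writer.write(piece)


def decode_one_at_a_time(data, enc):
    """Code-point style: feed single bytes, yield each code point once it is complete."""
    decoder = codecs.getincrementaldecoder(enc)()
    for i in range(len(data)):
        for ch in decoder.decode(data[i : i + 1]):
            yield ord(ch)
    for ch in decoder.decode(b"", final=True):
        yield ord(ch)


# Example: UTF-8 in, UTF-16 (little-endian, no BOM) out.
source = io.BytesIO("h\u00e9llo \u20ac".encode("utf-8"))
target = io.BytesIO()
transcode_stream(source, target, "utf-8", "utf-16-le")
```

The stream form suits converting whole files. The incremental form is closer to what a scanner needs when it consumes one character at a time.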

I have layout filters for CHAR and Unicode layout that keep track for you of line numbers and character positions, but it looks like I have not put them in the cm3 distribution.  Let me know if anybody is interested.

There is also a unicode module somewhere from pre-full-WIDECHAR that provides different kinds of things.  As I recall, mostly constant names for many interesting specific code points.

View it on GitHub: https://github.com/modula3/cm3/issues/1085#issuecomment-1255048987

0

In reply to Rodney M. Bates (22.09.2022, 18:52), a small remark: in fact the compiler works with both cp866 ("Cyrillic Latin-1") and UTF-8 sources, except for CHAR and WIDECHAR constants in UTF-8. (With cp866 these work fine, as do TEXT constants in both of the encodings I tested.) See my examples, please. (I am on a smartphone.)
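One possible explanation for this difference, offered only as an assumption and not verified against the compiler sources: in cp866 a Cyrillic letter is a single byte, while in UTF-8 it is two bytes, so a reader that treats each source byte as one ISO-Latin-1 character would see two characters inside a character literal. The Python sketch below only demonstrates the byte counts. The sample letter is arbitrary.

```python
letter = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE, an arbitrary sample

as_cp866 = letter.encode("cp866")
as_utf8 = letter.encode("utf-8")

# What a byte-per-character (ISO-Latin-1) reader would see in each case.
seen_from_cp866 = as_cp866.decode("latin-1")
seen_from_utf8 = as_utf8.decode("latin-1")

print(len(as_cp866), len(as_utf8))
print(len(seen_from_cp866), len(seen_from_utf8))
```

A string (TEXT) literal can hold any number of bytes, which would be consistent with TEXT constants working in both encodings while single-character constants work only in the one-byte encoding.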


0

P.S. A small fix:

> TEXT values can be mixes of 16- and 32-bit code point representations

No: we have two variants. First: 8-bit and 16-bit code points. Second: 16-bit and "21-bit in 4 bytes" code points. Switching between the first and the second requires a total recompilation of the whole "cm3 ecosystem". (To do: fix this in the GitHub editor...)


0

Sorry: 8 + 16 or 8 + 32.


0

> (...) been disabled and WIDECHAR is always 16-bit.

This mode can be switched so that WIDECHAR is always "21-bit in 4 bytes".


0

> I have layout filters for CHAR and Unicode layout that keep track for you of line numbers and character positions, but it looks like I have not put them in the cm3 distribution. Let me know if anybody is interested.
>
> There is also a unicode module somewhere from pre-full-WIDECHAR that provides different kinds of things. As I recall, mostly constant names for many interesting specific code points.

I am interested; the source code you describe sounds interesting. (OK, it looks like I will recheck and fix this, if needed, tomorrow.)


0