How do I convert UTF-8 (unicode) to ASCII (Latin-1)?

Tyrone M

18

Years of Service

Joined: 25th Jan 2008

Location: Minnesota, USA

Posted: 1st Nov 2014 22:33

Help please.
I'm having problems with reading string characters. I believe that what I'm inputting is in Unicode and therefore causing me problems. The problem arises when I try to perform character by character comparisons on the input strings.

How can I convert this?

You can see the "input" text and the result of a simple read string from file and print here:

http://www.djfunk.com

Thank you

Rudolpho

20

Years of Service

Joined: 28th Dec 2005

Location: Sweden

Posted: 2nd Nov 2014 10:26

Those characters aren't part of the ASCII standard, which only covers 128 unique characters (codes 0 to 127). The remaining 128 codes (128 to 255) can be mapped in several different ways in DBPro. These mappings are referred to as charsets, which you can set using an optional second argument to the SET TEXT FONT function.
See the "Principles/ASCII character codes" section of the DBPro help files for a list of the available ones.
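
To illustrate why the upper half of the byte range is ambiguous, here is a minimal sketch in Python (not DBPro; it is used only to show the idea). The helper name `decode_with` and the choice of code pages are assumptions made for this example, not DBPro's charset list.

```python
# One byte above 127, decoded under several single-byte code pages.
raw = bytes([0xE9])


def decode_with(byte_string, charset):
    """Decode a byte string using the named code page (hypothetical helper)."""
    return byte_string.decode(charset)


# The byte 0xE9 yields a different character depending on the code page.
for charset in ("latin-1", "cp1252", "cp437", "cp1251"):
    print(charset, repr(decode_with(raw, charset)))

# Bytes in the ASCII range (0 to 127) decode identically in all of them.
for charset in ("latin-1", "cp1252", "cp437", "cp1251"):
    print(charset, repr(decode_with(b"Hello", charset)))
```

The same file can therefore look correct or garbled depending only on which charset the reader assumes.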

Tyrone M

18

Years of Service

Joined: 25th Jan 2008

Location: Minnesota, USA

Posted: 2nd Nov 2014 16:38

Sorry Rudolpho. I knew someone would bring up SET TEXT commands.

My problem has nothing to do with displaying text on the screen. It deals with reading and writing strings to a file.

The TEXT I'm READING FROM a FILE is apparently in UNICODE. Therefore when I WRITE the TEXT back TO a FILE certain characters are garbled.

In the output I put on www.djfunk.com, the string data in the file is exactly the same as what's displayed on the screen.

There must be a way to convert the characters most likely using some sort of bit functions. I just don't know how.

Any DBP bit-twiddlers out there? Anybody? Thanks!
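
For readers wondering what the bit-level conversion could look like, here is a rough sketch in Python (not DBPro) that assumes the input really is UTF-8 and the target is Latin-1. The function name `utf8_to_latin1` and the `?` replacement are choices made for this example. It relies on the fact that every Latin-1 character above 127 is stored in UTF-8 as exactly two bytes, with a lead byte of 0xC2 or 0xC3.

```python
def utf8_to_latin1(data, replacement=ord("?")):
    """Convert UTF-8 bytes to Latin-1 bytes using only bit operations.

    Characters outside Latin-1 (or malformed bytes) become `replacement`.
    """
    out = bytearray()
    i = 0
    # Skip a UTF-8 byte-order mark if one is present.
    if data[:3] == b"\xef\xbb\xbf":
        i = 3
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Plain ASCII: copy the byte unchanged.
            out.append(b0)
            i += 1
        elif b0 in (0xC2, 0xC3) and i + 1 < n and (data[i + 1] & 0xC0) == 0x80:
            # Two-byte sequence 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
            out.append(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        else:
            # Not representable in Latin-1, or malformed input:
            # emit one replacement and skip any continuation bytes.
            out.append(replacement)
            i += 1
            while i < n and (data[i] & 0xC0) == 0x80:
                i += 1
    return bytes(out)


print(utf8_to_latin1("Grüße".encode("utf-8")))
```

The same masks and shifts should carry over to any language with bitwise AND, OR and shift operations, but this sketch has not been tried in DBPro.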

Rudolpho

20

Years of Service

Joined: 28th Dec 2005

Location: Sweden

Posted: 2nd Nov 2014 20:49 Edited at: 2nd Nov 2014 21:53

Ah, I see.
The problem then is that your strings appear to be in UTF-16 (what Windows usually calls "Unicode") rather than extended ASCII or UTF-8. The main issue with this is that UTF-16 strings use at least 2 bytes per character, while DBPro's strings only use single-byte characters. Storing such strings natively would therefore mean implementing the entire range of string operations you need yourself.
As for writing a Unicode text file, most text-reading applications will interpret a .txt file as UTF-16 if it begins with the special byte-order mark word (0xfeff, which appears on disk as the bytes 0xff 0xfe in a little-endian file), and your program could apply the same check when reading text files.

Edit: In line with the above, a text file will be interpreted as UTF-8 if it begins with the byte sequence 0xef 0xbb 0xbf as a header.
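
The byte-order-mark check described above can be sketched as follows. This is Python rather than DBPro, and the names `sniff_encoding` and `to_single_byte` are invented for this example. It assumes that a file without any mark is already single-byte Latin-1 text, which will not hold for every file, and it does not handle UTF-32.

```python
import codecs


def sniff_encoding(data):
    """Guess (encoding name, mark length) from a leading byte-order mark."""
    if data.startswith(codecs.BOM_UTF8):        # 0xEF 0xBB 0xBF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_LE):    # 0xFF 0xFE
        return "utf-16-le", 2
    if data.startswith(codecs.BOM_UTF16_BE):    # 0xFE 0xFF
        return "utf-16-be", 2
    # Assumption: no mark means the bytes are already single-byte text.
    return "latin-1", 0


def to_single_byte(data):
    """Decode by the detected mark, then re-encode as Latin-1.

    Characters that Latin-1 cannot represent are written as '?'.
    """
    encoding, mark_length = sniff_encoding(data)
    text = data[mark_length:].decode(encoding)
    return text.encode("latin-1", errors="replace")


sample = codecs.BOM_UTF16_LE + "Grüße".encode("utf-16-le")
print(sniff_encoding(sample))
print(to_single_byte(sample))
```

The result has one byte per character, which is the form DBPro's strings expect for character-by-character comparisons.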

mr_d

DBPro Tool Maker

19

Years of Service

Joined: 26th Mar 2007

Location: Somewhere In Australia

Posted: 3rd Nov 2014 15:38 Edited at: 3rd Nov 2014 15:38

Hi Tyrone M, A simple option if suitable (found on the web through Google) is to use the following command to pre-process your input file:

cmd /a /c type input_unicode.txt>output_ansii.txt

This can easily be done using DBP's EXECUTE FILE command to generate a temporary intermediate file that you can use to read in and do your character comparisons.

Tyrone M

18

Years of Service

Joined: 25th Jan 2008

Location: Minnesota, USA

Posted: 3rd Nov 2014 18:30

mr_d,
that is a solution that would work. The input file can be pre-processed in that manner.

Thank you
And thanks Rudolpho too.

mr_d

DBPro Tool Maker

19

Years of Service

Joined: 26th Mar 2007

Location: Somewhere In Australia

Posted: 4th Nov 2014 02:47

That's good, and you're welcome.

Glad this solution works for you.

Guido Italy

20

Years of Service

Joined: 25th Dec 2005

Location:

Posted: 30th Jan 2015 21:23

Hi!

Please, how do I use this command?

cmd /a /c type input_unicode.txt>output_ansii.txt

Thanks

Guido Italy

20

Years of Service

Joined: 25th Dec 2005

Location:

Posted: 30th Jan 2015 21:26

To be more precise:

I'm Italian. How can I read special characters (e.g. accented letters) from a txt file (for example, one in German) and display them correctly in an "edit control" of BlueGui?

Tyrone M

18

Years of Service

Joined: 25th Jan 2008

Location: Minnesota, USA

Posted: 30th Jan 2015 21:35

Guido,

I would use Google translate to attempt this.

The suggestion to try:
cmd /a /c type your_input_unicode.txt > your_output_ansii.txt
was to convert Unicode to ASCII (English, I would presume). You enter this from Windows at the command prompt: Windows-R (that Windows key

You'll probably get better answers from others.

Guido Italy

20

Years of Service

Joined: 25th Dec 2005

Location:

Posted: 30th Jan 2015 22:27

Thanks, Tyrone.

My problem is with "non-Latin" characters and the BlueGui EditorGadget.
