Questions tagged [unicode]
Unicode is the standard for computer representation of plain text. It encompasses the Universal Character Set, intended to unambiguously represent all characters used in human writing systems in any language; the Unicode Transformation Formats (UTFs), which define standardized formats for storing and transmitting Unicode text; and standards for processing and manipulating text.
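As a rough illustration of the difference between a character's code point and its stored form, the following Python sketch encodes one character in three UTFs. The choice of U+1F9DE and the helper name `encoded_forms` are illustrative assumptions, not part of the tag description.

```python
def encoded_forms(ch: str) -> dict[str, str]:
    """Return the hex bytes of one character in UTF-8, UTF-16BE and UTF-32BE."""
    return {
        name: ch.encode(name).hex(" ").upper()
        for name in ("utf-8", "utf-16-be", "utf-32-be")
    }


# U+1F9DE lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair.
forms = encoded_forms("\U0001F9DE")
for name, hex_bytes in forms.items():
    print(f"{name:10} {hex_bytes}")
```

The same code point yields a different byte sequence in each format, which is why a file's encoding has to be known before its bytes can be interpreted as text.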
716
questions
8
votes
5
answers
4k
views
What is the name of this character that looks like an upside down arrowhead?
It came to my attention that the term "caret" refers specifically to the inversely-oriented variant of this symbol, i.e., upward pointing.
Oddly enough, searching caret didn't readily reveal ...
2
votes
2
answers
94
views
Mixed UTF-16BE strings with ANSI in content stream of a PDF page object
I am working on a tool that generates PDFs on the fly with a custom, locally chosen font.
Embedding the font is not a problem, and I have verified that the font is found without problems.
However, the font is Chinese, ...
0
votes
1
answer
204
views
UTF-8 Decoders fail to decode the encoded strings
I have some encoded values which I believe are UTF-8. I don't really know whether they are UTF-8, because other online tools and steps for decoding UTF-8 are not working, but an open source tool ...
0
votes
1
answer
133
views
How can I type 🧞 on Windows?
I tried:
Hold down Alt, press +, press the Unicode hex value 1F9DE, release Alt. This does not work because when I press the F key, it triggers the File menu (Alt+F hotkey) of the browser. In Notepad ...
0
votes
1
answer
69
views
non-unicode fonts displaying as gibberish in Windows 10 guest running in qemu
I have a Hebrew app that uses non-unicode fonts, and when I run it in my Windows 10 guest, which is managed by virt-manager, it started displaying gibberish, although it used to work right. I tried ...
0
votes
2
answers
133
views
What is the official name for each of the Unicode locale identifier formats?
Unicode locale identifiers can be written in different formats: en-us, en_US, and enUS. What is the official name for each of these formats?
After searching, I found that the format en-us is called ...
5
votes
2
answers
253
views
How do I display the character Ň (U+0147) in Word 2010? Why doesn't the method described in the first paragraph below work for this character?
As far as I can understand, to exhibit a Unicode code point in Word, one should press the Alt key, followed by the character + and the Unicode code point, both typed in the numeric keypad. That’s ...
1
vote
1
answer
141
views
How to make one keystroke emit 3 Unicode characters in a custom keyboard layout in Linux Mint?
I have created a custom keyboard layout for my regional language by editing different files. It works fine except for one problem. One of the characters needs 3 Unicode characters in combination, i.e. ...
0
votes
1
answer
44
views
Interpret a Unicode value as a real character in Notepad++
If I copy a Unicode-encoded value as an actual rendered character (for example from here: 1D400) and paste it into Notepad++, it really does show it as a "bold" character.
But when I try to ...
0
votes
0
answers
211
views
Git Bash doesn't display characters correctly, using Visual Studio Code's integrated terminal
I use Git Bash as the default terminal profile on Windows. When I execute this command, I get the below output:
memlab run --scenario tests/oversized-object/index.js
> 'command' �����ڲ����ⲿ���Ҳ���...
0
votes
0
answers
20
views
Unicode glyphs for WYSIWYG ruler bar symbols (margin, left-tab, right-tab, center-tab and decimal-tab)
Various word processors and text layout software have a "ruler bar" to indicate margins, indentation, and tab locations for a block of text. The symbology used in ruler bars is fairly ...
1
vote
1
answer
144
views
Bash prompt with zero width control characters
To prevent the prompt from warping around right-to-left text, I want to insert a zero-width LRM (U+200E) into the bash prompt. Since this is a zero-width character, my instinct was to wrap it with \[&...
1
vote
1
answer
57
views
Correct way to encode mixed width text in Unicode?
From what I read, the fullwidth glyphs in Unicode are provided solely for backward compatibility and lossless roundtrip with legacy standards such as Shift-JIS. The rationale seems to be that Unicode ...
1
vote
1
answer
103
views
Unicode sequence for custom character with dot underneath
Unicode has code points for certain Roman characters with a dot underneath, for instance U+1E04 — "Latin Capital Letter B With Dot Below" but code points don't exist for all letters.
How can ...
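One common approach for letters that lack a precomposed form is to follow the base letter with U+0323 COMBINING DOT BELOW. The Python sketch below is an illustration, not the answer given on the question page. It shows that NFC normalization folds B + U+0323 into U+1E04, while a letter such as Q (assumed here to have no precomposed dot-below form) stays a two-code-point sequence. The helper name `with_dot_below` is hypothetical.

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"  # U+0323 COMBINING DOT BELOW


def with_dot_below(letter: str) -> str:
    """Append COMBINING DOT BELOW to a letter and normalize the result to NFC."""
    return unicodedata.normalize("NFC", letter + COMBINING_DOT_BELOW)


b_dot = with_dot_below("B")  # has a precomposed form, so NFC yields one code point
q_dot = with_dot_below("Q")  # assumed to have none, so the sequence is kept
print([f"U+{ord(c):04X}" for c in b_dot])
print([f"U+{ord(c):04X}" for c in q_dot])
```

Whether the two-code-point sequence renders as a single glyph with the dot correctly placed depends on the font and the text-rendering engine.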
2
votes
1
answer
139
views
VIM uses wrong encoding - but only in status messages
I ran into a strange issue with my Arch Linux setup. Vim uses the correct encoding for reading/displaying files, but the status messages (which display the current mode or report back when the buffer is ...
1
vote
2
answers
610
views
Excel treats unicode characters as text
I have received an Excel file which contains data from a government website, and its text is in Hindi.
When I open the CSV file, it displays the Hindi text as Unicode characters like the ...
0
votes
1
answer
181
views
How to disable Ctrl+Shift+u keyboard shortcut (unicode-selector) in Fedora 39 KDE-Spin (Wayland)
0. How can I disable the Ctrl+Shift+u keyboard shortcut (unicode-selector) in Fedora 39 KDE-Spin (Wayland-Session)?
Whenever I hit this key combination on a GTK+-app, the infamous underlined "u&...
1
vote
2
answers
955
views
How to correctly write arbitrary fractions in Unicode?
How do I correctly write arbitrary fractions according to the Unicode standard for the OpenType feature frac?
My use case is:
I write data in a text document (UTF-8) and have a self-written program ...
3
votes
1
answer
183
views
How do I copy the URL contained in an Internet shortcut with Unicode characters? (command line for many Internet shortcuts)
This subject was partly covered 8 years ago (here: How do I copy the URL contained in the Internet shortcut?)
I have thousands of Internet shortcuts whose URLs I would like to copy to txt ...
0
votes
1
answer
54
views
Can unicode characters requiring the number pad, e.g. U+3041, be typed using the On-Screen Keyboard?
Suppose that I want to type 'ぁ' (Unicode U+3041) using Windows 10's On-Screen Keyboard without changing my language away from US English. Can it be done?
As far as my research indicates, this would ...
1
vote
2
answers
695
views
Unicode characters became invisible since Windows 10 update
After a recent Windows 10 update, emoji became invisible in several apps (Google Chrome, MS Word).
The behavior of MS Word is particularly weird: When I open a DOCX file that contains emoji (set with ...
0
votes
1
answer
156
views
If the `ffi` group of characters is smushed together into one character, then why can I select each of them separately?
In the video Why You Can Tweet More In Japanese: What Counts As A Character?, the author says that:
Unicode has a single character for some English ligatures, like "ffi" - notice how the ...
0
votes
1
answer
592
views
Notepad on Windows 11 not respecting Unicode monospace characters
I'm using box-drawing characters to represent trees. I need other people using Windows to see them so I can assume they all have Notepad installed. But even if I select Consolas as the font, here's ...
0
votes
0
answers
32
views
How to add multiple entries to MS-Word "Math AutoCorrect"?
In MS-Word Math AutoCorrect there are entries for all alphabets in Script type as below -
Say, I want to add a similar suite to represent vectors with all the alphabets. Currently I can do it one ...
1
vote
2
answers
453
views
Website that used to display unicode properly no longer does so
This is a problem that I've encountered a couple times recently. Here is my most recent experience:
Trying to browse https://www.scape.sc/release.php?id=48, a page that contains Japanese text. The ...
1
vote
0
answers
52
views
CJK unicode characters output from `wine reg query` displayed on terminal are fine. But they become question marks after pipe
When I run wine reg query 'HKEY_CURRENT_USER\Software\Akelsoft\AkelPad\Recent' on my machine, the output is fine and contains something like file0 REG_MULTI_SZ Z:\home\x\怎.txt on my terminal.
...
1
vote
1
answer
244
views
Problems with Word link to Excel when Windows is set to use UTF-8
I have some Word documents with fields that are automatically updated from an Excel file using Word's built-in links to Excel (as in copy > paste special > paste as link).
After setting up a ...
0
votes
1
answer
704
views
Change encoding to Unicode in .csv exported from Gmail
I'd like to copy the typed-in addresses from Gmail to Outlook 2021 (Windows 10).
To do this, I'm using the Google Contacts Export to Outlook CSV function and nk2edit.
The problem is that the special ...
1
vote
1
answer
312
views
Random unicode characters while typing normally in Windows 10
Sometimes when I'm typing in Windows I get some unicode-variant of the character I was trying to type.
for example:
ŕ instead of r
ś instead of s
ḳ instead of k etc.
Things I've already tried:
Try ...
7
votes
4
answers
11k
views
Convert UTF-16 LE to UTF-8 in windows via command line
(question re-written to be more useful)
I have a batch script which will interact with command line programs, take their output, and then perform decisions based on that output.
One of the programs I ...
9
votes
3
answers
3k
views
fixing mis-encoded Chinese in file names
I'm saddled with a bunch of files whose names are garbled beyond recognition. Even though I more or less know what those names originally contained, fixing them by hand would involve a lot of hassle, ...
0
votes
0
answers
47
views
In Excel, how do I handle someone else's workbook that uses UNICODE?
I have been sent an Excel workbook in which I find that a cell that appears to contain the letter "A" produces FALSE if I compare it with "A". That is, if the cell in question were ...
0
votes
0
answers
41
views
Is the dataset that underpins the Windows Character Map utility accessible somewhere on the system?
The Windows Character Map utility displays the glyphs available in the chosen font, along with the Unicode codepoint and a name for a selected character. Is the dataset that underpins this utility ...
2
votes
0
answers
235
views
tmux inside Cygwin mintty - unicode characters broken depending on command
Please help me understand what's going on with some unicode characters looking right inside mintty + tmux in some command outputs, but not in others.
Sample commands:
for c in 00AE 1F007 1F32D 1F603; ...
0
votes
1
answer
352
views
Missing some Unicode glyphs on Windows 7, is there way to fix/patch it?
I have an arrowsbe.txt file, with the following content (in hex):
FE FF 00 41 00 42 00 43 2B A1 00 20 2B A2
You see, it is UTF-16BE encoded, with BOM at file start. There are two not-so-often seen ...
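For readers who want to check the byte sequence quoted above, a minimal Python sketch decodes it. Python's `utf-16` codec reads the leading FE FF byte order mark, selects big-endian, and drops the mark from the result. The variable names are illustrative.

```python
# The hex dump quoted in the question, including the FE FF byte order mark.
raw = bytes.fromhex("FE FF 00 41 00 42 00 43 2B A1 00 20 2B A2")

# The generic "utf-16" codec uses the BOM to pick the byte order.
text = raw.decode("utf-16")

# List the decoded code points in U+XXXX notation.
code_points = [f"U+{ord(c):04X}" for c in text]
print(code_points)
```

Decoding succeeds regardless of installed fonts; whether U+2BA1 and U+2BA2 are then drawn as glyphs is a separate question of font coverage on the system.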
1
vote
0
answers
34
views
Unicode letters replaced by ***** (letter n )
the letter n is being replaced with **** in my console when I run dbt.
In text format:
11:59:24 Fou*****d 397 models, 15 tests, 0 s*****apshots, 0 a*****alyses, 639 macros, 0 operatio*****s, 11 seed ...
0
votes
1
answer
350
views
unicode-escaping with a standard Unix tool
I have cooked up the following Python script; it does one thing and does it well. It applies whatever Unicode magic unicode-escape does in Python 3 to some text.
#!/usr/bin/env python3
import sys
text ...
1
vote
1
answer
767
views
Convert Korean files that are showing up incorrectly to utf-8 - character shows Çѱ¹Ÿî
I was just about to ask this after a long time of searching so decided to answer my own question...
I downloaded Korean subtitles in an .smi file that was in zip archive. When I extracted it, the ...
7
votes
3
answers
8k
views
What is a quick way to write "dagger" sign in MS Word equation mode?
In MS Word equation mode with the Unicode converter, I write symbols by putting the "\" sign and then typing the specific Unicode, i.e., for the summation sign I write "\sum" and ...
3
votes
1
answer
3k
views
How can I set my system's default encoding to UTF-16?
My daily activity involves usage of English, French, Spanish, and when I save personal copies of web pages or other documents, the full character range of those languages finds its way into filenames ...
1
vote
1
answer
222
views
zmv to rename Unicode characters (e.g. u0308 'COMBINING DIAERESIS') on Mac/zsh-shell
While cleaning a file server I found unwanted or non-ASCII characters in many filenames.
To rename unwanted filenames, I normally use the comfortable zmv command in a zsh shell on an OSX machine.
To ...
0
votes
0
answers
550
views
Problems with certain Unicode characters
I am using Word, and I needed to insert the name of the Chinese Nüshu script in Nüshu (𛆁𛈬). There wasn't a font that supported it, so I downloaded every font I could find that did. Noto ...
2
votes
0
answers
99
views
Is it possible to write the Lovecraftian Sigil of the Gateway in unicode?
It isn't technically copyrighted and doesn't seem to be subject to trademark, and is kind of a cultural symbol, so for all I know there's a hidden symbol assignment for it somewhere... but the Sigil ...
0
votes
1
answer
1k
views
Can't copy Chinese text from PDF
I tried to open it with Google Docs, and tried the site i2orc to convert the file. But in both cases the Chinese characters are missing from the output, just as when I copy/paste them from the ...
2
votes
0
answers
165
views
Not able to copy Korean characters from PDF file using Acrobat Reader
I decided to study Korean and found a book that I want to copy some vocabulary from. The problem is that when I paste the copied text, it turns into gibberish characters. Any idea how to solve this? I ...
5
votes
1
answer
17k
views
Unicode backspace key symbol ⌫
Is there a Unicode symbol for the backspace key ⌫, the x inside a left-pointing arrow?
I know that Unicode has a "BACKSPACE" control character (U+0008) that it inherited from ASCII, and it ...
0
votes
1
answer
346
views
Binding unicode λ symbol (code+glyph) to key combination
I would like to type Unicode λ in a clear way (key L and alt modifier, for example) and not pressing SHIFT+CTRL+u03bb each time (or doing a poor trick like copy/paste).
I've tried a lot of ways to ...
1
vote
1
answer
2k
views
How can I type rough breathing marks in Classical Greek using the Windows 11 Greek Polytonic keyboard?
I'm learning to use the Greek Polytonic keyboard in Windows 11 to type Classical Greek.
I can successfully type the following characters (using α as an example): ᾶ, ἀ, ἄ, ἂ, ά, ά, ἆ, ᾱ. For example, ἄ ...
0
votes
1
answer
73
views
Why can't I print this in windows ( ه҈҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉ ')?
I got this from the following question: What are these ? ه҈҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉҉
I can't get this to print in windows powershell. I tried using chcp 65001. BTW I can print 北京 but not ه҈҉҉҉҉҉҉҉҉҉҉҉...
4
votes
0
answers
2k
views
Why is it so HARD to have Windows 10 CMD console display Unicode correctly?
Before showing the hard (difficult) example of Windows 10 (Win10.21H2 in this case), let me show an easy example on Ubuntu 22.04.
I place file names with Chinese and Korean characters on a USB flash ...