Typically, my programs create files with Extended Latin, CJK, or Arabic characters in their names without issues, according to the input data.

I was under the impression that only the officially listed ASCII characters (:, \, *, ...) cannot be used in file names, and I sanitize those. But recently I found that the character U+10FC0C (􏰋) from Unicode's Supplementary Private Use Area-B is also not accepted in a new file or directory name.

Is there an extended standard document that also specifies which Unicode characters (beyond the ASCII range) are prohibited in file names?

My current system locale is set to Slovak, which commonly uses characters beyond 7-bit ASCII. The file system in question is NTFS.

  • What OS (and what version) are you using? How old is the NTFS volume in question? Commented Jan 14, 2022 at 6:41
  • @user1686 – My apologies, I forgot the tag! Now it is added.
    – miroxlav
    Commented Jan 14, 2022 at 7:20

1 Answer

The Microsoft documentation on file and directory naming conventions lists the following restrictions:

  • Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
  • The following reserved characters:

    • < (less than)
    • > (greater than)
    • : (colon)
    • " (double quote)
    • / (forward slash)
    • \ (backslash)
    • | (vertical bar or pipe)
    • ? (question mark)
    • * (asterisk)
  • Integer value zero, sometimes referred to as the ASCII NUL character.

  • Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File Streams.

  • Any other character that the target file system does not allow.
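As a rough illustration of the rules quoted above, here is a minimal Python sketch that replaces the listed reserved characters, NUL, and the control characters 1 through 31. The function name `sanitize_component` and the underscore replacement are my own choices for the example, not anything from the Microsoft documentation, and the sketch does not cover the last rule (characters rejected by the particular file system).

```python
# Reserved characters listed in the quoted documentation.
RESERVED = set('<>:"/\\|?*')


def sanitize_component(name: str, replacement: str = "_") -> str:
    """Replace reserved characters, NUL, and control characters 1-31.

    Hypothetical helper for illustration only; it handles a single
    file or directory name, not a full path.
    """
    return "".join(
        replacement if ch in RESERVED or ord(ch) < 32 else ch
        for ch in name
    )


if __name__ == "__main__":
    # Non-ASCII letters pass through untouched; only listed characters change.
    print(sanitize_component('žltý kôň: "a/b"?.txt'))
```

Characters outside the ASCII range are deliberately left alone here, since the quoted list does not name any of them.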

Please also read the article on Private Use Areas.

This means that the character U+10FC0C you are asking about, by definition, cannot be considered a standardized character in Unicode itself. Therefore, it can't be used in a file name on NTFS.

This falls under the last restriction in the list above:

  • Any other character that the target file system does not allow.
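If a program wants to flag private-use characters such as U+10FC0C before attempting to create a file, one possible approach is to check the Unicode general category. The sketch below uses Python's standard `unicodedata` module; the helper name `is_private_use` is made up for this example. It only reports that a code point is in a Private Use Area and does not by itself tell you what a given file system will accept.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the character has general category Co (private use).

    Hypothetical helper for illustration; expects a single character.
    """
    return unicodedata.category(ch) == "Co"


def private_use_chars(name: str) -> list[str]:
    """List the private-use code points found in a name, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in name if is_private_use(ch)]


if __name__ == "__main__":
    print(private_use_chars("report\U0010FC0C.txt"))
```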

  • Please help me understand how this applies to character U+10FC0C mentioned in the question. Currently I do not see the explanation so I cannot accept or even upvote this.
    – miroxlav
    Commented Jan 14, 2022 at 7:21
  • @miroxlav: Please read the following article carefully: Private Use Areas. This means that the character U+10FC0C, by definition, cannot be considered a standardized character in Unicode itself. Therefore, it can't be used in a file name on NTFS. This falls under the last restriction in the answer above: Any other character that the target file system does not allow.
    – Jackdaw
    Commented Jan 14, 2022 at 7:30
