Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.
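To make the one-to-four-byte layout concrete, here is a minimal Python 3.8+ sketch that uses only the built-in `str.encode`. The helper name `utf8_byte_counts` and the sample characters are illustrative choices, not part of any API.

```python
def utf8_byte_counts(chars):
    """Return (code point label, UTF-8 bytes as hex, byte count) for each character."""
    rows = []
    for ch in chars:
        encoded = ch.encode("utf-8")
        rows.append((f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded)))
    return rows


# One character from each UTF-8 length class:
# U+0000..U+007F, U+0080..U+07FF, U+0800..U+FFFF, U+10000..U+10FFFF.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀
for label, hex_bytes, count in utf8_byte_counts(samples):
    print(f"{label}: {hex_bytes} ({count} bytes)")

# Pure-ASCII text encodes to identical bytes under ASCII and UTF-8.
text = "Hello, world"
assert text.encode("ascii") == text.encode("utf-8")
```

The four samples come out as 1, 2, 3 and 4 bytes (`41`, `c3 a9`, `e2 82 ac`, `f0 9f 98 80`).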

UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the default character encoding on most Linux distributions and other modern Unix-like operating systems.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
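The bit layout from RFC 3629 can be sketched by hand. This is an illustrative Python encoder (the names `encode_code_point` and `encode_string` are hypothetical), cross-checked against Python's built-in codec; real code should simply call `str.encode("utf-8")`.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using the RFC 3629 bit patterns."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogates cannot be encoded in UTF-8: {cp:#x}")
    if cp < 0x80:                                  # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                 # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                               # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),               # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_string(text: str) -> bytes:
    """Concatenate the UTF-8 encodings of every code point in text."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


print(encode_string("caf\u00e9").hex(" "))  # 63 61 66 c3 a9
```

Each branch corresponds to one row of the RFC 3629 table: the lead byte's high bits announce the sequence length, and every continuation byte starts with `10`.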

Related tags

The character-encoding tag discusses the general concept of character-set encodings
The unicode character set can be represented in a variety of encodings, one of which is UTF-8
The ascii character set and encoding, which UTF-8 generalizes
Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36; MySQL's four-byte UTF-8 charset: utf8mb4

22178 questions

1342 votes, 13 answers

UTF-8 all the way through

I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1. Where exactly do I need to set the encoding/charsets?…

php mysql linux apache utf-8

asked Nov 10 '08 at 21:04 by mercutio (22,151 reputation; 10 gold, 36 silver, 37 bronze badges)
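A common symptom of a stack that is not UTF-8 "all the way through" is mojibake: one layer writes UTF-8 bytes and another reads them as ISO-8859-1. The Python sketch below only reproduces that symptom in isolation; it does not configure PHP, MySQL, or Apache, and which layer is misconfigured in a real system is an assumption you would need to verify.

```python
# One layer emits UTF-8 bytes...
original = "caf\u00e9"                    # "café"
raw = original.encode("utf-8")            # b'caf\xc3\xa9'

# ...and another layer (hypothetically a DB connection or HTTP client
# left at ISO-8859-1) decodes them with the wrong charset.
misread = raw.decode("iso-8859-1")
print(misread)                            # cafÃ©

# ISO-8859-1 maps every byte to a character, so this particular damage is
# reversible as long as nothing re-encoded the text in between.
repaired = misread.encode("iso-8859-1").decode("utf-8")
assert repaired == original
```

Seeing `Ã©` where `é` should be is a strong hint that UTF-8 bytes crossed a boundary that assumed ISO-8859-1.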

1299 votes, 9 answers

What's the difference between utf8_general_ci and utf8_unicode_ci?

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?

mysql unicode utf-8 collation character-set

asked Apr 20 '09 at 03:43 by KahWee Teng (13,658 reputation; 3 gold, 21 silver, 21 bronze badges)

1059 votes, 22 answers

What's the difference between UTF-8 and UTF-8 with BOM?

What's the difference between UTF-8 and UTF-8 with BOM? Which is better?

unicode utf-8 character-encoding byte-order-mark

asked Feb 08 '10 at 18:26 by simple (10,723 reputation; 3 gold, 17 silver, 11 bronze badges)
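The UTF-8 BOM is the three bytes `EF BB BF` (U+FEFF encoded as UTF-8) placed at the start of the data. In Python the difference shows up as the `utf-8` versus `utf-8-sig` codecs; this is a minimal illustration, not a recommendation for either form.

```python
import codecs

with_bom = "hello".encode("utf-8-sig")    # utf-8-sig prepends the BOM when encoding
without_bom = "hello".encode("utf-8")

print(with_bom)                           # b'\xef\xbb\xbfhello'
print(without_bom)                        # b'hello'
assert with_bom == codecs.BOM_UTF8 + without_bom

# Plain "utf-8" keeps the BOM as U+FEFF at the start of the string;
# "utf-8-sig" strips it if present and also accepts input without one.
print(repr(with_bom.decode("utf-8")))         # '\ufeffhello'
print(repr(with_bom.decode("utf-8-sig")))     # 'hello'
print(repr(without_bom.decode("utf-8-sig")))  # 'hello'
```

The stray `\ufeff` from decoding with plain `utf-8` is the usual source of "invisible character at the start of my file" bugs.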

859 votes, 12 answers

Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence

Sample code (in a REPL):

    import json
    json_string = json.dumps("ברי צקלה")
    print(json_string)

Output:

    "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human readable. My (smart) users want to verify or even edit text files with…

python json unicode utf-8 escaping

asked Aug 20 '13 at 14:18 by Berry Tsakala (15,313 reputation; 12 gold, 57 silver, 80 bronze badges)
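The `\u` escaping is controlled by the standard library's `ensure_ascii` parameter. A minimal Python 3 sketch; the file name `data.json` and the temporary directory are illustrative only.

```python
import json
import tempfile
from pathlib import Path

# "ברי צקלה", written as escapes so the source file stays ASCII.
text = "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

escaped = json.dumps(text)                       # default ensure_ascii=True
readable = json.dumps(text, ensure_ascii=False)  # keep non-ASCII characters as-is
print(escaped)
print(readable)

# When writing to a file, open it with an explicit UTF-8 encoding so the
# unescaped characters are stored as UTF-8 regardless of the platform default.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"name": text}, f, ensure_ascii=False)
    stored = path.read_bytes()
```

Both outputs are valid JSON that decodes to the same string; `ensure_ascii=False` only changes how readable the serialized form is.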

724 votes, 18 answers

What is the difference between UTF-8 and Unicode?

I have heard conflicting opinions from people - according to the Wikipedia UTF-8 page. They are the same thing, aren't they? Can someone clarify?

unicode encoding utf-8 character-encoding terminology

asked Mar 13 '09 at 17:06 by sarsnake (26,667 reputation; 58 gold, 180 silver, 286 bronze badges)
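One way to see the distinction: Unicode assigns each character a code point (an abstract number), and UTF-8 is one of several encodings that turn that number into bytes. In this illustrative Python sketch (the helper names are hypothetical), the single code point U+20AC produces different bytes under UTF-8, UTF-16 and UTF-32.

```python
FORMS = ("utf-8", "utf-16-be", "utf-32-be")  # big-endian variants: no BOM emitted


def code_point(ch: str) -> str:
    """The Unicode part: the character's abstract number."""
    return f"U+{ord(ch):04X}"


def encoded_forms(ch: str) -> dict:
    """The encoding part: how each UTF turns that number into bytes."""
    return {enc: ch.encode(enc).hex(" ") for enc in FORMS}


euro = "\u20ac"  # EURO SIGN
print(code_point(euro))
for enc, hex_bytes in encoded_forms(euro).items():
    print(f"{enc:>9}: {hex_bytes}")
```

The code point stays U+20AC throughout; only the byte representation (`e2 82 ac`, `20 ac`, `00 00 20 ac`) depends on the chosen encoding.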

644 votes, 36 answers

Excel to CSV with UTF8 encoding

I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also…

excel encoding csv utf-8

asked Nov 19 '10 at 00:48 by Jeff Treuting (13,910 reputation; 8 gold, 36 silver, 47 bronze badges)

641 votes, 14 answers

UTF-8, UTF-16, and UTF-32

What are the differences between UTF-8, UTF-16, and UTF-32? I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?

unicode utf-8 utf-16 utf utf-32

asked Jan 30 '09 at 17:05 by user60456
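A rough way to compare the three encodings is to measure encoded sizes for different kinds of text. This Python sketch uses the `-le` codec variants so no BOM is counted; the samples and the helper `encoded_sizes` are illustrative assumptions, and size is only one factor in choosing an encoding.

```python
SIZE_ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # -le variants: no BOM emitted


def encoded_sizes(text: str) -> dict:
    """Byte length of text under each encoding."""
    return {enc: len(text.encode(enc)) for enc in SIZE_ENCODINGS}


samples = {
    "ASCII": "hello",
    "Greek": "\u03b1\u03b2\u03b3",   # αβγ
    "CJK": "\u6f22\u5b57",           # 漢字
    "Emoji": "\U0001F600",           # 😀, outside the Basic Multilingual Plane
}
for label, text in samples.items():
    print(label, encoded_sizes(text))
```

With these samples, ASCII text is smallest in UTF-8 (5 vs 10 vs 20 bytes), the CJK text is smaller in UTF-16 (4 vs 6 bytes), and the emoji takes 4 bytes in all three because UTF-16 needs a surrogate pair for it.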

613 votes, 31 answers

Is it possible to force Excel to recognize UTF-8 CSV files automatically?

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, Cyrillic…

excel csv utf-8

asked May 14 '11 at 13:53 by Lyubomyr Shaydariv (20,327 reputation; 12 gold, 64 silver, 105 bronze badges)
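A workaround commonly reported for Excel is to start the CSV with a UTF-8 BOM, which Python's `utf-8-sig` codec writes automatically. The sketch below assumes that (version-dependent) Excel behavior rather than guaranteeing it; the helper `write_csv_for_excel`, the file name and the rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["name", "city"],
    ["Jos\u00e9", "M\u00e1laga"],                                   # José, Málaga
    ["\u0418\u0432\u0430\u043d", "\u041a\u0438\u0435\u0432"],       # Иван, Киев
]


def write_csv_for_excel(path, rows):
    # "utf-8-sig" prepends the UTF-8 BOM (EF BB BF), which Excel is commonly
    # reported to use for detecting UTF-8 when a CSV is opened directly.
    # newline="" is what the csv module documentation asks for on written files.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "export.csv"
    write_csv_for_excel(path, rows)
    data = path.read_bytes()

print(data[:3])  # b'\xef\xbb\xbf'
```

Other consumers of the file may need to strip the BOM; in Python, reading with `encoding="utf-8-sig"` does that.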

605 votes, 21 answers

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools…

text unicode utf-8 character-set

asked Sep 15 '08 at 17:21 by Antti Kissaniemi (18,944 reputation; 13 gold, 54 silver, 47 bronze badges)
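In a scripting language the conversion is a decode followed by an encode. A minimal Python sketch (the helper `convert` is hypothetical); it assumes the whole input fits in memory and leaves the handling of characters ISO-8859-15 cannot represent to the `errors` argument.

```python
def convert(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Re-encode bytes: decode with the source charset, then encode with the target."""
    return data.decode(source).encode(target, errors)


utf8_bytes = "Price: 5 \u20ac".encode("utf-8")    # "Price: 5 €"
latin9 = convert(utf8_bytes, "utf-8", "iso-8859-15")
print(latin9)                                     # b'Price: 5 \xa4'

# ...and back again.
assert convert(latin9, "iso-8859-15", "utf-8") == utf8_bytes

# Characters outside ISO-8859-15 raise UnicodeEncodeError under "strict";
# "replace" substitutes "?" instead.
print(convert("\u6f22".encode("utf-8"), "utf-8", "iso-8859-15", errors="replace"))  # b'?'
```

For files, the same function can be applied to `Path.read_bytes()` and the result passed to `Path.write_bytes()`.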

594 votes, 7 answers

Why does modern Perl avoid UTF-8 by default?

I wonder why most modern solutions built using Perl don't enable UTF-8 by default. I understand there are many legacy problems for core Perl scripts, where it may break things. But, from my point of view, in the 21st century, big new projects (or…

perl unicode utf-8

asked May 28 '11 at 15:12 by w.k (8,218 reputation; 4 gold, 32 silver, 55 bronze badges)

506 votes, 5 answers

What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between utf8mb4 and utf8 charsets in MySQL? I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings; but I'm curious to know what's the difference between the utf8mb4 group of encodings and other encoding types defined in MySQL…

mysql encoding utf-8 character-encoding utf8mb4

asked May 06 '15 at 10:45 by Mojtaba Rezaeian (8,268 reputation; 8 gold, 31 silver, 54 bronze badges)
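The practical difference is byte width: MySQL's `utf8` (an alias for `utf8mb3`) stores at most three bytes per character, so it covers only the Basic Multilingual Plane, while `utf8mb4` stores up to four bytes and therefore also covers characters such as emoji. The Python sketch below (helper names hypothetical) only inspects a string to tell whether it would need `utf8mb4`; it does not talk to MySQL.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF, i.e. takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Widest UTF-8 encoding of any single character in text (0 for empty text)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


for sample in ("caf\u00e9", "\u6f22\u5b57", "hi \U0001F600"):   # café, 漢字, hi 😀
    print(repr(sample), max_utf8_bytes_per_char(sample), needs_utf8mb4(sample))
```

Only the last sample contains a 4-byte character, so it is the only one a 3-byte `utf8` column could not store.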

497 votes, 8 answers

What is the difference between UTF-8 and ISO-8859-1?

utf-8 character-encoding iso-8859-1

asked Aug 13 '11 at 05:21 by Jagadesh (6,489 reputation; 8 gold, 29 silver, 30 bronze badges)
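ISO-8859-1 is a single-byte encoding that covers only U+0000 to U+00FF, while UTF-8 covers all of Unicode with one to four bytes; the two produce identical bytes only for ASCII. A small illustrative Python comparison (the helper `compare` is hypothetical):

```python
def compare(ch: str):
    """Return (UTF-8 bytes, ISO-8859-1 bytes) for ch as hex strings."""
    utf8 = ch.encode("utf-8").hex(" ")
    try:
        latin1 = ch.encode("iso-8859-1").hex(" ")
    except UnicodeEncodeError:
        latin1 = "not representable"
    return utf8, latin1


for ch in ("A", "\u00e9", "\u20ac"):   # A, é, €
    utf8, latin1 = compare(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8:<9}  ISO-8859-1: {latin1}")
```

`A` is `41` in both, `é` is `c3 a9` in UTF-8 but `e9` in ISO-8859-1, and `€` has no ISO-8859-1 byte at all.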

483 votes, 9 answers

What are Unicode, UTF-8, and UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well, but it's not clear to me. In VSS, when doing a file comparison, sometimes there is a message saying the two files have…

unicode encoding utf-8 utf-16

asked Feb 11 '10 at 00:12 by SoftwareGeek (15,234 reputation; 19 gold, 61 silver, 78 bronze badges)

460 votes, 2 answers

Working with UTF-8 encoding in Python source

Consider:

    $ cat bla.py
    u = unicode('d…')
    s = u.encode('utf-8')
    print s
    $ python bla.py
      File "bla.py", line 1
    SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html…

python encoding utf-8 character-encoding

asked Jun 09 '11 at 07:29 by Nullpoet (10,949 reputation; 20 gold, 48 silver, 65 bronze badges)
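The error comes from Python 2 rejecting non-ASCII source that has no encoding declaration. A minimal sketch of the fix as a standalone file, written for Python 3: the PEP 263 declaration must appear on the first or second line, and Python 3 already treats source as UTF-8 by default (PEP 3120), so there the declaration is optional.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 encoding declaration. Python 2 needs it before
# non-ASCII characters (such as the ellipsis below) may appear in the source.
u = u"d…"              # the u prefix is accepted, and redundant, in Python 3.3+
s = u.encode("utf-8")  # U+2026 HORIZONTAL ELLIPSIS becomes e2 80 a6
print(repr(s))         # b'd\xe2\x80\xa6'
```

The `\xe2` in the original error message is exactly the first byte of that three-byte ellipsis sequence.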

410 votes, 14 answers

Unicode (UTF-8) reading and writing to files in Python

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4).

    # The string, which has an a-acute in it.
    ss = u'Capit\xe1n'
    ss8 = ss.encode('utf8')
    repr(ss), repr(ss8)
    ("u'Capit\xe1n'", "'Capit\xc3\xa1n'")
    print…

python unicode utf-8 io

asked Jan 29 '09 at 15:01 by Gregg Lind (20,690 reputation; 15 gold, 67 silver, 81 bronze badges)
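In Python 3, `open()` takes an `encoding` argument, so text files are written and read as UTF-8 without manual `encode`/`decode` calls. The question targets Python 2.4, where `codecs.open` filled this role; the sketch below assumes Python 3, and the file name is illustrative.

```python
import tempfile
from pathlib import Path

ss = "Capit\xe1n"  # "Capitán": \xe1 is U+00E1, LATIN SMALL LETTER A WITH ACUTE

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "capitan.txt"

    # Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(ss)

    raw = path.read_bytes()  # the same bytes as ss8 in the question

    # Reading back with the same encoding returns the original str.
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

print(raw)            # b'Capit\xc3\xa1n'
print(round_tripped)  # Capitán
```

Passing `encoding` explicitly, rather than relying on the platform default, keeps the behavior the same on every operating system.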
