mozilla

Revision 129362 of Encodings for localization files

  • Revision slug: Encodings_for_localization_files
  • Revision title: Encodings for localization files
  • Revision id: 129362
  • Created:
  • Creator: stasm
  • Is current revision? Yes
  • Comment 8 words removed

Revision Content

When creating a localization for Mozilla products, it’s important to be aware of the encoding of the files that you generate.

In general, files in the Mozilla repositories are UTF-8 encoded. There are a few exceptions, though.

Installer

The Windows installer can’t handle UTF-8, but only the codepages provided by Windows. This is tricky to hook up in the build process, so here it goes:

File Encoding Notes
toolkit/installer/windows/charset.mk ASCII The WIN_INSTALLER_CHARSET variable must be set to an encoding which matches toolkit/installer/windows/install.it CHARSET= parameter. See the table below for appropriate values.
toolkit/installer/windows/install.it A Windows codepage. This must match the CHARSET= parameter in this file, and the WIN_INSTALLER_CHARSET parameter in charset.mk The FONTNAME/FONTSIZE/CHARSET parameters in this file must be set to good values. For most Western scripts, ‘MS Sans Serif’ and ‘8’ are good defaults for the font settings. Eastern scripts will need to choose appropriate fonts that are shipped with Windows. See the table below for appropriate values for the CHARSET= parameter.
browser/installer/installer.inc UTF-8  
toolkit/installer/unix/install.it UTF-8 {{ Deprecated_inline() }}

Native Windows encodings

The following table lists native Windows encodings, and the WIN_INSTALLER_CHARSET and CHARSET= values for each:

Encoding Name WIN_INSTALLER_CHARSET (charset.mk) CHARSET= (windows/install.it)
ANSI_CHARSET CP1252 0
BALTIC_CHARSET CP1257 186
CHINESEBIG5_CHARSET CP950 136
EASTEUROPE_CHARSET CP1250 238
GB2312_CHARSET CP936 134
GREEK_CHARSET CP1253 161
HANGUL_CHARSET CP949 129
RUSSIAN_CHARSET CP1251 204
SHIFTJIS_CHARSET CP932 128
TURKISH_CHARSET CP1254 162
VIETNAMESE_CHARSET CP1258 163
Middle East language editions of Windows:
ARABIC_CHARSET CP1256 178
HEBREW_CHARSET CP1255 177
Thai language editions of Windows:
THAI_CHARSET CP874 222

{{ languages( { "ja": "ja/Encodings_for_localization_files" } ) }}

Revision Source

<p>When creating a localization for Mozilla products, it’s important to be aware of the encoding of the files that you generate.</p>
<p>In general, files in the Mozilla repositories are UTF-8 encoded. There are a few exceptions, though.</p>
<h3 id="Installer" name="Installer">Installer</h3>
<p>The Windows installer can’t handle UTF-8, but only the codepages provided by Windows. This is tricky to hook up in the build process, so here it goes:</p>
<table class="fullwidth-table"> <tbody> <tr> <td class="header">File</td> <td class="header">Encoding</td> <td class="header">Notes</td> </tr> <tr> <td>toolkit/installer/windows/charset.mk</td> <td>ASCII</td> <td>The WIN_INSTALLER_CHARSET variable must be set to an encoding which matches toolkit/installer/windows/install.it CHARSET= parameter. See the table below for appropriate values.</td> </tr> <tr> <td>toolkit/installer/windows/install.it</td> <td>A Windows codepage. This must match the CHARSET= parameter in this file, and the WIN_INSTALLER_CHARSET parameter in charset.mk</td> <td>The FONTNAME/FONTSIZE/CHARSET parameters in this file must be set to good values. For most Western scripts, ‘MS Sans Serif’ and ‘8’ are good defaults for the font settings. Eastern scripts will need to choose appropriate fonts that are shipped with Windows. See the table below for appropriate values for the CHARSET= parameter.</td> </tr> <tr> <td>browser/installer/installer.inc</td> <td>UTF-8</td> <td> </td> </tr> <tr> <td>toolkit/installer/unix/install.it</td> <td>UTF-8</td> <td>{{ Deprecated_inline() }}</td> </tr> </tbody>
</table>
<h4 id="Native_Windows_encodings" name="Native_Windows_encodings">Native Windows encodings</h4>
<p>The following table lists native Windows encodings, and the WIN_INSTALLER_CHARSET and CHARSET= values for each:</p>
<table class="standard-table"> <tbody> <tr> <td class="header">Encoding Name</td> <td class="header">WIN_INSTALLER_CHARSET (charset.mk)</td> <td class="header">CHARSET= (windows/install.it)</td> </tr> <tr> <td>ANSI_CHARSET</td> <td>CP1252</td> <td>0</td> </tr> <tr> <td>BALTIC_CHARSET</td> <td>CP1257</td> <td>186</td> </tr> <tr> <td>CHINESEBIG5_CHARSET</td> <td>CP950</td> <td>136</td> </tr> <tr> <td>EASTEUROPE_CHARSET</td> <td>CP1250</td> <td>238</td> </tr> <tr> <td>GB2312_CHARSET</td> <td>CP936</td> <td>134</td> </tr> <tr> <td>GREEK_CHARSET</td> <td>CP1253</td> <td>161</td> </tr> <tr> <td>HANGUL_CHARSET</td> <td>CP949</td> <td>129</td> </tr> <tr> <td>RUSSIAN_CHARSET</td> <td>CP1251</td> <td>204</td> </tr> <tr> <td>SHIFTJIS_CHARSET</td> <td>CP932</td> <td>128</td> </tr> <tr> <td>TURKISH_CHARSET</td> <td>CP1254</td> <td>162</td> </tr> <tr> <td>VIETNAMESE_CHARSET</td> <td>CP1258</td> <td>163</td> </tr> <tr> <td colspan="3"><strong>Middle East language editions of Windows</strong>:</td> </tr> <tr> <td>ARABIC_CHARSET</td> <td>CP1256</td> <td>178</td> </tr> <tr> <td>HEBREW_CHARSET</td> <td>CP1255</td> <td>177</td> </tr> <tr> <td colspan="3"><strong>Thai language editions of Windows</strong>:</td> </tr> <tr> <td>THAI_CHARSET</td> <td>CP874</td> <td>222</td> </tr> </tbody>
</table>
<p>{{ languages( { "ja": "ja/Encodings_for_localization_files" } ) }}</p>
Revert to this revision