From: Eli Zaretskii
Date: Thu, 12 Dec 2013 18:19:10 +0000 (+0200)
Subject: Support MS-Windows file names that use characters outside of ANSI codepage.
X-Git-Tag: emacs-24.3.90~173^2^2~42^2~45^2~387^2~446
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=01633a17e74e638f31ec71c3587481f0084574f2;p=emacs.git

Support MS-Windows file names that use characters outside of ANSI codepage.

src/w32.c (get_file_security, set_file_security)
 (create_symbolic_link): Separate pointers and boolean flags for
 ANSI and Unicode APIs. Use the latter if w32_unicode_filenames is
 non-zero, else the former.
 (codepage_for_filenames, filename_to_utf16, )
 (filename_from_utf16, filename_to_ansi, filename_from_ansi): New
 functions.
 (init_user_info): Allow $HOME and $SHELL to include non-ANSI
 characters.
 (normalize_filename): Lose the DBCS code, now works on UTF-8.
 Accept only one argument; all callers changed.
 (dostounix_filename): Remove the second argument, now works in
 UTF-8. All callers changed.
 (parse_root): Lose DBCS code.
 (get_long_basename, w32_get_short_filename, init_environment)
 (GetCachedVolumeInformation, sys_readdir, open_unc_volume)
 (read_unc_volume, logon_network_drive, faccessat, sys_chdir)
 (sys_chmod, sys_creat, sys_fopen, sys_link, sys_mkdir, sys_open)
 (sys_rename_replace, sys_rmdir, sys_unlink, stat_worker, utime)
 (is_symlink, readlink, chase_symlinks, w32_delayed_load): Work in
 Unicode mode if w32_unicode_filenames is non-zero, in ANSI mode
 otherwise.
 (ansi_encode_filename): New function.
 (get_emacs_configuration, get_emacs_configuration_options):
 Functions deleted.
 (add_volume_info, GetCachedVolumeInformation): Run the input file
 name through unixtodos_filename, to ensure it is stored and
 referenced in canonical form.
 (get_volume_info): Lose the DBCS code, now works in UTF-8.
 (logon_network_drive, sys_link, utime): Improve error handling.
 (sys_access): New function.
 (hashval, generate_inode_val): Unused functions deleted.
 (symlink, readlink, readlinkat): Lose DBCS code, now works in UTF-8.
 (check_windows_init_file): Convert error message from UTF-8 to
 ANSI codepage, for display in the message box.
 (globals_of_w32): Set w32_unicode_filenames according to the OS
 version.

src/w32term.c (construct_drag_n_drop): Work in Unicode mode when
 w32_unicode_filenames is non-zero, ANSI mode otherwise.
 (syms_of_w32term): Declare w32-unicode-filenames.

src/w32proc.c (new_child, delete_child): Remove code that handled
 unused pending_deletion and input_file members of the child struct.
 (create_child, sys_spawnve): Convert all file names to ANSI
 codepage. Use ANSI APIs explicitly; forcibly fail if any file
 name cannot be encoded in ANSI codepage. Don't use
 unixtodos_filename, mirror slashes by hand.
 (record_infile, record_pending_deletion): Functions deleted.
 (Fw32_short_file_name): Call w32_get_short_filename instead of
 GetShortPathName.

src/w32notify.c (add_watch): Work in Unicode mode when
 w32_unicode_filenames is non-zero, ANSI mode otherwise.
 (Fw32notify_add_watch): Rewrite to avoid using GetFullPathName;
 instead, do the same with Lisp primitives.

src/w32fns.c (file_dialog_callback, Fx_file_dialog)
 (Fsystem_move_file_to_trash, Fw32_shell_execute)
 (Ffile_system_info, Fdefault_printer_name): Work in Unicode mode
 when w32_unicode_filenames is non-zero, ANSI mode otherwise.
 (Fw32_shell_execute): Improve error reporting.
 (Fdefault_printer_name): Ifdef away for Cygwin.

src/w32.h (struct _child_process): Remove input_file and
 pending_deletion members that are no longer used.
 (dostounix_filename, w32_get_short_filename, filename_from_ansi)
 (filename_to_ansi, filename_from_utf16, filename_to_utf16)
 (ansi_encode_filename): New and updated prototypes.

src/unexw32.c (open_input_file, open_output_file, unexec): Use ANSI
 APIs explicitly.
 (unexec): Don't use dostounix_filename, it expects a file name in
 UTF-8. Instead, mirror backslashes by hand. Convert NEW_NAME to
 ANSI encoding.

src/fileio.c (Ffile_name_directory, file_name_as_directory)
 (directory_file_name, Fexpand_file_name)
 (Fsubstitute_in_file_name) [WINDOWSNT]: Adapt to the change in
 arguments of dostounix_filename.
 (Fexpand_file_name) [WINDOWSNT]: Convert value of $HOME to UTF-8.
 use MAX_UTF8_PATH for size of file-name strings.
 (emacs_readlinkat): Build an explicitly unibyte string for file
 names.
 (syms_of_fileio)
 default-file-name-coding-system>: Mention MS-Windows peculiarities.

src/emacs.c (init_cmdargs) [WINDOWSNT]: Convert argv[0] to UTF-8.
 (main) [WINDOWSNT]: Convert the argv[] elements that are files or
 directories to UTF-8.
 (decode_env_path) [WINDOWSNT]: Convert file names taken from the
 environment, and each element of the input PATH, to UTF-8.

src/dired.c (file_attributes): Use build_unibyte_string explicitly
 to make Lisp strings from user and group names.

src/coding.h (ENCODE_FILE, DECODE_FILE): Just call encode_file and
 decode_file.

src/coding.c (decode_file_name, encode_file_name): New functions.

src/termcap.c (tgetent): Adapt to the change in arguments of
 dostounix_filename.

src/sysdep.c (sys_subshell) [WINDOWSNT]: Use MAX_UTF8_PATH for file
 names.

src/msdos.c (dostounix_filename, init_environment): Adapt to the
 change in arguments of dostounix_filename.

src/image.c (xpm_load, tiff_load, gif_load, imagemagick_load)
 [WINDOWSNT]: Encode file names passed to the image libraries in
 ANSI codepage.

src/gnutls.c (Fgnutls_boot): Encode all file names passed to GnuTLS.
 [WINDOWSNT]: Convert file names to the current ANSI codepage.

src/filelock.c (lock_file) [WINDOWSNT]: Adapt to the change in
 arguments of dostounix_filename.

nt/inc/ms-w32.h (MAX_UTF8_PATH): New macro.
 (opendir, closedir, readdir, seekdir): Redirect to replacement
 functions.

nt/inc/dirent.h: Make d_name[] be MAXNAMELEN*4 characters long.

lisp/term/w32-win.el (w32-handle-dropped-file):
lisp/startup.el (normal-top-level):
lisp/net/browse-url.el (browse-url-file-url):
lisp/dnd.el (dnd-get-local-file-name): On MS-Windows, encode and
 decode file names using 'utf-8' rather than
 file-name-coding-system.

doc/emacs/mule.texi (File Name Coding): Document file-name encoding
 peculiarities on MS-Windows.

doc/lispref/nonascii.texi (Encoding and I/O): Document file-name
 encoding peculiarities on MS-Windows.

etc/NEWS: Mention support on MS-Windows of file names outside of the
 current locale.

Fixes: debbugs:7100
---

01633a17e74e638f31ec71c3587481f0084574f2
diff --cc doc/emacs/ChangeLog
index fa43c3ef53e,5da37003152..c765d479385
--- a/doc/emacs/ChangeLog
+++ b/doc/emacs/ChangeLog
@@@ -1,7 -1,3 +1,12 @@@
++2013-12-12 Eli Zaretskii
++
++ * mule.texi (File Name Coding): Document file-name encoding
++ peculiarities on MS-Windows.
++
 +2013-12-12 Glenn Morris
 +
 + * emacs.texi: Sync direntry with info/dir version.
 +
  2013-12-08 Juanma Barranquero
  
   * msdog.texi (Windows Keyboard): Fix typo.
diff --cc doc/lispref/ChangeLog
index c224e523a84,d2173793d00..f2b6026fe26
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@@ -1,7 -1,3 +1,12 @@@
++2013-12-12 Eli Zaretskii
++
++ * nonascii.texi (Encoding and I/O): Document file-name encoding
++ peculiarities on MS-Windows.
++
 +2013-12-12 Glenn Morris
 +
 + * elisp.texi: Sync direntry with info/dir version.
 +
  2013-12-08 Juanma Barranquero
  
   * display.texi (Progress, Face Remapping):
diff --cc etc/ChangeLog
index 71e86dcbc85,71e86dcbc85..ec86288c34f
--- a/etc/ChangeLog
+++ b/etc/ChangeLog
@@@ -1,3 -1,3 +1,8 @@@
++2013-12-12 Eli Zaretskii
++
++ * NEWS: Mention support on MS-Windows of file names outside of the
++ current locale.
++
  2013-11-23 Xue Fuqiao
  
   * TODO: Minor update.
diff --cc lisp/ChangeLog
index 52d4eff3eb6,85776c07a74..c761bce7518
--- a/lisp/ChangeLog
+++ b/lisp/ChangeLog
@@@ -1,44 -1,3 +1,53 @@@
++2013-12-12 Eli Zaretskii
++
++ * term/w32-win.el (w32-handle-dropped-file):
++ * startup.el (normal-top-level):
++ * net/browse-url.el (browse-url-file-url):
++ * dnd.el (dnd-get-local-file-name): On MS-Windows, encode and
++ decode file names using 'utf-8' rather than
++ file-name-coding-system.
++
 +2013-12-12 Fabián Ezequiel Gallina
 +
 + * progmodes/python.el (python-indent-context)
 + (python-indent-calculate-indentation): Fix auto-identation
 + behavior for comment blocks. (Bug#15916)
 +
 +2013-12-12 Nathan Trapuzzano (tiny change)
 +
 + * progmodes/python.el (python-indent-calculate-indentation): When
 + determining indentation, don't treat "return", "pass", etc., as
 + operators when they are just string constituents. (Bug#15812)
 +
 +2013-12-12 Juri Linkov
 +
 + * uniquify.el (uniquify-buffer-name-style): Change default to
 + `post-forward-angle-brackets'.
 +
 + * menu-bar.el (menu-bar-options-menu): Don't require preloaded
 + `uniquify'. Change default to `post-forward-angle-brackets'.
 +
 +2013-12-11 Glenn Morris
 +
 + * emacs-lisp/package.el (finder-list-matches):
 + Autoload rather than falsely declaring.
 +
 +2013-12-11 Teodor Zlatanov
 +
 + * net/eww.el (eww-exit, eww-close): Add UI convenience wrappers.
 + (eww-mode-map): Use them.
 +
 +2013-12-11 Martin Rudalics
 +
 + * window.el (display-buffer-in-side-window): Fix doc-string
 + (Bug#16115).
 +
 +2013-12-11 Juanma Barranquero
 +
 + * vc/vc-git.el: Silence byte-compiler warnings.
 + (vc-git-dir-extra-headers): Rename arg _dir which is no longer ignored.
 + (log-edit-set-header): Declare.
 +
  2013-12-11 Eli Zaretskii
  
   * Makefile.in (custom-deps, finder-data): Run output file names
diff --cc nt/ChangeLog
index 3e9f7683b63,3e9f7683b63..958605eb8b8
--- a/nt/ChangeLog
+++ b/nt/ChangeLog
@@@ -1,3 -1,3 +1,10 @@@
++2013-12-12 Eli Zaretskii
++
++ * inc/ms-w32.h (MAX_UTF8_PATH): New macro.
++ (opendir, closedir, readdir, seekdir): Redirect to replacement
++ functions.
++ * inc/dirent.h: Make d_name[] be MAXNAMELEN*4 characters long.
++
  2013-11-27 Glenn Morris
  
   * README.W32:
diff --cc src/ChangeLog
index e0f9b9e8689,89c640bd8c7..839630e93ea
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@@ -1,42 -1,3 +1,167 @@@
++2013-12-12 Eli Zaretskii
++
++ Support file names on MS-Windows that use characters outside of
++ the current system codepage. (Bug#7100)
++
++ * w32.c (get_file_security, set_file_security)
++ (create_symbolic_link): Separate pointers and boolean flags for
++ ANSI and Unicode APIs. Use the latter if w32_unicode_filenames is
++ non-zero, else the former.
++ (codepage_for_filenames, filename_to_utf16, )
++ (filename_from_utf16, filename_to_ansi, filename_from_ansi): New
++ functions.
++ (init_user_info): Allow $HOME and $SHELL to include non-ANSI
++ characters.
++ (normalize_filename): Lose the DBCS code, now works on UTF-8.
++ Accept only one argument; all callers changed.
++ (dostounix_filename): Remove the second argument, now works in
++ UTF-8. All callers changed.
++ (parse_root): Lose DBCS code.
++ (get_long_basename, w32_get_short_filename, init_environment)
++ (GetCachedVolumeInformation, sys_readdir, open_unc_volume)
++ (read_unc_volume, logon_network_drive, faccessat, sys_chdir)
++ (sys_chmod, sys_creat, sys_fopen, sys_link, sys_mkdir, sys_open)
++ (sys_rename_replace, sys_rmdir, sys_unlink, stat_worker, utime)
++ (is_symlink, readlink, chase_symlinks, w32_delayed_load): Work in
++ Unicode mode if w32_unicode_filenames is non-zero, in ANSI mode
++ otherwise.
++ (ansi_encode_filename): New function.
++ (get_emacs_configuration, get_emacs_configuration_options):
++ Functions deleted.
++ (add_volume_info, GetCachedVolumeInformation): Run the input file
++ name through unixtodos_filename, to ensure it is stored and
++ referenced in canonical form.
++ (get_volume_info): Lose the DBCS code, now works in UTF-8.
++ (logon_network_drive, sys_link, utime): Improve error handling.
++ (sys_access): New function.
++ (hashval, generate_inode_val): Unused functions deleted.
++ (symlink, readlink, readlinkat): Lose DBCS code, now works in UTF-8.
++ (check_windows_init_file): Convert error message from UTF-8 to
++ ANSI codepage, for display in the message box.
++ (globals_of_w32): Set w32_unicode_filenames according to the OS
++ version.
++
++ * w32term.c (construct_drag_n_drop): Work in Unicode mode when
++ w32_unicode_filenames is non-zero, ANSI mode otherwise.
++ (syms_of_w32term): Declare w32-unicode-filenames.
++
++ * w32proc.c (new_child, delete_child): Remove code that handled
++ unused pending_deletion and input_file members of the child struct.
++ (create_child, sys_spawnve): Convert all file names to ANSI
++ codepage. Use ANSI APIs explicitly; forcibly fail if any file
++ name cannot be encoded in ANSI codepage. Don't use
++ unixtodos_filename, mirror slashes by hand.
++ (record_infile, record_pending_deletion): Functions deleted.
++ (Fw32_short_file_name): Call w32_get_short_filename instead of
++ GetShortPathName.
++
++ * w32notify.c (add_watch): Work in Unicode mode when
++ w32_unicode_filenames is non-zero, ANSI mode otherwise.
++ (Fw32notify_add_watch): Rewrite to avoid using GetFullPathName;
++ instead, do the same with Lisp primitives.
++
++ * w32fns.c (file_dialog_callback, Fx_file_dialog)
++ (Fsystem_move_file_to_trash, Fw32_shell_execute)
++ (Ffile_system_info, Fdefault_printer_name): Work in Unicode mode
++ when w32_unicode_filenames is non-zero, ANSI mode otherwise.
++ (Fw32_shell_execute): Improve error reporting.
++ (Fdefault_printer_name): Ifdef away for Cygwin.
++
++ * w32.h (struct _child_process): Remove input_file and
++ pending_deletion members that are no longer used.
++ (dostounix_filename, w32_get_short_filename, filename_from_ansi)
++ (filename_to_ansi, filename_from_utf16, filename_to_utf16)
++ (ansi_encode_filename): New and updated prototypes.
++
++ * unexw32.c (open_input_file, open_output_file, unexec): Use ANSI
++ APIs explicitly.
++ (unexec): Don't use dostounix_filename, it expects a file name in
++ UTF-8. Instead, mirror backslashes by hand. Convert NEW_NAME to
++ ANSI encoding.
++
++ * fileio.c (Ffile_name_directory, file_name_as_directory)
++ (directory_file_name, Fexpand_file_name)
++ (Fsubstitute_in_file_name) [WINDOWSNT]: Adapt to the change in
++ arguments of dostounix_filename.
++ (Fexpand_file_name) [WINDOWSNT]: Convert value of $HOME to UTF-8.
++ use MAX_UTF8_PATH for size of file-name strings.
++ (emacs_readlinkat): Build an explicitly unibyte string for file
++ names.
++ (syms_of_fileio)
++ default-file-name-coding-system>: Mention MS-Windows peculiarities.
++
++ * emacs.c (init_cmdargs) [WINDOWSNT]: Convert argv[0] to UTF-8.
++ (main) [WINDOWSNT]: Convert the argv[] elements that are files or
++ directories to UTF-8.
++ (decode_env_path) [WINDOWSNT]: Convert file names taken from the
++ environment, and each element of the input PATH, to UTF-8.
++
++ * dired.c (file_attributes): Use build_unibyte_string explicitly
++ to make Lisp strings from user and group names.
++
++ * coding.h (ENCODE_FILE, DECODE_FILE): Just call encode_file and
++ decode_file.
++
++ * coding.c (decode_file_name, encode_file_name): New functions.
++
++ * termcap.c (tgetent): Adapt to the change in arguments of
++ dostounix_filename.
++
++ * sysdep.c (sys_subshell) [WINDOWSNT]: Use MAX_UTF8_PATH for file
++ names.
++
++ * msdos.c (dostounix_filename, init_environment): Adapt to the
++ change in arguments of dostounix_filename.
++
++ * image.c (xpm_load, tiff_load, gif_load, imagemagick_load)
++ [WINDOWSNT]: Encode file names passed to the image libraries in
++ ANSI codepage.
++
++ * gnutls.c (Fgnutls_boot): Encode all file names passed to GnuTLS.
++ [WINDOWSNT]: Convert file names to the current ANSI codepage.
++
++ * filelock.c (lock_file) [WINDOWSNT]: Adapt to the change in
++ arguments of dostounix_filename.
++
 +2013-12-12 Dmitry Antipov
 +
 + * font.h (struct font_entity) [HAVE_NS]: New field to record
 + font driver which was used to create this entity.
 + (struct font) [HAVE_WINDOW_SYSTEM]: New field to record
 + frame where the font was opened.
 + (font_close_object): Add prototype.
 + * font.c (font_make_entity) [HAVE_NS]: Zero out driver field.
 + (font_close_object): Not static any more. Lost frame arg.
 + Adjust comment and users.
 + * alloc.c (cleanup_vector): Call font_close_object to adjust
 + per-frame font counters correctly. If HAVE_NS, also call
 + driver-specific cleanup for font-entity objects.
 + * ftfont.c (ftfont_open):
 + * nsfont.m (nsfont_open):
 + * w32font.c (w32font_open_internal):
 + * xfont.c (xfont_open):
 + * xftfont.c (xftfont_open): Save frame pointer in font object.
 + * macfont.m (macfont_open): Likewise.
 + (macfont_descriptor_entity): Save driver pointer to be able
 + to call its free_entity routine when font-entity is swept.
 + * ftxfont.c (ftxfont_open): Add eassert because frame
 + pointer should be saved by ftfont_driver.open.
 +
 +2013-12-12 Dmitry Antipov
 +
 + * xterm.c (x_make_frame_visible): Restore hack which is needed when
 + input polling is used. This is still meaningful for Cygwin, see
 + http://lists.gnu.org/archive/html/emacs-devel/2013-12/msg00351.html.
 + * keyboard.c (poll_for_input_1, input_polling_used): Define
 + unconditionally.
 + * dispextern.h (FACE_SUITABLE_FOR_CHAR_P): Remove unused macro.
 + (FACE_FOR_CHAR): Simplify because face_for_char does the same.
 + * fontset.c (face_suitable_for_char_p) [0]: Remove unused function.
 + (font_for_char): Prefer ptrdiff_t to int for buffer position.
 + (face_for_char): Likewise. Rearrange eassert and return ASCII
 + face for CHAR_BYTE8_P.
 + * fontset.h (font_for_char, face_for_char): Adjust prototypes.
 +
  2013-12-11 Ken Brown
  
   * dispextern.h (erase_phys_cursor):
diff --cc src/fileio.c
index a0603b490d9,2ef3f1fe0f9..8fb6129885c
--- a/src/fileio.c
+++ b/src/fileio.c
@@@ -1101,7 -1102,8 +1102,7 @@@ filesystem tree, not (expand-file-name
  #ifdef DOS_NT
    /* Make sure directories are all separated with /, but avoid
       allocation of a new string when not required. */
-   dostounix_filename (nm, multibyte);
 -  /* FIXME: Figure out multibyte and downcase here. */
+   dostounix_filename (nm);
  #ifdef WINDOWSNT
    if (IS_DIRECTORY_SEP (nm[1]))
      {
@@@ -1479,7 -1497,8 +1496,7 @@@
            target[1] = ':';
          }
        result = make_specified_string (target, -1, o - target, multibyte);
-       dostounix_filename (SSDATA (result), multibyte);
 -      /* FIXME: Figure out the multibyte and downcase here. */
+       dostounix_filename (SSDATA (result));
  #ifdef WINDOWSNT
        if (!NILP (Vw32_downcase_file_names))
          result = Fdowncase (result);
@@@ -1763,7 -1782,8 +1780,7 @@@ those `/' is discarded. */
    nm = xlispstrdupa (filename);
  #ifdef DOS_NT
-   dostounix_filename (nm, multibyte);
 -  /* FIXME: Figure out multibyte and downcase. */
+   dostounix_filename (nm);
    substituted = (memcmp (nm, SDATA (filename), SBYTES (filename)) != 0);
  #endif
    endp = nm + SBYTES (filename);