[Python] pandas read_csv raises OSError on non-ASCII filenames
(textanal3664) D:\Users\daewon\Downloads\crime>python ana.py
Traceback (most recent call last):
  File "ana.py", line 5, in <module>
    df = pd.read_csv('2000년.csv', encoding='euc-kr')
  File "D:\PythonEnvs\textanal3664\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\PythonEnvs\textanal3664\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "D:\PythonEnvs\textanal3664\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "D:\PythonEnvs\textanal3664\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "D:\PythonEnvs\textanal3664\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 697, in pandas._libs.parsers.TextReader._setup_parser_source
OSError: Initializing from file failed
----
# ana.py
import pandas as pd
df = pd.read_csv('2000년.csv', encoding='euc-kr')
print(df)
----
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.23.4'
When the filename contains Korean characters, read_csv raises an OSError ("Initializing from file failed"). This occurs on Python 3.6.3 (64-bit, Windows) with pandas 0.23.4. You can work around the issue by setting the engine option to python, i.e. read_csv(filename, engine='python'). The default C parser seems to have a bug in handling non-ASCII filenames.
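The workaround can be sketched as follows. The first helper is the engine='python' fix described above. The second is an alternative that is not from the original note: it assumes that opening the file with Python's built-in open() and passing the resulting handle to read_csv avoids the C parser's filename handling. The helper names, the sample column names, and the sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def read_with_python_engine(path, encoding="euc-kr"):
    """Workaround from the note: use the pure-Python parser engine."""
    return pd.read_csv(path, encoding=encoding, engine="python")


def read_via_file_handle(path, encoding="euc-kr"):
    """Assumed alternative: let open() resolve the filename and give
    pandas an already-open text stream instead of a path."""
    with open(path, encoding=encoding, newline="") as handle:
        return pd.read_csv(handle)


if __name__ == "__main__":
    # Build a small EUC-KR sample file with a Korean filename (hypothetical data).
    folder = tempfile.mkdtemp()
    sample = os.path.join(folder, "2000년.csv")
    with open(sample, "w", encoding="euc-kr", newline="") as f:
        f.write("지역,건수\n서울,10\n부산,5\n")

    print(read_with_python_engine(sample))
    print(read_via_file_handle(sample))
```

The Python engine is generally slower than the C engine on large files, so the file-handle variant may be worth trying first when performance matters.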
via : http://kkckc.tistory.com/187 , http://own-search-and-study.xyz/2017/04/08/python3-6%E3%81%AEpandas%E3%81%A7%E3%80%8Cinitializing-from-file-failed%E3%80%8D%E3%81%8C%E8%B5%B7%E3%81%8D%E3%81%9F%E5%A0%B4%E5%90%88%E3%81%AE%E5%AF%BE%E7%AD%96/