I am using mecab-python3, the Python binding for MeCab, a morphological analyzer for Japanese text.
I do not know why, but mecab-python3 sometimes raises the error "'utf-8' codec can't decode byte 0xfa in position 0: invalid start byte".
Searching the internet, I found no explanation of the cause, but some people say the error can be avoided by parsing an empty string before carrying out the actual parsing.
I tried this workaround and found that it does work.
Here is an example:
import MeCab


def extractNouns(text):
    tagger = MeCab.Tagger()
    # No one seems to know why this works,
    # but this tagger.parse("") can avoid the unicode decoding error
    # in the following parsing.
    tagger.parse("")
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        normallyprocessed = True
        try:
            # Reading the surface form is where the decoding error shows up.
            word = node.surface
        except Exception as e:
            print(str(e))
            print('parsing error occurred but ignored')
            normallyprocessed = False
        if normallyprocessed and word.isalpha():
            # The first field of the feature string is the part of speech.
            meta = node.feature.split(",")
            if meta[0] == '名詞':  # '名詞' means noun
                keywords.append(word)
        node = node.next
    return keywords
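For reference, the error message quoted above is what Python itself raises whenever it is asked to decode bytes that are not valid UTF-8. The following is only a minimal sketch of that behavior in isolation: the `raw` value is a hypothetical stand-in and is not taken from MeCab, so it does not explain why mecab-python3 ends up with such bytes.

```python
# Hypothetical stand-in for the undecodable bytes; NOT actual MeCab output.
raw = b"\xfa"

# 0xfa can never start a UTF-8 sequence, so strict decoding fails.
message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    message = str(e)
print(message)

# With errors="replace", the undecodable byte becomes U+FFFD
# instead of raising an exception.
lenient = raw.decode("utf-8", errors="replace")
print(repr(lenient))

# Properly encoded Japanese text round-trips without any error.
roundtrip = "名詞".encode("utf-8").decode("utf-8")
print(roundtrip)
```

This suggests, as an assumption only, that the surface string handed back by the tagger sometimes contains bytes that are not valid UTF-8; the `tagger.parse("")` workaround above remains the practical fix.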
To the extent possible under law,
the person who associated CC0
with this work has waived all copyright and related or neighboring
rights to this work.
Koichi Hori