In preparation for my next studies, I have written a short program that extracts rhythmic word sequences, such as Tanka or Haiku, from plain text.
As you may know, Japanese Haiku is composed of 5, 7, and 5 syllables,
Japanese Tanka is composed of 5, 7, 5, 7, 7 syllables,
and Japanese Dodoitsu is composed of 7, 7, 7, 5 syllables.
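As a rough illustration of how such counts can be taken from a katakana reading, the sketch below counts every character except the small kana that merge with the preceding sound. The name count_morae and the inclusion of ァ, ィ, ゥ, ェ, ォ are assumptions of this sketch; the program in this post subtracts only ャ, ュ, and ョ.

```python
# Small kana that merge with the preceding character into one sound.
# ャ, ュ, ョ follow the program in this post; ァ, ィ, ゥ, ェ, ォ are an
# assumption added here for loanword spellings such as ファ.
SMALL_KANA = set("ャュョァィゥェォ")


def count_morae(yomi):
    """Count the sounds in a katakana reading, skipping merged small kana."""
    return sum(1 for ch in yomi if ch not in SMALL_KANA)


if __name__ == "__main__":
    for reading in ["キカイノリョウシャ", "アワサッテ", "ジンコウチノウ"]:
        print(reading, count_morae(reading))
```

Note that the small ッ is counted as one sound, which is why アワサッテ has a length of 5.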
The program below extracts word sequences that may sound like Haiku, Tanka, Dodoitsu, and so on from plain text, according to a rhythm pattern specified by the user.
A former student of mine has told me that a bot that tweets Tanka-like phrases already exists on the internet. As the same student also pointed out, though, it may be interesting to test our own program on our own texts, so I show my program here. (Sorry, the program works only for Japanese text.)
import MeCab
import sys


def rhythmicDocument(rhythm_pattern, document):
    # Extract phrases whose lengths follow rhythm_pattern (e.g. [5, 7, 5]) from document.
    tagger = MeCab.Tagger()
    tagger.parse("")  # parse an empty string once to avoid the error caused by garbage collection
    node = tagger.parseToNode(document)
    buffered_node = node  # node from which the search for the next sequence starts
    rhythmic_text = ''
    while node:
        node = buffered_node
        # print('buffered_node starts with', buffered_node.surface)
        first_success = False
        for pattern in rhythm_pattern:
            # print('Now Searching a phrase whose length = ', pattern)
            candidate_phrase = ''
            candidate_length = 0
            success = False
            abandon = False
            head = True
            temp_node = node
            while not success:
                tango = temp_node.surface  # the word as written
                meta = temp_node.feature.split(",")
                hinshi = meta[0]  # part of speech
                # print('Now checking the word: ', tango)
                # print(' Hinshi is ', hinshi)
                if len(meta) > 8:
                    yomi = meta[-1]  # reading in katakana (last feature field)
                    # length of the reading; small ャ, ュ, ョ merge with the preceding character
                    nagasa = len(yomi) - yomi.count('ャ') - yomi.count('ュ') - yomi.count('ョ')
                else:
                    abandon = True
                    # print('Abandon because of no yomi.')
                # print(' Nagasa is ', nagasa)
                # Symbols (記号) and nodes without text, such as the start and end
                # markers, cannot be part of a phrase.
                if hinshi == "記号" or not tango:
                    abandon = True
                    # print('Abandon because this is 記号.')
                if head:
                    # A phrase must start with a noun, verb, adjective, adjectival noun, or adverb.
                    if hinshi not in ["名詞", "動詞", "形容詞", "形容動詞", "副詞"]:
                        abandon = True
                        # print('Abandon')
                if not abandon:
                    if nagasa <= pattern:
                        candidate_phrase += tango
                        candidate_length += nagasa
                        # print(' candidate_phrase = ', candidate_phrase)
                        # print(' candidate_length = ', candidate_length)
                        head = False
                        if candidate_length == pattern:
                            success = True
                            if not first_success:
                                first_success = True
                                buffered_node = node.next
                            node = temp_node.next
                            # A phrase ending in small っ borrows the first character of the
                            # next word; the guard skips this when the next node has no text.
                            if candidate_phrase[-1] == 'っ' and node and node.surface:
                                candidate_phrase += node.surface[0]
                            rhythmic_text += candidate_phrase + "\n"
                            # print('SUCCESS! The new phrase is ', candidate_phrase)
                            break
                        elif candidate_length > pattern:
                            abandon = True
                            # print('ABANDON!')
                    else:
                        abandon = True
                        # print('Abandon!')
                if abandon:
                    # Restart the search from the next word.
                    candidate_phrase = ''
                    candidate_length = 0
                    node = node.next
                    temp_node = node
                    head = True
                    abandon = False
                else:
                    temp_node = temp_node.next
                    if not temp_node:
                        candidate_phrase = ''
                        candidate_length = 0
                        node = node.next
                        temp_node = node
                        head = True
                        abandon = False
                if not node:
                    # The text ran out before the pattern was completed.
                    return rhythmic_text
        rhythmic_text += '\n'  # empty line after each completed sequence
    return rhythmic_text


def convert(patternfilename, documentfilename):
    with open(documentfilename, "r") as documentfile:
        document = documentfile.read()
    with open(patternfilename, "r") as rhythmfile:
        rhythm_spec = rhythmfile.readline().strip()
    try:
        # Each digit of the specification is one phrase length, e.g. "575" -> [5, 7, 5].
        rhythm_pattern = list(map(int, list(rhythm_spec)))
        if not rhythm_pattern:
            raise ValueError('empty rhythm specification')
    except ValueError:
        print('An error occurred while reading the rhythm spec file.')
        print('The rhythm specification should consist of only numbers.')
        raise  # without a valid pattern there is nothing to search for
    rhythmic_document = rhythmicDocument(rhythm_pattern, document)
    return rhythmic_document


def convertfile(patternfilename, documentfilename, outputfilename):
    rhythmic_document = convert(patternfilename, documentfilename)
    with open(outputfilename, "w") as outputfile:
        outputfile.write(rhythmic_document)
        outputfile.write('\n')


if __name__ == '__main__':
    argc = len(sys.argv)
    # print('argc = ', argc)
    if argc == 4:
        patternfilename = sys.argv[1]
        documentfilename = sys.argv[2]
        outputfilename = sys.argv[3]
        convertfile(patternfilename, documentfilename, outputfilename)
    else:
        print('usage: python3 thistestprogram.py pattern_file input_file output_file')
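
To make the search strategy easier to follow without MeCab, here is a simplified restatement of it on hand-written tokens. It is only a sketch, not the program above: the names find_phrase, find_sequence, and EXAMPLE_TOKENS are hypothetical, the tokens and their lengths were written by hand rather than produced by MeCab, and the handling of small っ and of words without a reading is left out.

```python
# Parts of speech allowed at the head of a phrase, as in the program above.
HEAD_POS = {"名詞", "動詞", "形容詞", "形容動詞", "副詞"}
SYMBOL_POS = "記号"

# Hand-written (surface, part of speech, length) triples; not MeCab output.
EXAMPLE_TOKENS = [
    ("古池", "名詞", 4), ("や", "助詞", 1),
    ("蛙", "名詞", 3), ("飛び込む", "動詞", 4),
    ("水", "名詞", 2), ("の", "助詞", 1), ("音", "名詞", 2),
]


def find_phrase(tokens, start, length):
    """Find the first run of tokens at or after `start` whose lengths sum to `length`.

    Returns (phrase, index of the token after the phrase), or None.
    """
    for i in range(start, len(tokens)):
        if tokens[i][1] not in HEAD_POS:
            continue  # a phrase may not begin with a particle, symbol, etc.
        phrase, total = "", 0
        for j in range(i, len(tokens)):
            surface, pos, morae = tokens[j]
            if pos == SYMBOL_POS:
                break  # a symbol ends the attempt that started at i
            phrase += surface
            total += morae
            if total == length:
                return phrase, j + 1
            if total > length:
                break  # overshoot: retry from the next head candidate
    return None


def find_sequence(tokens, pattern):
    """Find one phrase per pattern entry, each search resuming after the last phrase."""
    phrases, position = [], 0
    for length in pattern:
        found = find_phrase(tokens, position, length)
        if found is None:
            return None
        phrase, position = found
        phrases.append(phrase)
    return phrases


if __name__ == "__main__":
    print(find_sequence(EXAMPLE_TOKENS, [5, 7, 5]))
```

As in the program above, a phrase that cannot be completed is dropped and the search resumes from the following word, so the phrases of one sequence are not necessarily adjacent in the source text.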
Installation:
You should first install python3, MeCab, and mecab-python3.
I hope you can easily find out how to do this on the internet.
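One quick way to check whether the Python binding is visible to your interpreter is sketched below. The function name mecab_binding_installed is hypothetical, and the check only looks for the MeCab module that the program imports; it does not verify that MeCab itself or a dictionary is set up correctly.

```python
import importlib.util


def mecab_binding_installed():
    """Return True if the `MeCab` module imported by the program can be found."""
    return importlib.util.find_spec("MeCab") is not None


if __name__ == "__main__":
    if mecab_binding_installed():
        print("The MeCab module was found.")
    else:
        print("The MeCab module was not found; mecab-python3 may not be installed.")
```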
Usage:
Copy and paste the above program into a file named, for example, thistestprogram.py.
Prepare a file that specifies the rhythmic pattern you want. For example,
% echo "575" > pattern575.txt
Then run the program as follows.
% python3 thistestprogram.py pattern575.txt filename_of_any_text_file_you_have temp_output_file.txt
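The pattern file is read one digit at a time, so "575" means the phrase lengths 5, 7, and 5, and a single phrase of 10 or more sounds cannot be specified. A minimal sketch of that reading step, with a stricter check than the program itself performs, might look like this (the name parse_rhythm_spec is hypothetical):

```python
def parse_rhythm_spec(spec):
    """Turn a specification such as "57577" into [5, 7, 5, 7, 7]."""
    spec = spec.strip()
    # Reject empty input and anything other than ASCII digits.
    if not (spec.isascii() and spec.isdigit()):
        raise ValueError("the rhythm specification should consist of only numbers")
    return [int(ch) for ch in spec]


if __name__ == "__main__":
    for example in ["575", "57577", "7775"]:
        print(example, parse_rhythm_spec(example))
```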
Example:
Applying the program to an old paper of mine gave the following results.
With the pattern 57577:
人間と
機械の両者
合わさって
構成されて
知的活動

文学も
人工知能
研究の
一部になって
人工知能

芸術も
人工知能
研究の
一部になって
人工知能

法学も
人工知能
研究の
一部になって
人工知能

With the pattern 7775:
製品の夢
人工知能
ネットビジネス
誰のため

相手の心
読み取ることは
自分の心
探索に

与えていない
心とは言え
言わばまぼろし
心なの
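
The program writes one phrase per line and an empty line after each completed sequence, so the output file can be split back into individual sequences. The sketch below shows one way to do this; the name split_into_poems is hypothetical, and the sample text is copied from the results above. Because the program stops when the text runs out, the last group may be incomplete, which is why the usage example keeps only groups of the expected size.

```python
def split_into_poems(output_text):
    """Group the non-empty lines of the output into lists separated by empty lines."""
    poems, current = [], []
    for line in output_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            poems.append(current)
            current = []
    if current:
        poems.append(current)
    return poems


if __name__ == "__main__":
    sample = (
        "製品の夢\n人工知能\nネットビジネス\n誰のため\n\n"
        "相手の心\n読み取ることは\n自分の心\n探索に\n\n"
        "与えていない\n"
    )
    poems = split_into_poems(sample)
    complete = [poem for poem in poems if len(poem) == 4]  # pattern 7775 has 4 phrases
    for poem in complete:
        print(" / ".join(poem))
```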
To the extent possible under law,
the person who associated CC0
with this work has waived all copyright and related or neighboring
rights to this work.