f**k unipus

Posted on 2020-04-30 Edited on 2021-05-09 In misc

Version 1.1

README

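I suddenly remembered that Baidu has a speech recognition API, so I went straight for it.
This version depends on ffmpeg.
For the APIKey and SecretKey, just register a speech recognition app on the Baidu AI Open Platform (百度大脑): http://ai.baidu.com/
As for the results... they're OK. Baidu's recognition output has no punctuation, so you have to split the sentences yourself.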

don't worry madam I'll get you there in time it looks as if most people 
have already gone home today I hope so the plane leaves at seven I don't 
want to arrive in London too late or the hotel will think I'm not coming 
and I might lose my room question where is the man taking the woman to
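
Because the recognizer returns one unpunctuated string, some post-processing is needed before the text is comfortable to read. Here is a minimal sketch of one such step, assuming every clip announces its question with the spoken word "question" as in the sample above. `split_question` is a hypothetical helper, not part of the script, and it does not attempt real sentence segmentation.

```python
def split_question(transcript, marker="question"):
    """Split a transcript into (dialogue, question) at the last marker word.

    Assumes the marker is spoken as a standalone word right before the
    question. If it is missing, the whole text is treated as dialogue.
    """
    words = transcript.split()
    # Search from the end so an earlier "question" inside the dialogue
    # does not cut the text too soon.
    for i in range(len(words) - 1, -1, -1):
        if words[i] == marker:
            return " ".join(words[:i]), " ".join(words[i + 1:])
    return transcript, ""


if __name__ == "__main__":
    dialogue, question = split_question(
        "I might lose my room question where is the man taking the woman to"
    )
    print(dialogue)
    print(question)
```

Anything beyond this, such as restoring punctuation inside the dialogue, still has to be done by hand or with another tool.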

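Current shortcomings:

Files that are too long cannot be recognized (I'll fix this in a later version ^^, once I've learned a bit more about audio formats)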

——20200430
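
One possible way around the length limit is to cut the raw PCM data into shorter pieces and recognize each piece separately. The sketch below assumes the API only accepts clips of up to about 60 seconds (check the Baidu documentation for the actual limit) and that the audio is 16 kHz, 16-bit, mono, which is what the ffmpeg command in this post produces. `split_pcm` is a hypothetical helper, not part of the script.

```python
from typing import List

SAMPLE_RATE = 16000      # Hz, matches "-ar 16000"
BYTES_PER_SAMPLE = 2     # 16-bit samples, matches "pcm_s16le"
CHANNELS = 1             # mono, matches "-ac 1"
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def split_pcm(data: bytes, max_seconds: int = 55) -> List[bytes]:
    """Cut raw PCM bytes into chunks of at most max_seconds each.

    The chunk size is a whole number of samples, so no sample is split
    in half. Cuts are made blindly and may land in the middle of a word.
    """
    chunk_size = BYTES_PER_SECOND * max_seconds
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    two_minutes = bytes(BYTES_PER_SECOND * 120)  # 120 s of silence
    print([len(c) // BYTES_PER_SECOND for c in split_pcm(two_minutes)])
```

Each chunk could then be base64-encoded and sent the same way as a whole file, and the partial transcripts joined with spaces. Cutting at silent gaps instead of fixed offsets would give cleaner results, but that needs more audio handling than this sketch shows.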

CODE

import base64
import os
import re

import requests
from bs4 import BeautifulSoup

base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
APIKey = "***"
SecretKey = "***" 

HOST = base_url % (APIKey, SecretKey)


def getToken(host):
    """Request an OAuth access token from Baidu."""
    res = requests.post(host)
    res.raise_for_status()
    return res.json()['access_token']


def get_audio(file):
    """Read the audio file as raw bytes."""
    with open(file, 'rb') as f:
        return f.read()


def speech2text(FILEPATH, dev_pid=1737):
    token = getToken(HOST)
    speech_data = get_audio(FILEPATH)

    FORMAT = 'pcm'  # the file is raw 16-bit PCM from ffmpeg, not a WAV container
    RATE = 16000    # sample rate in Hz, sent as a number
    CHANNEL = 1
    CUID = '*******'
    SPEECH = base64.b64encode(speech_data).decode('utf-8')

    data = {
        'format': FORMAT,
        'rate': RATE,
        'channel': CHANNEL,
        'cuid': CUID,
        'len': len(speech_data),
        'speech': SPEECH,
        'token': token,
        'dev_pid':dev_pid
    }
    url = 'https://vop.baidu.com/server_api'
    headers = {'Content-Type': 'application/json'}
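    # json= serializes the payload for us, so json.dumps is not needed.
    r = requests.post(url, json=data, headers=headers)
    Result = r.json()
    # On success return the best transcript; otherwise return the raw
    # response so the caller can see the error.
    if 'result' in Result:
        return Result['result'][0]
    return Result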

def main():
    url = "index.html"  # the test page saved locally with Ctrl+S
    # Leftovers from version 1.0 (pages for manual recognition/translation); unused here.
    url_voice2text = "https://app.xunjiepdf.com/voice2text/"
    url_translate = "https://translate.google.cn/"
    with open(url, encoding='utf-8') as page:
        soup = BeautifulSoup(page, 'html.parser')
    allres = soup.find_all(name='div', attrs={"class": "itest-hear-reslist"})
    counter = 1
    for res in allres:
        # The span text is expected to be a bracketed, comma-separated list of quoted URLs.
        st = str(res.span.text)[1:-1].split(',')
        for est in st:
            if re.match(r'.*\.mp3.*', est):
                # Skip the files named question.mp3.
                if re.match(r'.*question\.mp3+.*', est):
                    continue
                download_addr = est.strip().strip('"\'')  # drop the surrounding quotes
                name = str(counter)
                print(download_addr)
                print('Downloading...')
                f = requests.get(download_addr)
                with open(name + ".mp3", "wb") as code:
                    code.write(f.content)
                print('Downloaded')
                print('Transcoding mp3 to pcm...')
                # 16 kHz, 16-bit, mono raw PCM, matching the parameters sent to the API.
                os.system("ffmpeg -y -i " + name + ".mp3 -acodec pcm_s16le -f s16le -ac 1 -ar 16000 " + name + ".pcm")
                print('Transcoded mp3 to pcm')
                print('Recognizing...')
                result = speech2text(FILEPATH=name + ".pcm")
                print('-' * 48)
                print(result)
                with open(name + ".txt", "w", encoding='utf-8') as ff:
                    print(result, file=ff)
                print('-' * 48)
                print('Recognized')
                counter += 1
                print('Finished')


if __name__ == '__main__':
    main()
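
The script above builds the ffmpeg command by string concatenation and runs it through `os.system`. As a rough sketch of an alternative, the same flags can be passed as an argument list to `subprocess.run`, which avoids shell quoting problems and raises an error when ffmpeg fails. `ffmpeg_pcm_command` and `transcode` are hypothetical helper names, and ffmpeg is assumed to be on the PATH.

```python
import subprocess


def ffmpeg_pcm_command(src, dst, rate=16000):
    """Build the ffmpeg argument list for mp3 -> 16-bit mono raw PCM."""
    return [
        "ffmpeg", "-y",          # overwrite the output file without asking
        "-i", src,
        "-acodec", "pcm_s16le",  # 16-bit little-endian samples
        "-f", "s16le",           # raw output, no container
        "-ac", "1",              # mono
        "-ar", str(rate),        # sample rate in Hz
        dst,
    ]


def transcode(src, dst):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(ffmpeg_pcm_command(src, dst), check=True)


if __name__ == "__main__":
    print(" ".join(ffmpeg_pcm_command("1.mp3", "1.pcm")))
```

Calling `transcode("1.mp3", "1.pcm")` would then replace the `os.system` line for the first file.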

REFERENCE

Official ffmpeg documentation, http://ffmpeg.org/ffmpeg.html
Baidu speech recognition API documentation, https://ai.baidu.com/ai-doc/SPEECH/Vk38lxily
Baidu speech recognition API code samples, https://github.com/Baidu-AIP/speech-demo
Audio format conversion in Python, https://blog.csdn.net/pj_developer/article/details/72778792

Version 1.0

README

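Our English teacher assigned way too damn many questions: 70-odd in total, 20-odd of them listening, and I got sick of listening.
On top of that, you can't look up the answers for U校园 (Unipus) unit tests.
U校园 also lets you play each listening clip only once. Then I happened to press F12 and found that the audio links are right there in the HTML. I was stunned.
Opening a link downloads the file, but there were so damn many of them that my eyes glazed over, so I wrote this script.
I originally wanted a one-click tool that outputs Chinese directly, but my skills are limited, so that will have to wait until I have time... The speech recognition and translation pages are the URLs in the code.
Since I don't know how to simulate a login, please save the test page as index.html with Ctrl+S; running the script then downloads all the original listening audio.
I'll definitely update this if I find the time.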
——20200403
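
To make the idea concrete, here is a small sketch of pulling the audio links out of the text of one such element. It assumes the text is a bracketed list of quoted URLs, which is what the slicing in the script implies. The example URLs and the helper name `extract_mp3_urls` are made up for illustration.

```python
import re

# A quoted string that contains ".mp3" somewhere inside it.
MP3_URL = re.compile(r"""["']([^"']*\.mp3[^"']*)["']""")


def extract_mp3_urls(span_text, skip="question.mp3"):
    """Return the quoted mp3 URLs in span_text, leaving out the skipped file."""
    urls = MP3_URL.findall(span_text)
    return [u for u in urls if skip not in u]


if __name__ == "__main__":
    # Hypothetical element text; the real page may differ.
    text = '["https://example.com/a/1.mp3","https://example.com/a/question.mp3"]'
    print(extract_mp3_urls(text))
```

A regular expression is used here rather than splitting on commas, so extra spaces after the commas do not matter.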

CODE

import re

import requests
from bs4 import BeautifulSoup

def main():
    url = "index.html"  # the test page saved locally with Ctrl+S
    # Pages for doing recognition and translation by hand (not used by the code).
    url_voice2text = "https://app.xunjiepdf.com/voice2text/"
    url_translate = "https://translate.google.cn/"
    with open(url, encoding='utf-8') as page:
        soup = BeautifulSoup(page, 'html.parser')
    allres = soup.find_all(name='div', attrs={"class": "itest-hear-reslist"})
    counter = 1
    for res in allres:
        # The span text is expected to be a bracketed, comma-separated list of quoted URLs.
        st = str(res.span.text)[1:-1].split(',')
        for est in st:
            if re.match(r'.*\.mp3.*', est):
                # Skip the files named question.mp3.
                if re.match(r'.*question\.mp3+.*', est):
                    continue
                download_addr = est.strip().strip('"\'')  # drop the surrounding quotes
                print(download_addr)
                print('downloading...')
                f = requests.get(download_addr)
                with open(str(counter) + ".mp3", "wb") as code:
                    code.write(f.content)
                counter += 1
                print('downloaded')

if __name__ == '__main__':
    main()

Have fun.

Post author: BeiYu
Post link: https://blog.bj-yan.top/misc-fuck-unipus/
Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.
