How to Get Started with Llama 2 on Huawei Cloud's ModelArts Platform

Posted by 码上开花_lancer on 2023/09/08 11:59:44


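Meta AI has open-sourced the Llama 2 model, and I have just one thing to say: "Meta AI should rename itself OpenAI!"

Llama 2 releases not only the pretrained model but also Llama 2-Chat, which was fine-tuned on dialogue data with SFT, together with a detailed description of how Llama 2-Chat was fine-tuned.

The open models currently come in three sizes: 7B, 13B, and 70B. Pretraining used 2 trillion tokens, the SFT stage used more than 100,000 examples, and the human preference data exceeds 1 million examples.

Less than a week after its release, Llama 2 has taken the research community by storm, with a string of benchmark evaluations and online demos appearing.

Even OpenAI co-founder Karpathy has implemented inference for a baby Llama 2 model in C.

Now that Llama 2 is available to everyone, how can we fine-tune it on Huawei Cloud to enable more applications?

Open Huawei Cloud ModelArts and create a notebook. First download the dataset and upload it to an OBS (Object Storage Service) bucket, then copy it to local storage with a command.

Dataset URL:

Upload the downloaded dataset to OBS, then copy it into the notebook's local data directory: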

import moxing as mox
mox.file.copy_parallel('obs://xxx/llma2_samsum_data','data')
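After the copy finishes, it can help to confirm that the files actually landed in the local data directory. The following is a minimal sketch using only the Python standard library; the helper name `summarize_dir` is hypothetical and is not part of MoXing.

```python
import os

def summarize_dir(path):
    """Return (file_count, total_bytes) for all files under path."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes

if os.path.isdir("data"):
    count, size = summarize_dir("data")
    print(f"data/: {count} files, {size / 1e6:.1f} MB")
else:
    print("data/ not found; check the OBS path and rerun the copy")
```

An empty or missing directory usually points to a wrong OBS path or missing bucket permissions.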

1. Download the model

Clone Meta's Llama inference repository (it contains the download script):

!git clone https://github.com/facebookresearch/llama.git

Cloning into 'llama'...
remote: Enumerating objects: 353, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 353 (delta 1), reused 3 (delta 0), pack-reused 346
Receiving objects: 100% (353/353), 1.07 MiB | 2.09 MiB/s, done.
Resolving deltas: 100% (186/186), done.

Then run the download script from inside the cloned repository:

!cd llama && bash download.sh

Here, you only need to download the 7B model.
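Large weight files occasionally arrive truncated, so it is worth verifying them before conversion. The sketch below assumes the download directory contains a checksum list in `md5sum` format (one `<hex digest>  <filename>` pair per line). The file name `checklist.chk` and both helper names are assumptions, so adjust them to match your download.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checklist(directory, checklist_name="checklist.chk"):
    """Return the files whose MD5 is missing or does not match the checklist."""
    failures = []
    with open(os.path.join(directory, checklist_name)) as f:
        for line in f:
            if not line.strip():
                continue
            expected, filename = line.split(maxsplit=1)
            # md5sum marks binary mode with a leading '*'
            filename = filename.strip().lstrip("*")
            path = os.path.join(directory, filename)
            if not os.path.isfile(path) or md5_of_file(path) != expected:
                failures.append(filename)
    return failures
```

An empty list from `verify_checklist(...)` means every listed file is present and intact.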

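2. Convert the model to the Hugging Face format

!pip install git+https://github.com/huggingface/transformers
!git clone https://github.com/huggingface/transformers.git
!python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7b

Now we have a Hugging Face model that can be fine-tuned with the Hugging Face libraries!

3. Run the fine-tuning notebook:

Clone the llama-recipes repository: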

!git clone https://github.com/facebookresearch/llama-recipes.git

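Then open quickstart.ipynb in your preferred notebook interface and run the entire notebook.

(Jupyter Lab is used here):

!pip install jupyterlab
jupyter lab # run in a terminal, in the repo you want to work in

To match the actual path of the converted model, make sure to change the following line to: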

model_id="./models_hf/7b"

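With that, you have a LoRA fine-tuned model.

4. Run inference on the fine-tuned model

The catch is that Hugging Face saves only the adapter weights, not the full model, so the adapter weights have to be loaded on top of the full base model.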
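To see why the adapter alone is not enough, recall that LoRA leaves each base weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B @ A. The toy sketch below illustrates that arithmetic in pure Python. The shapes and values are made up for illustration, and this is not how PEFT stores or applies adapters internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the effective weight after LoRA."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy example: a 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # r x d_in  (1 x 2)
B = [[0.5], [0.0]]    # d_out x r (2 x 1)
print(merge_lora(W, A, B, alpha=2, r=1))
```

Without W, the small A and B matrices carry no usable model on their own, which is why the base checkpoint must be loaded first.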

Import the libraries:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig

Load the tokenizer and model:

model_id = "./models_hf/7b"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)
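The 8-bit loading option matters mostly for memory. As a rough back-of-the-envelope sketch, weight memory is about the parameter count times the bytes per parameter. The 7e9 figure below is the nominal model size, and the estimate deliberately ignores activations, the KV cache, and any quantization overhead.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NOMINAL_7B = 7e9  # nominal parameter count; an approximation

for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_7B, nbytes):.0f} GB")
```

Under these assumptions, 8-bit weights need roughly half the memory of float16 weights, which can decide whether the model fits on a single GPU.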

Load the adapter from the location where it was saved after training:

model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")

Run inference:

eval_prompt = """Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
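For a causal language model, `generate` returns the prompt tokens followed by the new tokens, so the decoded string repeats the whole dialog before the summary. Here is a small sketch for keeping only the generated part, assuming the prompt ends with a summary marker (the helper `extract_summary` is hypothetical):

```python
def extract_summary(decoded_text, marker="summary:"):
    """Return the text after the last occurrence of marker (case-insensitive)."""
    idx = decoded_text.lower().rfind(marker.lower())
    if idx == -1:
        return decoded_text.strip()  # marker absent: return everything
    return decoded_text[idx + len(marker):].strip()

decoded = "Summarize this dialog:\nA: Hi\nB: Hello\n---\nSummary:\nA greets B."
print(extract_summary(decoded))
```

This splits on the last marker, so it would cut too late if the generated summary itself contained the marker text. Treat it as a convenience rather than a robust parser.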

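Easier fine-tuning with LLM Engine

What if you want to fine-tune Llama 2 on your own data?

Alexandr Wang, the Chinese-American founder and CEO of the startup Scale AI, says that LLM Engine, which his company open-sourced, offers the simplest way to fine-tune Llama 2.

In a blog post, the Scale AI team walks through how to fine-tune Llama 2.

from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/path/to/training-file.csv",
)
print(response.json())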

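Dataset

In the example below, Scale uses the ScienceQA dataset.

It is a popular dataset of multiple-choice questions. Each question may come with a text context and an image context, and includes a thorough explanation and lecture supporting the solution.

An example from ScienceQA

Currently, LLM Engine supports fine-tuning on prompt-completion pairs. First, the ScienceQA dataset needs to be converted into the supported format: a CSV with two columns, prompt and response.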
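As a minimal illustration of that target format, the sketch below writes a two-column CSV with the standard library. The sample row and the file name `train.csv` are placeholders and are not taken from ScienceQA.

```python
import csv

# Placeholder row for illustration only.
rows = [
    {"prompt": "Context: (none)\nQuestion: What is 2 + 2?\nOptions:(A) 3 (B) 4\nAnswer:",
     "response": "B"},
]

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
    writer.writeheader()
    writer.writerows(rows)  # fields with embedded newlines are quoted automatically
```

Opening the file with `newline=""` lets the `csv` module handle line endings itself, which keeps multi-line prompts intact when the file is read back.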

Before starting, install the required dependencies.

!pip install datasets==2.13.1 smart_open[s3]==5.2.1 pandas==1.4.4

You can load the dataset from Hugging Face and inspect its features.

from datasets import load_dataset
from smart_open import smart_open
import pandas as pd

dataset = load_dataset('derek-thomas/ScienceQA')
dataset['train'].features

A common format for presenting ScienceQA examples is:

Context: A baby wants to know what is inside of a cabinet. Her hand applies a force to the door, and the door opens.
Question: Which type of force from the baby's hand opens the cabinet door?
Options: (A) pull (B) push
Answer: A.

Because the options in the Hugging Face dataset are stored as a list of possible answers, the list has to be converted into the example format above by adding enumerated prefixes.

choice_prefixes = [chr(ord('A') + i) for i in range(26)]  # A-Z

def format_options(options, choice_prefixes):
    return ' '.join([f'({c}) {o}' for c, o in zip(choice_prefixes, options)])

Now write formatting functions that turn a single sample from this dataset into the prompt and response fed to the model.

def format_prompt(r, choice_prefixes):
    options = format_options(r['choices'], choice_prefixes)
    return f'''Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions:{options}\nAnswer:'''

def format_response(r, choice_prefixes):
    # r['answer'] is the index of the correct choice
    return choice_prefixes[r['answer']]
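To check the formatting end to end, the sketch below applies the same logic to a hand-written record shaped like a ScienceQA row (fields `hint`, `question`, `choices`, `answer`). The record is typed in by hand for illustration, and the functions are restated with default arguments so the snippet runs on its own.

```python
CHOICE_PREFIXES = [chr(ord('A') + i) for i in range(26)]  # 'A'..'Z'

def format_options(options, prefixes=CHOICE_PREFIXES):
    return ' '.join(f'({c}) {o}' for c, o in zip(prefixes, options))

def format_prompt(r, prefixes=CHOICE_PREFIXES):
    options = format_options(r['choices'], prefixes)
    return (f'Context: {r["hint"]}\nQuestion: {r["question"]}\n'
            f'Options:{options}\nAnswer:')

def format_response(r, prefixes=CHOICE_PREFIXES):
    return prefixes[r['answer']]  # answer is an index into choices

sample = {
    'hint': "A baby wants to know what is inside of a cabinet. Her hand applies "
            "a force to the door, and the door opens.",
    'question': "Which type of force from the baby's hand opens the cabinet door?",
    'choices': ['pull', 'push'],
    'answer': 0,
}

print(format_prompt(sample))
print(format_response(sample))
```

The prompt deliberately ends at "Answer:" so that the single letter in the response column is exactly what the model is trained to produce next.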

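Finally, build the dataset.

Note that some ScienceQA examples have only an image as context. (The demo below skips these examples, because Llama 2 is purely a language model and cannot accept image input.)

def convert_dataset(ds):
    # Keep only examples that have a text context (non-empty hint)
    prompts = [format_prompt(i, choice_prefixes) for i in ds if i['hint'] != '']
    labels = [format_response(i, choice_prefixes) for i in ds if i['hint'] != '']
    df = pd.DataFrame.from_dict({'prompt': prompts, 'response': labels})
    return df

LLM Engine supports training with both a training dataset and a validation dataset. If you provide only a training set, LLM Engine randomly splits off 10% of it for validation.

Holding out a validation split makes it possible to detect when the model is overfitting the training data, which would otherwise show up as poor generalization on live data at inference time.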
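The sketch below shows one way to make such a hold-out split yourself with a fixed seed, so that the validation set is reproducible. It only illustrates a 90/10 random split and does not reflect LLM Engine's internal logic. The function name is hypothetical.

```python
import random

def train_val_split(items, val_fraction=0.1, seed=0):
    """Shuffle a copy of items and split off val_fraction for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_val_split(range(100))
print(len(train), len(val))
```

Using a dedicated `random.Random(seed)` instance keeps the split stable across runs without touching the global random state.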

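In addition, the dataset files must be stored at publicly accessible URLs so that LLM Engine can read them. For this example, Scale saved the datasets to S3.

Scale also published the preprocessed training and validation datasets in a GitHub Gist. You can substitute those links directly for train_url and val_url.

train_url = 's3://...'
val_url = 's3://...'

df_train = convert_dataset(dataset['train'])
with smart_open(train_url, 'wb') as f:
    df_train.to_csv(f)

df_val = convert_dataset(dataset['validation'])
with smart_open(val_url, 'wb') as f:
    df_val.to_csv(f)

Now fine-tuning can be started through the LLM Engine API.

Fine-tuning
First, install LLM Engine.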

!pip install scale-llm-engine

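Next, set your Scale API key. Follow the instructions in the README to obtain your unique API key.

Advanced users can also follow the LLM Engine self-hosting guide, in which case no Scale API key is needed.

import os
os.environ['SCALE_API_KEY'] = 'xxx'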

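Once everything is set up, fine-tuning the model takes a single API call.

Here, Scale chose the 7-billion-parameter version of Llama 2, because it is powerful enough for most use cases.

from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file=train_url,
    validation_file=val_url,
    hyperparameters={
        'lr': 2e-4,
    },
    suffix='science-qa-llama'
)
run_id = response.fine_tune_id

With the run_id, you can monitor the job status and get live metrics for each epoch, such as training and validation loss.

ScienceQA is a large dataset, so training may take an hour or two to complete.

import time

while True:
    job_status = FineTune.get(run_id).status
    # Returns one of `PENDING`, `STARTED`, `SUCCESS`, `RUNNING`,
    # `FAILURE`, `CANCELLED`, `UNDEFINED` or `TIMEOUT`
    print(job_status)
    # Stop on success, and also on failure states so the loop cannot spin forever
    if job_status in ('SUCCESS', 'FAILURE', 'CANCELLED', 'TIMEOUT'):
        break
    time.sleep(60)

# Logs for completed or running jobs can be fetched with
logs = FineTune.get_events(run_id)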
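A polling loop that only exits on success waits forever if the job fails or stalls. The sketch below is a generic polling helper with a timeout. `get_status` is a stand-in for whatever call returns the job status (assumed here to return a plain string), and the set of terminal states is an assumption based on the status names listed above.

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE", "CANCELLED", "TIMEOUT"}  # assumed terminal

def wait_for_job(get_status, poll_seconds=60, timeout_seconds=4 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is seen or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status().upper()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With the fine-tuning job from this section, a call might look like `wait_for_job(lambda: FineTune.get(run_id).status)`. The `sleep` and `clock` parameters exist only so the helper can be tested without real waiting.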

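Inference and evaluation

Once fine-tuning is complete, you can start generating responses for any input. Before doing so, however, make sure the model exists and is ready to accept input.

ft_model = FineTune.get(run_id).fine_tuned_model

Your first inference result may take a few minutes to come back. After that, inference speeds up.

Let's evaluate the performance of the Llama 2 model fine-tuned on ScienceQA.

import pandas as pd
from llmengine import Completion

# Helper function to get outputs from the fine-tuned model, with retries
def get_output(prompt: str, num_retry: int = 5):
    for _ in range(num_retry):
        try:
            response = Completion.create(
                model=ft_model,
                prompt=prompt,
                max_new_tokens=1,
                temperature=0.01
            )
            return response.output.text.strip()
        except Exception as e:
            print(e)
    return ""

# Read the test data
test = pd.read_csv(val_url)
test["prediction"] = test["prompt"].apply(get_output)
print(f"Accuracy: {(test['response'] == test['prediction']).mean() * 100:.2f}%")

The fine-tuned Llama 2 reaches 82.15% accuracy, which is quite good.

So how does this result compare with the Llama 2 base model?

Because the pretrained model has not been fine-tuned on this dataset, an example has to be provided in the prompt so that the model learns to follow the response format we expect.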
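A one-shot prompt of this kind can be assembled by prepending a solved example to the question being asked. The sketch below assumes prompts in the Context/Question/Options/Answer layout used earlier. The helper name and the placeholder query are illustrative and are not taken from Scale's evaluation code.

```python
def build_few_shot_prompt(demos, query_prompt, separator="\n\n"):
    """Prepend (prompt, answer) demonstrations to a query prompt."""
    parts = [f"{prompt} {answer}" for prompt, answer in demos]
    parts.append(query_prompt)
    return separator.join(parts)

demo = ("Context: A baby wants to know what is inside of a cabinet. Her hand "
        "applies a force to the door, and the door opens.\n"
        "Question: Which type of force from the baby's hand opens the cabinet door?\n"
        "Options:(A) pull (B) push\nAnswer:", "A")

# Placeholder query; in practice this would be a formatted dataset row.
query = ("Context: (context text)\nQuestion: (question text)\n"
         "Options:(A) first (B) second\nAnswer:")

print(build_few_shot_prompt([demo], query))
```

Each demonstration lengthens every request, which is one reason few-shot prompting costs more per query than calling a fine-tuned model with the bare prompt.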

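We can also see how it compares with fine-tuning MPT-7B, a model of similar size.

Fine-tuning Llama 2 on ScienceQA yields an absolute performance gain of 26.59 percentage points!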
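Absolute and relative gains are easy to confuse, so here is the arithmetic spelled out. Only the 82.15% accuracy and the 26.59-point gain come from the text. The baseline below is simply what those two numbers imply, on the assumption that the gain is measured against a single baseline accuracy.

```python
fine_tuned_acc = 82.15   # accuracy reported in the text, in percent
absolute_gain = 26.59    # gain reported in the text, in percentage points

# Implied baseline accuracy (an inference from the two figures above)
baseline_acc = round(fine_tuned_acc - absolute_gain, 2)

# The same gain expressed relative to the implied baseline
relative_gain = absolute_gain / baseline_acc * 100

print(f"implied baseline: {baseline_acc:.2f}%")
print(f"relative gain: {relative_gain:.1f}%")
```

An absolute gain counts percentage points, while the relative gain scales it by the starting accuracy, so the two figures should not be quoted interchangeably.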

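In addition, because the prompts are shorter, inference with the fine-tuned model is cheaper than few-shot prompting. This fine-tuned Llama-2-7B model also outperforms GPT-3.5, a 175-billion-parameter model.

As the comparison shows, Llama 2 outperforms MPT in both the fine-tuned and few-shot prompting settings, which demonstrates its strength both as a base model and as a model to fine-tune.

Scale also used LLM Engine to fine-tune and evaluate Llama 2 on several tasks from GLUE, a set of commonly used NLP benchmark datasets.

Now anyone can unlock the true potential of fine-tuned models and witness the magic of powerful AI-generated responses.

I have found that although Hugging Face has built an excellent library in Transformers, its guides are often too complicated for ordinary users.

References: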
https://brev.dev/blog/fine-tuning-llama-2
https://scale.com/blog/fine-tune-llama-2
