Qwen-VL Technical Report Summary

Author: 笔触狂放9 | 2024-03-31 04:09:49



Thanks to the authors for such excellent open-source work. Repository link: Qwen-VL

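In the first stage, training mainly uses a 224×224 input resolution. The training data comes mostly from public datasets and, after cleaning, totals about 1.4B image-text pairs (mixed Chinese and English). The training objective is to align the visual and text modalities. The loss function is cross-entropy. Training procedure: given an input (for example, an image or text), the model predicts, for every token in the vocabulary, the probability that it is the next token (The language model, given an input (such as an image and some initial text), predicts the probability of each token in the vocabulary being the next token in the sequence.). The ground-truth label is converted to a one-hot vector, and the cross-entropy loss then measures the difference between the predicted distribution and that one-hot target (The actual distribution is represented by the true next token in the training data. In practice, this is often converted into a one-hot encoded vector, where the actual next token has a probability of 1, and all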
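The next-token cross-entropy described above can be illustrated with a small sketch. This is not Qwen-VL's training code: the four-token vocabulary, the logits, and the helper names `softmax`, `one_hot`, and `cross_entropy` are all made up for illustration, and only a single prediction position is shown.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    """Target distribution: probability 1 at the true token, 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def cross_entropy(probs, target):
    """H(target, probs) = -sum_i target_i * log(probs_i)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

# Hypothetical vocabulary and logits for one position (illustrative values only).
vocab = ["a", "cat", "dog", "<eos>"]
logits = [1.0, 3.0, 0.5, -1.0]
true_next = vocab.index("cat")

probs = softmax(logits)
loss = cross_entropy(probs, one_hot(true_next, len(vocab)))

# With a one-hot target, the loss reduces to -log p(true next token).
assert abs(loss - (-math.log(probs[true_next]))) < 1e-12
print(f"p(cat) = {probs[true_next]:.4f}, loss = {loss:.4f}")
```

Because the target is one-hot, only the probability assigned to the true next token contributes to the loss, which is why frameworks usually take the target as a class index rather than an explicit one-hot vector.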
