Practical Machine Learning Notes (6): Feature Engineering

1. Feature Engineering

  • Machine learning algorithms prefer well-defined, fixed-length inputs and outputs

  • Feature engineering (FE) was the key to ML methods before deep learning (DL)

    • e.g., in a computer vision task, people tried various FE methods and then trained an SVM model
  • DL trains deep neural networks to extract features automatically, whereas many classical ML methods rely on FE to obtain them

    • the learned features are relevant to the task

2. Tabular data features

  • int/float: use directly, or bin into n unique int values

  • Categorical data: one-hot encoding

    • map rare categories into “unknown”
  • Date-time: expand into a feature list such as

    • [year,month,day,day_of_year,week_of_year,…]
  • Feature combination: Cartesian product of two feature groups

    • [cat, dog] * [male, female] -->
    • [(cat, male), (cat, female), (dog, male), (dog, female)]
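The four tabular transformations above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a library API: bins are equal-width over an assumed known range `[lo, hi]`, "rare" means seen fewer than a hypothetical `min_count` times, and `week_of_year` uses ISO week numbering. All function names here are made up for the example.

```python
from datetime import date
from itertools import product


def bin_value(x, lo, hi, n_bins):
    """Equal-width binning: map a number in [lo, hi] to an int in 0..n_bins-1."""
    if x <= lo:
        return 0
    if x >= hi:
        return n_bins - 1
    return int((x - lo) / (hi - lo) * n_bins)


def build_vocab(values, min_count=2):
    """Keep categories seen at least min_count times; the rest share "unknown"."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    kept = sorted(v for v, c in counts.items() if c >= min_count)
    return kept + ["unknown"]


def one_hot(value, vocab):
    """One-hot encode a category; rare or unseen values hit the "unknown" slot."""
    key = value if value in vocab else "unknown"
    return [1 if v == key else 0 for v in vocab]


def date_features(d):
    """[year, month, day, day_of_year, week_of_year], using ISO week numbers."""
    return [d.year, d.month, d.day, d.timetuple().tm_yday, d.isocalendar()[1]]


def cross(group_a, group_b):
    """Feature combination: Cartesian product of two feature groups."""
    return list(product(group_a, group_b))


vocab = build_vocab(["cat", "dog", "cat", "dog", "bird"])
print(bin_value(3.7, 0.0, 10.0, 5))      # 1
print(vocab)                             # ['cat', 'dog', 'unknown']
print(one_hot("bird", vocab))            # [0, 0, 1]  (rare -> unknown)
print(date_features(date(2021, 3, 15)))  # [2021, 3, 15, 74, 11]
print(cross(["cat", "dog"], ["male", "female"]))
```

Every helper returns a fixed-length output for a given configuration, which is exactly what the first section says ML algorithms prefer.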

3. Text features

  • Represent text as token features

    • Bag of words(BoW) model

      • limitations: needs careful vocabulary design; loses context (e.g., word order)
    • Word embeddings (e.g., Word2vec)

      • vectorizing words such that similar words are placed close together
      • trained by predicting a target word from its context words
  • Pre-trained language models (e.g., BERT, GPT-3): extract features with pre-trained deep neural networks

    • giant transformer models
    • trained on large amounts of unannotated data
    • fine-tuned for downstream tasks
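A minimal bag-of-words sketch, assuming simple lowercase whitespace tokenization (real pipelines use proper tokenizers); `fit_vocab` and `bag_of_words` are hypothetical names. It makes both BoW limitations listed above visible: two sentences with opposite meanings map to the same vector because word order is discarded, and words outside the vocabulary silently disappear.

```python
from collections import Counter


def fit_vocab(docs):
    """Sorted vocabulary from lowercased, whitespace-split documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def bag_of_words(doc, vocab):
    """Token-count vector over a fixed vocabulary; unknown tokens are dropped."""
    counts = Counter(doc.lower().split())
    return [counts[tok] for tok in vocab]


vocab = fit_vocab(["the dog bit the man", "the man bit the dog"])
print(vocab)                                       # ['bit', 'dog', 'man', 'the']
print(bag_of_words("the dog bit the man", vocab))  # [1, 1, 1, 2]
print(bag_of_words("the man bit the dog", vocab))  # [1, 1, 1, 2]  (order lost)
print(bag_of_words("a cat sat", vocab))            # [0, 0, 0, 0]  (all out of vocabulary)
```

Word embeddings and pre-trained language models address these gaps by learning dense vectors from context rather than counting tokens.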

4. Image/video features

  • Traditionally, image features were extracted with hand-crafted methods such as SIFT
  • Now, pre-trained deep neural networks are commonly used
    • ResNet: trained on ImageNet (image classification)
    • I3D: trained on Kinetics (action classification)
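To make "hand-crafted feature" concrete, here is a much-simplified stand-in: one global histogram of gradient orientations computed in pure Python. It is in the spirit of gradient-based descriptors such as SIFT or HOG but is neither algorithm (no keypoints, cells, or scale space). The function name, the 8-bin default, and the grayscale 2D-list input format are assumptions made for this sketch.

```python
import math


def orientation_histogram(img, n_bins=8):
    """Hand-crafted global image feature: histogram of gradient orientations.

    img is a 2D list of grayscale values (rows of equal length).
    Gradients use central differences on interior pixels; each pixel votes
    into an unsigned-orientation bin over [0, 180) degrees, weighted by its
    gradient magnitude. The histogram is L1-normalized, so the output always
    has n_bins entries regardless of image size.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region: no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle / (180.0 / n_bins)) % n_bins
            hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


vertical_edge = [[0, 0, 0, 10, 10] for _ in range(5)]
print(orientation_histogram(vertical_edge))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A pre-trained network such as ResNet replaces this manual design step: its penultimate-layer activations serve as the fixed-length feature vector instead.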

5. Summary

  • Features matter
  • Features are either hand-crafted or learned by (pre-trained) deep neural networks
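Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and this site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/盐析白兔/article/detail/371935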