- # 1. Read the data with read.table
- system.time(
- read.table("/home/data/test_data", sep = "\001",
- quote = "", stringsAsFactors = FALSE, comment.char = "",
- col.names = colNames)
- )
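- # colNames is a predefined character vector of column names;
- # alternatively, set header = TRUE / FALSE to control whether the first line is read as column names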
-
- # user system elapsed
- # 67.943 0.277 68.326
-
-
- # 2. Read the data with readr::read_delim
- library(readr)
- system.time(
- read_delim("/home/data/test_data",
- delim = "\001", quote = "", comment = "",
- col_names = colNames)
- )
- # colNames is a predefined character vector of column names;
- # alternatively, set col_names = TRUE / FALSE
-
- # =================================| 100% 796 MB
- # user system elapsed
- # 12.790 0.245 12.947
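As the timings show, reading the 796 MB file test_data takes 67.943 s of user CPU time (68.326 s elapsed) with read.table, but only 12.790 s of user CPU time (12.947 s elapsed) with read_delim. That is a substantial improvement: read_delim is roughly 5 times as fast as read.table on this file.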
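The original test_data file is not available, so the comparison can only be approximated on synthetic data. The following is a minimal sketch, assuming readr is installed; the three-column layout, the row count `n`, and the temporary file are hypothetical stand-ins chosen for illustration, and the measured ratio will vary with file size, column types, and hardware.

```r
library(readr)

# Hypothetical stand-in for test_data: a small "\001"-delimited file
# with three made-up columns, written to a temporary location.
set.seed(1)
n <- 1e5
df <- data.frame(id = seq_len(n),
                 name = sample(letters, n, replace = TRUE),
                 value = round(runif(n), 4),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = "\001", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Column names supplied up front, as in the timings above
colNames <- c("id", "name", "value")

# Time base R's read.table
t_base <- system.time(
  base_df <- read.table(path, sep = "\001", quote = "",
                        stringsAsFactors = FALSE, comment.char = "",
                        col.names = colNames)
)

# Time readr's read_delim (progress bar disabled for cleaner output)
t_readr <- system.time(
  readr_df <- read_delim(path, delim = "\001", quote = "",
                         col_names = colNames, progress = FALSE)
)

# Ratio of elapsed times: values above 1 mean read_delim was faster
t_base[["elapsed"]] / t_readr[["elapsed"]]
```

On a file this small the gap may be much narrower than the one reported for the 796 MB file, so increasing `n` gives a more representative comparison.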