
ClickHouse Official Test Data


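Let's start by getting an open sample dataset. We will use US civil flight data from 1987 to 2015. It is hard to call this sample big data (it holds only 166 million rows, 63 GB uncompressed), but it lets us get to work quickly.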

The CREATE TABLE statement:

CREATE TABLE `ontime` ( \
  `Year` UInt16, \
  `Quarter` UInt8, \
  `Month` UInt8, \
  `DayofMonth` UInt8, \
  `DayOfWeek` UInt8, \
  `FlightDate` Date, \
  `UniqueCarrier` FixedString(7), \
  `AirlineID` Int32, \
  `Carrier` FixedString(2), \
  `TailNum` String, \
  `FlightNum` String, \
  `OriginAirportID` Int32, \
  `OriginAirportSeqID` Int32, \
  `OriginCityMarketID` Int32, \
  `Origin` FixedString(5), \
  `OriginCityName` String, \
  `OriginState` FixedString(2), \
  `OriginStateFips` String, \
  `OriginStateName` String, \
  `OriginWac` Int32, \
  `DestAirportID` Int32, \
  `DestAirportSeqID` Int32, \
  `DestCityMarketID` Int32, \
  `Dest` FixedString(5), \
  `DestCityName` String, \
  `DestState` FixedString(2), \
  `DestStateFips` String, \
  `DestStateName` String, \
  `DestWac` Int32, \
  `CRSDepTime` Int32, \
  `DepTime` Int32, \
  `DepDelay` Int32, \
  `DepDelayMinutes` Int32, \
  `DepDel15` Int32, \
  `DepartureDelayGroups` String, \
  `DepTimeBlk` String, \
  `TaxiOut` Int32, \
  `WheelsOff` Int32, \
  `WheelsOn` Int32, \
  `TaxiIn` Int32, \
  `CRSArrTime` Int32, \
  `ArrTime` Int32, \
  `ArrDelay` Int32, \
  `ArrDelayMinutes` Int32, \
  `ArrDel15` Int32, \
  `ArrivalDelayGroups` Int32, \
  `ArrTimeBlk` String, \
  `Cancelled` UInt8, \
  `CancellationCode` FixedString(1), \
  `Diverted` UInt8, \
  `CRSElapsedTime` Int32, \
  `ActualElapsedTime` Int32, \
  `AirTime` Int32, \
  `Flights` Int32, \
  `Distance` Int32, \
  `DistanceGroup` UInt8, \
  `CarrierDelay` Int32, \
  `WeatherDelay` Int32, \
  `NASDelay` Int32, \
  `SecurityDelay` Int32, \
  `LateAircraftDelay` Int32, \
  `FirstDepTime` String, \
  `TotalAddGTime` String, \
  `LongestAddGTime` String, \
  `DivAirportLandings` String, \
  `DivReachedDest` String, \
  `DivActualElapsedTime` String, \
  `DivArrDelay` String, \
  `DivDistance` String, \
  `Div1Airport` String, \
  `Div1AirportID` Int32, \
  `Div1AirportSeqID` Int32, \
  `Div1WheelsOn` String, \
  `Div1TotalGTime` String, \
  `Div1LongestGTime` String, \
  `Div1WheelsOff` String, \
  `Div1TailNum` String, \
  `Div2Airport` String, \
  `Div2AirportID` Int32, \
  `Div2AirportSeqID` Int32, \
  `Div2WheelsOn` String, \
  `Div2TotalGTime` String, \
  `Div2LongestGTime` String, \
  `Div2WheelsOff` String, \
  `Div2TailNum` String, \
  `Div3Airport` String, \
  `Div3AirportID` Int32, \
  `Div3AirportSeqID` Int32, \
  `Div3WheelsOn` String, \
  `Div3TotalGTime` String, \
  `Div3LongestGTime` String, \
  `Div3WheelsOff` String, \
  `Div3TailNum` String, \
  `Div4Airport` String, \
  `Div4AirportID` Int32, \
  `Div4AirportSeqID` Int32, \
  `Div4WheelsOn` String, \
  `Div4TotalGTime` String, \
  `Div4LongestGTime` String, \
  `Div4WheelsOff` String, \
  `Div4TailNum` String, \
  `Div5Airport` String, \
  `Div5AirportID` Int32, \
  `Div5AirportSeqID` Int32, \
  `Div5WheelsOn` String, \
  `Div5TotalGTime` String, \
  `Div5LongestGTime` String, \
  `Div5WheelsOff` String, \
  `Div5TailNum` String \
) ENGINE = MergeTree PARTITION BY toYYYYMM(FlightDate) ORDER BY (Year, FlightDate) SETTINGS index_granularity = 8192;

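The data can be downloaded here: https://yadi.sk/d/pOZxpa42sDdgm
Importing the data (this command failed for me; the method further below worked): xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV" (After a long wait not a single row had been inserted. I then decompressed ontime.csv.xz on Windows 10, copied the resulting 60-plus GB CSV file to Ubuntu, and imported it with clickhouse-client --query "INSERT INTO ontime FORMAT CSV" < ontime.csv, but that died again after a little over a million rows. I suspect the file was simply too large: the per-month files used below are at most twenty-odd MB each and import fine.)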
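
If the size of the single file really is the problem, one thing to try is feeding it to the server in smaller pieces. The following is only a minimal sketch of that idea and was not part of the import described here: the function names import_piece and import_in_chunks and the default piece size are made up for illustration, and it assumes a headerless ontime.csv plus enough free disk space for a second copy of the data.

```shell
# Sketch: split a large headerless CSV into pieces and import them one by one.
# import_piece and import_in_chunks are hypothetical names for this example.

# Import one piece: reads CSV rows on stdin.
import_piece() {
    clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
}

# Split file $1 into pieces of $2 lines (default 1000000) and pipe each piece
# to the function or command named in $3 (default import_piece).
# Stops at the first piece that fails.
import_in_chunks() {
    local csv="$1" lines="${2:-1000000}" importer="${3:-import_piece}"
    local tmpdir piece
    tmpdir="$(mktemp -d)" || return 1
    # -a 4 gives four-letter suffixes so small piece sizes do not run out of names.
    split -a 4 -l "$lines" "$csv" "$tmpdir/piece_" || { rm -rf "$tmpdir"; return 1; }
    for piece in "$tmpdir"/piece_*; do
        echo "importing $(basename "$piece")" >&2
        if ! "$importer" < "$piece"; then
            echo "failed on $(basename "$piece")" >&2
            rm -rf "$tmpdir"
            return 1
        fi
    done
    rm -rf "$tmpdir"
}
```

Usage would look like import_in_chunks ontime.csv 1000000. Passing another function name as the third argument makes it possible to dry-run the loop without a server.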
 
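You can also download the data from the original source. Here is a shell script for that (it now covers 1987 through 2017; because of the number and size of the files and my network, the download took me two full days, and the zip files total 6.3 GB): https://github.com/Percona-Lab/ontime-airline-performance/blob/master/download.sh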

for s in `seq 1987 2017`
do
for m in `seq 1 12`
do
wget http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip
done
done

Import the data:

for i in *.zip; do echo "$i"; unzip -cq "$i" '*.csv' | sed 's/\.00//g' | clickhouse-client --host=localhost --query="INSERT INTO ontime FORMAT CSVWithNames"; done
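
The sed step in this pipeline deletes every literal .00 from the CSV text. Presumably that is because the source files write whole numbers such as delays as 15.00 while the table declares those columns as Int32; that motive is an assumption, since nothing here states it. A small sketch of the effect on one made-up CSV line (strip_decimal_zeros is a hypothetical wrapper name):

```shell
# Sketch: what the sed filter in the import pipeline does to one CSV line.
# The sample values are made up for illustration.
strip_decimal_zeros() {
    sed 's/\.00//g'
}

printf '%s\n' '2015,1,"AA",-3.00,15.00,0.00' | strip_decimal_zeros
# prints: 2015,1,"AA",-3,15,0
```

Note that the pattern is not anchored to the end of a field, so a value such as 1.005 would turn into 15. That is harmless only if values like that never occur in the data.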

If you get an error like the one below, add the --no-check-certificate option after wget in the script:

Connecting to transtats.bts.gov (transtats.bts.gov)|204.68.194.70|:443... connected.
ERROR: cannot verify transtats.bts.gov's certificate, issued by ‘/C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2012 Entrust, Inc. - for authorized use only/CN=Entrust Certification Authority - L1K’:
  Unable to locally verify the issuer's authority.
To connect to transtats.bts.gov insecurely, use `--no-check-certificate'.
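
Given that the download can run for days, it may help to combine the certificate workaround with wget's resume option. The following is a sketch under those assumptions rather than the script from the Percona repository: ontime_url and download_ontime are hypothetical helper names, and the URL pattern is copied from the loop above.

```shell
# Sketch: the download loop with resume support and the certificate workaround.
# ontime_url and download_ontime are hypothetical helper names.

# Print the archive URL for year $1 and month $2.
ontime_url() {
    printf 'http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_%s_%s.zip\n' "$1" "$2"
}

# Download every month from year $1 through year $2 (defaults: 1987 to 2017).
# -c resumes partially downloaded files; --no-check-certificate skips TLS
# verification, so use it only if you accept that risk.
download_ontime() {
    local first="${1:-1987}" last="${2:-2017}" y m
    for y in $(seq "$first" "$last"); do
        for m in $(seq 1 12); do
            wget -c --no-check-certificate "$(ontime_url "$y" "$m")" \
                || echo "failed: $y-$m" >&2
        done
    done
}
```

Calling download_ontime 1987 2017 would then fetch the same files as the loop above, and rerunning it after an interruption picks up where the partial files left off.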

 
Total row count:

:) select count(*) from ontime;

SELECT count(*)
FROM ontime 

┌───count()─┐
│ 176668654 │
└───────────┘

1 rows in set. Elapsed: 4.365 sec. Processed 176.67 million rows, 176.67 MB (40.47 million rows/s., 40.47 MB/s.) 

 
Most popular routes (origin and destination city pairs) in 2015:

SELECT \
    OriginCityName, \
    DestCityName, \
    count(*) AS flights, \
    bar(flights, 0, 20000, 40) \
FROM ontime \
WHERE Year = 2015 \
GROUP BY \
    OriginCityName, \
    DestCityName \
ORDER BY flights DESC \
LIMIT 20;

┌─OriginCityName────┬─DestCityName──────┬─flights─┬─bar(count(), 0, 20000, 40)──────┐
│ San Francisco, CA │ Los Angeles, CA   │   15116 │ ██████████████████████████████▏ │
│ Los Angeles, CA   │ San Francisco, CA │   14799 │ █████████████████████████████▌  │
│ New York, NY      │ Chicago, IL       │   14734 │ █████████████████████████████▍  │
│ Chicago, IL       │ New York, NY      │   14632 │ █████████████████████████████▎  │
│ Boston, MA        │ New York, NY      │   13201 │ ██████████████████████████▍     │
│ New York, NY      │ Boston, MA        │   13201 │ ██████████████████████████▍     │
│ New York, NY      │ Los Angeles, CA   │   13113 │ ██████████████████████████▏     │
│ Los Angeles, CA   │ New York, NY      │   13106 │ ██████████████████████████▏     │
│ Chicago, IL       │ Washington, DC    │   12509 │ █████████████████████████       │
│ Washington, DC    │ Chicago, IL       │   12310 │ ████████████████████████▌       │
│ Atlanta, GA       │ Chicago, IL       │   12213 │ ████████████████████████▍       │
│ Chicago, IL       │ Atlanta, GA       │   12103 │ ████████████████████████▏       │
│ Los Angeles, CA   │ Chicago, IL       │   11111 │ ██████████████████████▏         │
│ Atlanta, GA       │ New York, NY      │   11004 │ ██████████████████████          │
│ New York, NY      │ Atlanta, GA       │   10986 │ █████████████████████▊          │
│ Miami, FL         │ New York, NY      │   10790 │ █████████████████████▌          │
│ New York, NY      │ Miami, FL         │   10779 │ █████████████████████▌          │
│ Chicago, IL       │ Los Angeles, CA   │   10755 │ █████████████████████▌          │
│ Las Vegas, NV     │ Los Angeles, CA   │   10657 │ █████████████████████▎          │
│ Boston, MA        │ Washington, DC    │   10655 │ █████████████████████▎          │
└───────────────────┴───────────────────┴─────────┴─────────────────────────────────┘

20 rows in set. Elapsed: 11.339 sec. Processed 7.18 million rows, 331.70 MB (633.25 thousand rows/s., 29.25 MB/s.) 

 
Most popular departure cities:

SELECT \
    OriginCityName, \
    count(*) AS flights \
FROM ontime \
GROUP BY OriginCityName \
ORDER BY flights DESC \
LIMIT 20;

┌─OriginCityName────────┬──flights─┐
│ Chicago, IL           │ 11151277 │
│ Atlanta, GA           │  9560972 │
│ Dallas/Fort Worth, TX │  7921213 │
│ Houston, TX           │  6054671 │
│ Los Angeles, CA       │  5963597 │
│ New York, NY          │  5426917 │
│ Denver, CO            │  5351312 │
│ Phoenix, AZ           │  5006112 │
│ Washington, DC        │  4355229 │
│ San Francisco, CA     │  4141722 │
│ Detroit, MI           │  4109780 │
│ Las Vegas, NV         │  3923183 │
│ Minneapolis, MN       │  3830458 │
│ Newark, NJ            │  3717883 │
│ Charlotte, NC         │  3619757 │
│ Boston, MA            │  3292009 │
│ St. Louis, MO         │  3180881 │
│ Orlando, FL           │  3038619 │
│ Salt Lake City, UT    │  3020356 │
│ Seattle, WA           │  2969059 │
└───────────────────────┴──────────┘

20 rows in set. Elapsed: 21.299 sec. Processed 176.67 million rows, 3.92 GB (8.29 million rows/s., 183.82 MB/s.) 

 
Departure cities with the most destinations:

SELECT \
    OriginCityName, \
    uniq(Dest) As u \
FROM ontime \
GROUP BY OriginCityName \
ORDER BY u DESC \
LIMIT 20;

┌─OriginCityName────────┬───u─┐
│ Chicago, IL           │ 213 │
│ Atlanta, GA           │ 210 │
│ Dallas/Fort Worth, TX │ 190 │
│ Denver, CO            │ 179 │
│ Minneapolis, MN       │ 158 │
│ Houston, TX           │ 152 │
│ Detroit, MI           │ 147 │
│ Salt Lake City, UT    │ 147 │
│ Cincinnati, OH        │ 145 │
│ New York, NY          │ 135 │
│ Los Angeles, CA       │ 128 │
│ Washington, DC        │ 127 │
│ Charlotte, NC         │ 124 │
│ Newark, NJ            │ 124 │
│ Orlando, FL           │ 121 │
│ Phoenix, AZ           │ 121 │
│ Las Vegas, NV         │ 117 │
│ Pittsburgh, PA        │ 114 │
│ Memphis, TN           │ 113 │
│ San Francisco, CA     │ 110 │
└───────────────────────┴─────┘

20 rows in set. Elapsed: 44.371 sec. Processed 176.67 million rows, 4.80 GB (3.98 million rows/s., 108.15 MB/s.) 

Flight delays by day of the week:

SELECT \
    DayOfWeek, \
    count() AS c, \
    avg(DepDelay > 60) AS delays \
FROM ontime \
GROUP BY DayOfWeek \
ORDER BY DayOfWeek ASC;

┌─DayOfWeek─┬────────c─┬───────────────delays─┐
│         1 │ 26032980 │ 0.044869738308868215 │
│         2 │ 25752217 │  0.03884279167110156 │
│         3 │ 25883344 │  0.04181356937496175 │
│         4 │ 25985675 │  0.04855652200683646 │
│         5 │ 26026260 │  0.05150490312476706 │
│         6 │ 22380078 │ 0.035844781238027854 │
│         7 │ 24608100 │     0.04395995627456 │
└───────────┴──────────┴──────────────────────┘

7 rows in set. Elapsed: 26.459 sec. Processed 176.67 million rows, 883.34 MB (6.68 million rows/s., 33.38 MB/s.) 
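
In the query above, DepDelay > 60 evaluates to 0 or 1 for each row, so avg(DepDelay > 60) is the fraction of flights that left more than 60 minutes late. The same arithmetic on a handful of made-up delay values, as a sketch with awk (delay_fraction is a hypothetical name):

```shell
# Sketch: fraction of values greater than 60, mirroring avg(DepDelay > 60).
# Reads one delay in minutes per line; the sample delays are made up.
delay_fraction() {
    awk '{ n++; if ($1 > 60) late++ } END { if (n) printf "%.3f\n", late / n }'
}

printf '%s\n' 5 75 -2 120 60 0 10 61 | delay_fraction
# prints: 0.375
```

Three of the eight sample values exceed 60 (a delay of exactly 60 does not count), which gives 3 / 8 = 0.375.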

 
Departure cities most often delayed by more than an hour:

SELECT \
    OriginCityName, \
    count() AS c, \
    avg(DepDelay > 60) AS delays \
FROM ontime \
GROUP BY OriginCityName \
HAVING c > 100000 \
ORDER BY delays DESC \
LIMIT 20;

┌─OriginCityName──────┬────────c─┬───────────────delays─┐
│ Fayetteville, AR    │   185229 │  0.06730047670721108 │
│ Newark, NJ          │  3717883 │  0.06609971319699948 │
│ Chicago, IL         │ 11151277 │   0.0617198371092387 │
│ San Francisco, CA   │  4141722 │  0.06033722205401521 │
│ Eugene, OR          │   114522 │  0.05732523008679555 │
│ Santa Barbara, CA   │   199334 │   0.0560968023518316 │
│ New York, NY        │  5426917 │  0.05557851723179109 │
│ White Plains, NY    │   202042 │  0.05521624216747013 │
│ Springfield, MO     │   140140 │  0.05520907663764807 │
│ Burlington, VT      │   150360 │  0.05497472732109603 │
│ Miami, FL           │  2096946 │  0.05327461937503398 │
│ Philadelphia, PA    │  2864104 │ 0.052185604991997495 │
│ Monterey, CA        │   109122 │  0.05193269918073349 │
│ Fort Lauderdale, FL │  1643990 │  0.05149970498604006 │
│ Columbia, SC        │   226198 │  0.05121619112458996 │
│ Valparaiso, FL      │   109304 │  0.05119666251921247 │
│ Juneau, AK          │   127035 │  0.05069469043964262 │
│ Boston, MA          │  3292009 │ 0.050523859442668594 │
│ Moline, IL          │   121352 │ 0.049920891291449665 │
│ Akron, OH           │   148005 │ 0.049815884598493294 │
└─────────────────────┴──────────┴──────────────────────┘

20 rows in set. Elapsed: 30.723 sec. Processed 176.67 million rows, 4.62 GB (5.75 million rows/s., 150.44 MB/s.) 

 
Longest flight times:

SELECT \
    OriginCityName, \
    DestCityName, \
    count(*) AS flights, \
    avg(AirTime) As duration \
FROM ontime \
GROUP BY \
    OriginCityName, \
    DestCityName \
ORDER BY duration DESC \
LIMIT 20;

┌─OriginCityName────────┬─DestCityName───┬─flights─┬───────────duration─┐
│ New York, NY          │ Honolulu, HI   │    2074 │  606.9170684667309 │
│ Newark, NJ            │ Honolulu, HI   │    7219 │  590.4765202936695 │
│ Washington, DC        │ Honolulu, HI   │    1119 │  579.7444146559428 │
│ Charlotte, NC         │ Honolulu, HI   │     223 │  563.4394618834081 │
│ Atlanta, GA           │ Kahului, HI    │     173 │  545.6184971098265 │
│ Cincinnati, OH        │ Honolulu, HI   │    1176 │  540.4447278911565 │
│ Detroit, MI           │ Honolulu, HI   │     467 │  535.8779443254818 │
│ Honolulu, HI          │ New York, NY   │    2077 │  525.0828117477131 │
│ Honolulu, HI          │ Newark, NJ     │    7233 │  518.7114613576663 │
│ Honolulu, HI          │ Washington, DC │    1119 │ 508.65415549597856 │
│ St. Louis, MO         │ Kahului, HI    │    1093 │ 498.97987191216833 │
│ Dallas/Fort Worth, TX │ Lihue, HI      │      17 │ 497.52941176470586 │
│ Honolulu, HI          │ Charlotte, NC  │     223 │  492.3273542600897 │
│ Honolulu, HI          │ Cincinnati, OH │    1177 │  484.4086661002549 │
│ Minneapolis, MN       │ Honolulu, HI   │    5430 │  477.9731123388582 │
│ Kahului, HI           │ Atlanta, GA    │     173 │  477.9364161849711 │
│ Honolulu, HI          │ Detroit, MI    │     467 │  469.3468950749465 │
│ Houston, TX           │ Kahului, HI    │     761 │   461.227332457293 │
│ Dallas/Fort Worth, TX │ Kona, HI       │     148 │  461.0945945945946 │
│ Dallas/Fort Worth, TX │ Kahului, HI    │    6913 │ 460.49631129755534 │
└───────────────────────┴────────────────┴─────────┴────────────────────┘

20 rows in set. Elapsed: 62.137 sec. Processed 176.67 million rows, 8.54 GB (2.84 million rows/s., 137.39 MB/s.) 

 
Departure delay distribution by carrier (99th percentile):

SELECT \
    Carrier, \
    count() AS c, \
    round(quantileTDigest(0.99)(DepDelay), 2) AS q \
FROM ontime \
GROUP BY Carrier \
ORDER BY q DESC;

┌─Carrier─┬────────c─┬──────q─┐
│ B6      │  2991782 │ 191.22 │
│ NK      │   412396 │ 190.94 │
│ EV      │  6222018 │ 187.23 │
│ XE      │  2145095 │ 179.55 │
│ VX      │   371390 │ 178.31 │
│ YV      │  1704176 │ 178.03 │
│ DH      │   693047 │ 165.18 │
│ F9      │  1120723 │ 162.58 │
│ FL      │  2485709 │ 156.03 │
│ 9E      │  1342097 │ 153.94 │
│ TZ      │   208420 │ 152.52 │
│ OO      │  8583371 │ 151.77 │
│ OH      │  1765828 │ 148.35 │
│ RU      │  1314294 │ 147.34 │
│ MQ      │  6877396 │  145.5 │
│ CO      │  8784850 │ 139.17 │
│ EA      │   880824 │ 131.67 │
│ AS      │  4270919 │ 124.17 │
│ NW      │ 10473832 │ 118.52 │
│ HP      │  3587974 │ 118.39 │
│ TW      │  3692615 │ 117.15 │
│ US      │ 16084998 │ 113.19 │
│ PI      │   833073 │ 105.93 │
│ ML      │    70622 │ 102.34 │
│ PA      │   302766 │  98.72 │
│ PS      │    83617 │  95.84 │
│ AL      │   455873 │  77.94 │
│ UA      │ 17357913 │  71.09 │
│ HA      │   935934 │  61.49 │
│ AQ      │   154381 │  60.14 │
│ AA      │ 20571665 │   9.02 │
│ DL      │ 23240979 │      4 │
│ WN      │ 26648077 │      4 │
└─────────┴──────────┴────────┘

33 rows in set. Elapsed: 160.932 sec. Processed 176.67 million rows, 1.06 GB (1.10 million rows/s., 6.59 MB/s.) 

 
Carriers that have stopped operating flights:

SELECT \
    Carrier, \
    min(Year), \
    max(Year), \
    count() \
FROM ontime \
GROUP BY Carrier \
HAVING max(Year) < 2015 \
ORDER BY count() DESC;

┌─Carrier─┬─min(Year)─┬─max(Year)─┬──count()─┐
│ NW      │      1987 │      2009 │ 10473832 │
│ CO      │      1987 │      2011 │  8784850 │
│ TW      │      1987 │      2001 │  3692615 │
│ HP      │      1987 │      2005 │  3587974 │
│ FL      │      2003 │      2014 │  2485709 │
│ XE      │      2006 │      2011 │  2145095 │
│ OH      │      2004 │      2010 │  1765828 │
│ YV      │      2006 │      2013 │  1704176 │
│ 9E      │      2007 │      2013 │  1342097 │
│ RU      │      2003 │      2006 │  1314294 │
│ EA      │      1987 │      1990 │   880824 │
│ PI      │      1987 │      1989 │   833073 │
│ DH      │      2003 │      2005 │   693047 │
│ AL      │      1987 │      1988 │   455873 │
│ PA      │      1987 │      1991 │   302766 │
│ TZ      │      2003 │      2006 │   208420 │
│ AQ      │      2000 │      2008 │   154381 │
│ PS      │      1987 │      1988 │    83617 │
│ ML      │      1991 │      1991 │    70622 │
└─────────┴───────────┴───────────┴──────────┘

19 rows in set. Elapsed: 8.625 sec. Processed 176.67 million rows, 706.70 MB (20.48 million rows/s., 81.93 MB/s.) 

 
Fastest-growing destination cities in 2015:

SELECT \
    DestCityName, \
    sum(Year = 2014) AS c2014, \
    sum(Year = 2015) AS c2015, \
    c2015 / c2014 AS diff \
FROM ontime \
WHERE Year IN (2014, 2015) \
GROUP BY DestCityName \
HAVING (c2014 > 10000) AND (c2015 > 1000) AND (diff > 1) \
ORDER BY diff DESC;

┌─DestCityName───────────────────┬──c2014─┬──c2015─┬───────────────diff─┐
│ Dallas, TX                     │  48294 │  65633 │  1.359030107259701 │
│ Fort Lauderdale, FL            │  64109 │  79419 │ 1.2388120232728634 │
│ Minneapolis, MN                │ 106202 │ 122751 │ 1.1558256906649593 │
│ Boise, ID                      │  11228 │  12819 │ 1.1416993231207695 │
│ Detroit, MI                    │ 105984 │ 118311 │  1.116310009057971 │
│ Seattle, WA                    │ 108722 │ 121292 │ 1.1156159746877357 │
│ Kona, HI                       │  10992 │  12080 │  1.098981077147016 │
│ Fort Myers, FL                 │  26641 │  29127 │ 1.0933148155099284 │
│ Orlando, FL                    │ 110409 │ 120028 │ 1.0871215208905072 │
│ Memphis, TN                    │  15038 │  16287 │  1.083056257481048 │
│ West Palm Beach/Palm Beach, FL │  22523 │  24320 │ 1.0797851085556986 │
│ Oakland, CA                    │  43266 │  46325 │ 1.0707021679840985 │
│ Austin, TX                     │  43095 │  46092 │ 1.0695440306300035 │
│ Chicago, IL                    │ 375875 │ 402011 │ 1.0695337545726638 │
│ Boston, MA                     │ 110630 │ 118012 │ 1.0667269275964928 │
│ Las Vegas, NV                  │ 137058 │ 145900 │ 1.0645128339826935 │
│ Tampa, FL                      │  64879 │  69062 │ 1.0644738667365403 │
│ Cincinnati, OH                 │  20769 │  21944 │ 1.0565747026818817 │
│ New Orleans, LA                │  40490 │  42472 │ 1.0489503581131143 │
│ Santa Ana, CA                  │  39142 │  40733 │  1.040646875479025 │
│ Baltimore, MD                  │  90845 │  94105 │  1.035885299135891 │
│ Anchorage, AK                  │  16791 │  17233 │ 1.0263236257518908 │
│ Atlanta, GA                    │ 369842 │ 379498 │ 1.0261084463095052 │
│ San Juan, PR                   │  25900 │  26529 │ 1.0242857142857142 │
│ Lihue, HI                      │  11165 │  11427 │ 1.0234661889834304 │
│ Kahului, HI                    │  21953 │  22461 │ 1.0231403452831047 │
│ Grand Rapids, MI               │  11513 │  11767 │ 1.0220620168505168 │
│ Honolulu, HI                   │  46310 │  46937 │ 1.0135391923990498 │
│ New York, NY                   │ 207502 │ 210245 │ 1.0132191496949428 │
│ Newark, NJ                     │ 110221 │ 111486 │ 1.0114769417806044 │
│ Cleveland, OH                  │  37478 │  37801 │  1.008618389455147 │
│ Buffalo, NY                    │  18381 │  18416 │ 1.0019041401447146 │
│ Providence, RI                 │  12152 │  12157 │ 1.0004114549045424 │
└────────────────────────────────┴────────┴────────┴────────────────────┘

33 rows in set. Elapsed: 5.572 sec. Processed 12.95 million rows, 312.16 MB (2.32 million rows/s., 56.02 MB/s.) 
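
The diff column is simply c2015 / c2014, and sum(Year = 2014) counts rows because the comparison yields 0 or 1. As a quick check of the arithmetic against the first row of the table, here is a sketch (growth_ratio is a hypothetical helper name):

```shell
# Sketch: ratio of this year's count ($2) to last year's count ($1),
# mirroring c2015 / c2014 in the query.
growth_ratio() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.4f\n", cur / prev }'
}

growth_ratio 48294 65633   # Dallas, TX: c2014 and c2015 from the table
# prints: 1.3590
```

A value above 1 means more flights arrived in 2015 than in 2014, which is what the diff > 1 condition in the HAVING clause selects.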

 
Destination cities with the strongest seasonal variation in traffic:

SELECT \
    DestCityName, \
    any(total), \
    avg(abs((monthly * 12) - total) / total) AS avg_month_diff \
FROM \
( \
    SELECT \
        DestCityName, \
        count() AS total \
    FROM ontime \
    GROUP BY DestCityName \
    HAVING total > 100000 \
) ALL INNER JOIN \
( \
    SELECT \
        DestCityName, \
        Month, \
        count() AS monthly \
    FROM ontime \
    GROUP BY \
        DestCityName, \
        Month \
    HAVING monthly > 10000 \
) USING (DestCityName) \
GROUP BY DestCityName \
ORDER BY avg_month_diff DESC \
LIMIT 20;

┌─DestCityName───────────────────┬─any(total)─┬───────avg_month_diff─┐
│ Juneau, AK                     │     127029 │  0.26276362090546257 │
│ Bozeman, MT                    │     107007 │  0.23356415935406094 │
│ Palm Springs, CA               │     241336 │  0.23237312294891765 │
│ Fort Myers, FL                 │     642191 │  0.19487478543507045 │
│ Anchorage, AK                  │     550641 │   0.1817055032226078 │
│ Fairbanks, AK                  │     131135 │  0.13696318043746267 │
│ Valparaiso, FL                 │     109145 │  0.13496724540748545 │
│ Sarasota/Bradenton, FL         │     202931 │  0.11884252939833408 │
│ Myrtle Beach, SC               │     120790 │  0.11748607382351896 │
│ West Palm Beach/Palm Beach, FL │     741018 │   0.1156544105541296 │
│ Portland, ME                   │     214450 │  0.10123571928188389 │
│ Eugene, OR                     │     114268 │  0.09463716876115798 │
│ Seattle, WA                    │    2968380 │  0.07901751123508446 │
│ San Juan, PR                   │     638327 │  0.07704384534363527 │
│ Billings, MT                   │     121895 │   0.0706919890069322 │
│ Burlington, VT                 │     149777 │  0.06467615187912697 │
│ Fort Lauderdale, FL            │    1641563 │  0.06102527489553147 │
│ Lihue, HI                      │     181427 │  0.06057992103343676 │
│ Savannah, GA                   │     265786 │  0.05747606470368392 │
│ Kona, HI                       │     188276 │ 0.057143059480054104 │
└────────────────────────────────┴────────────┴──────────────────────┘

20 rows in set. Elapsed: 34.215 sec. Processed 353.34 million rows, 8.01 GB (10.33 million rows/s., 234.01 MB/s.)
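
The avg_month_diff expression compares each month's count, scaled up to a full year (monthly * 12), with the real total and averages the relative gap. A city with perfectly even traffic scores 0, and the score grows as traffic concentrates in a few months. A worked sketch for a single city with made-up monthly counts follows (the shell function name is hypothetical, and unlike the query it filters out no months):

```shell
# Sketch: avg(abs(monthly * 12 - total) / total) for one city.
# Reads one monthly flight count per line. Unlike the query, no months are
# filtered out, and total is simply the sum of the input lines.
avg_month_diff() {
    awk '
        { m[NR] = $1; total += $1 }
        END {
            if (NR == 0 || total == 0) exit 1
            for (i = 1; i <= NR; i++) {
                d = m[i] * 12 - total
                if (d < 0) d = -d
                sum += d / total
            }
            printf "%.4f\n", sum / NR
        }'
}

# Perfectly even traffic: every month carries exactly 1/12 of the total.
for i in $(seq 1 12); do echo 100; done | avg_month_diff
# prints: 0.0000

# Six quiet months and six busy months.
{ for i in $(seq 1 6); do echo 50; done; for i in $(seq 1 6); do echo 150; done; } | avg_month_diff
# prints: 0.5000
```

In the second case the total is 1200, each month scaled to a year gives 600 or 1800, and both are off by 600, so every month contributes 600 / 1200 = 0.5.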

 
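Note: some of the queries above took around a minute, which does not look fast. I ran them in a VMware virtual machine on Windows 10 (8 GB of RAM), so these numbers probably do not reflect ClickHouse's real performance. People online who run on well-provisioned production servers report one to two seconds for queries that took me a minute.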
 
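References:
Official quick start: https://clickhouse.yandex/#quick-start
Official guide: https://clickhouse.yandex/docs/en/single/index.html#create-table
Changelog (Russian): https://github.com/yandex/ClickHouse/blob/master/CHANGELOG_RU.md
Changelog (English): https://github.com/yandex/ClickHouse/blob/master/CHANGELOG.md
ClickHouse, an open-source powerhouse from Russia: a high-performance columnar database suited to building quantitative backtesting research systems (part 1): http://www.sohu.com/a/160303189_505915
ClickHouse, an open-source powerhouse from Russia: a high-performance columnar database suited to building quantitative backtesting research systems (part 2): http://www.sohu.com/a/160527514_505915
Sina, Gao Peng, November 2017: http://www.docin.com/p-2061139848.html?qq-pf-to=pcqq.temporaryc2c
ClickHouse, a formidable open-source analytical database (Zhihu): https://zhuanlan.zhihu.com/p/22165241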
