当前位置:   article > 正文

mysql 重复数据的处理 查询 删除 select delete【通用】【全网最详细】_mysql 删除重复数据

mysql 删除重复数据

省流:

xxx是表名。name是有重复值的字段(即列名)。 id是自增,id越大,说明创建时间越近。

  1. delete from t1 using xxx as t1, xxx as t2
  2. where t1.name = t2.name and t1.id < t2.id;

正文:

案例如下

一、建表 

  1. CREATE TABLE `emp` (
  2. `id` int(11) NOT NULL AUTO_INCREMENT,
  3. `name` varchar(100) DEFAULT NULL,
  4. `update_time` datetime DEFAULT NULL,
  5. PRIMARY KEY (`id`)
  6. ) AUTO_INCREMENT=0

二、造数据 

  1. INSERT INTO emp
  2. (name, update_time)
  3. VALUES
  4. ('张三', '2023-09-06 00:00:00'),
  5. ('张三', '2023-09-06 00:00:00'),
  6. ('张三', '2023-09-06 00:00:00'),
  7. ('李四', '2023-09-06 00:00:00'),
  8. ('李四', '2023-09-06 00:00:00'),
  9. ('王五', '2023-09-06 00:00:00');

结果: 

  1. id|name|update_time |
  2. --+----+-------------------+
  3. 1|张三 |2023-09-06 00:00:00|
  4. 2|张三 |2023-09-06 00:00:00|
  5. 3|张三 |2023-09-06 00:00:00|
  6. 4|李四 |2023-09-06 00:00:00|
  7. 5|李四 |2023-09-06 00:00:00|
  8. 6|王五 |2023-09-06 00:00:00|

三、查询(将重复的数据查出来) 

1.查出重复的名字

  1. -- 查出重复的数据 重复的名字是哪个
  2. select name from emp group by name having count(1)>1;

 结果:

  1. name|
  2. ----+
  3. 张三 |
  4. 李四 |

 2.通过上面查出的名字作为条件,查出这些数据

  1. -- 查出重复的数据 哪些人名是重复的
  2. select * from emp where name in
  3. (select name from emp group by name having count(1)>1);

结果: 

  1. id|name|update_time |
  2. --+----+-------------------+
  3. 1|张三 |2023-09-06 00:00:00|
  4. 2|张三 |2023-09-06 00:00:00|
  5. 3|张三 |2023-09-06 00:00:00|
  6. 4|李四 |2023-09-06 00:00:00|
  7. 5|李四 |2023-09-06 00:00:00|

 如果我想留下最新的一条,通常根据update_time。update_time离得越近,就留下那条。但有时候批量生成的数据,时间都一样,那就只能根据id(id如果是自增,或者雪花都好办)来判断。

四、删除重复(留下时间最近的一条)

  1. -- 删除重复的人名,留下最近更新的那条(id最大即最近更新)
  2. delete from t1 using emp as t1, emp as t2
  3. where t1.name = t2.name and t1.id < t2.id;
  4. -- 如果时间不一样,想通过时间来判断
  5. delete from t1 using emp as t1, emp as t2
  6. where t1.name = t2.name and t1.update_time < t2.update_time;

1.分析sql

先来看看他查出来的是什么样子的数据

  1. select * from emp as t1, emp as t2
  2. where t1.name = t2.name and t1.id < t2.id;

结果:

  1. id|name |update_time |id|name |update_time |
  2. --+-----+--------------------+--+-----+-------------------+
  3. 1|张三 |2023-09-06 00:00:00| 2|张三 |2023-09-06 00:00:00|
  4. 1|张三 |2023-09-06 00:00:00| 3|张三 |2023-09-06 00:00:00|
  5. 2|张三 |2023-09-06 00:00:00| 3|张三 |2023-09-06 00:00:00|
  6. 4|李四 |2023-09-06 00:00:00| 5|李四 |2023-09-06 00:00:00|

 查出来的是id为1和2,正好需要删除的是1和2,留下3。

所以可以大胆的使用这种写法:

  1. delete from 别名1 using 表名 as 别名1, 表名 as 别名2;
  2. delete from t1 using emp as t1, emp as t2
  3. where t1.name = t2.name and t1.id < t2.id;

2.对比其他写法 

  1. delete from emp where id in(
  2. select id from (
  3. select t1.id from emp t1,
  4. (select max(id) as id,name from emp group by name having count(1)>1) t2
  5. where t1.name = t2.name and t1.id != t2.id
  6. )t
  7. )

 这个写法也很简单,他其实是这样的查询:将重复数据中id最大的那条作为一张表和主表关联,关联之后,查出名字相同但id不同的数据,这些数据就是要删除的。

  1. select * from emp t1,
  2. (select max(id) as id,name from emp group by name having count(1)>1) t2
  3. where t1.name = t2.name and t1.id != t2.id

查出结果: 

  1. id|name |update_time |id|name|
  2. --+------+-------------------+--+----+
  3. 1|张三 |2023-09-06 00:00:00| 3|张三 |
  4. 2|张三 |2023-09-06 00:00:00| 3|张三 |
  5. 4|李四 |2023-09-06 00:00:00| 5|李四 |

 然后再删除这几条数据

  1. delete from emp where id in(
  2. select t1.id from emp t1,
  3. (select max(id) as id,name from emp group by name having count(1)>1) t2
  4. where t1.name = t2.name and t1.id != t2.id
  5. )

但这样写会报错:

SQL 错误 [1093] [HY000]: You can't specify target table 'emp' for update in FROM clause 

这意思是:需要更新的那张表和条件用到的表不能是同一张表。即子查询和修改的表不能是同一个。解决的方法也很简单:就是在子查询外面再套一层,让儿子变成孙子。

mysql delete 删除重复数据_delete mysql 重复数据_globalcoding的博客-CSDN博客

====================分割线======================= 

文章到此已经结束,以下是紫薯布丁

delete from t1 using xxx as t1, xxx as t2 
where t1.name = t2.name and t1.id < t2.id;

CREATE TABLE `emp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4

INSERT INTO emp
(name, update_time)
VALUES('张三', '2023-09-06 00:00:00'),
('张三', '2023-09-06 00:00:00'),
('张三', '2023-09-06 00:00:00'),
('李四', '2023-09-06 00:00:00')

-- 查出重复的数据 哪些人名是重复的
select name from emp group by name having count(1)>1

-- 删除重复的人名,留下最近更新的那条(id最大即最近更新)
delete from t1 using emp as t1, emp as t2 
where t1.name = t2.name and t1.id < t2.id;
;id|name|update_time        |
--+----+-------------------+
 1|张三  |2023-09-06 00:00:00|
 2|张三  |2023-09-06 00:00:00|
 3|张三  |2023-09-06 00:00:00|
 4|李四  |2023-09-06 00:00:00|

name|
----+
张三  |

id|name|update_time        |
--+----+-------------------+
 1|张三  |2023-09-06 00:00:00|
 2|张三  |2023-09-06 00:00:00|
 3|张三  |2023-09-06 00:00:00|

select * from emp where name in 
(select name from emp group by name having count(1)>1)

-- 如果时间不一样,想通过时间来判断
delete from t1 using emp as t1, emp as t2 
where t1.name = t2.name and t1.update_time < t2.update_time;

select * from  emp as t1, emp as t2 
where t1.name = t2.name and t1.id < t2.id;

id|name  |update_time        |id|name |update_time        |
--+----+-------------------+--+----+-------------------+
 1|张三  |2023-09-06 00:00:00| 2|张三  |2023-09-06 00:00:00|
 1|张三  |2023-09-06 00:00:00| 3|张三  |2023-09-06 00:00:00|
 2|张三  |2023-09-06 00:00:00| 3|张三  |2023-09-06 00:00:00|

delete from emp where id in(
    select id from (
        select t1.id from emp t1,
        (select max(id) as id,name from emp group by name having count(1)>1) t2
        where t1.name = t2.name and t1.id != t2.id
    )t
)

SQL 错误 [1093] [HY000]: You can't specify target table 'emp' for update in FROM clause

声明:本文内容由网友自发贡献,转载请注明出处:【wpsshop】
推荐阅读
相关标签
  

闽ICP备14008679号