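We-media articles: automatic review
1) Automatic review workflow for we-media articles
1 After an article is published from the we-media side, its review begins.
2 The review covers the article's content: both the text and the images.
3 Text is reviewed through a third-party API.
4 Images are reviewed through a third-party API; because the images are stored in MinIO, they must be downloaded before they can be reviewed.
5 If the review fails, the we-media article's status is updated: status 2 means the review failed, status 3 means the article is handed over to manual review.
6 If the review succeeds, the article required by the app side is created in the article microservice.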
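2) Third-party content security API
2.1) Overview
Content security is a recognition service that runs scenario-specific checks on images, video, text, audio and other objects, reducing the risk of publishing non-compliant content.
Many platforms currently offer content detection: large Chinese internet companies such as Alibaba Cloud, Tencent Cloud, Baidu AI and NetEase Cloud all expose public APIs.
Based on performance and pricing, the Heima Toutiao (leadnews) project uses Alibaba Cloud's content security API, for both image and text review.
Alibaba Cloud pricing: https://www.aliyun.com/price/product/?spm=a2c4g.11186623.2.10.4146401eg5oeu8#/lvwang/detail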
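2.2) Preparation
Before using the content detection API, you need to register an Alibaba Cloud account, create an AccessKey, and sign up for the Cloud Shield Content Security service.
Steps
Go to the Alibaba Cloud website and register an account. If you already have an account, skip this step.
Registration is simple, so the course and handouts do not walk through it (several sign-up methods are supported, such as a Taobao, Alipay or Weibo account).
Real-name verification and liveness verification are required.
Open the Cloud Shield Content Security trial page and click "Activate now" to enable the service.
Content Security console
Manage your AccessKeyID and AccessKeySecret on the AccessKey management page.
There you can create and delete AccessKeys, and view your existing ones.
The AccessKey secret is hidden by default. You can save it when you first create it; later you can click "Show" and view it after verifying your phone number.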
2.3) Text review API
Text spam detection: https://help.aliyun.com/document_detail/70439.html?spm=a2c4g.11186623.6.659.35ac3db3l0wV5k
Text spam detection Java SDK: https://help.aliyun.com/document_detail/53427.html?spm=a2c4g.11186623.6.717.466d7544QbU8Lr
2.4) Image review API
Image spam detection: https://help.aliyun.com/document_detail/70292.html?spm=a2c4g.11186623.6.616.5d7d1e7f9vDRz4
Image spam detection Java SDK: https://help.aliyun.com/document_detail/53424.html?spm=a2c4g.11186623.6.715.c8f69b12ey35j4
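2.5) Project integration
①:Copy the classes from the course materials folder into the common module and register them for auto-configuration
This includes GreenImageScan, GreenTextScan and their helper classes.
Add them to the auto-configuration.
②:accessKeyId and secret (you must apply for your own)
Add the following to the heima-leadnews-wemedia configuration in the nacos configuration center:

    aliyun:
      accessKeyId: <your-AccessKey-ID>
      secret: <your-AccessKey-secret>
      scenes: terrorism

③:In a test class of the we-media microservice, inject the text and image review beans and test them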

    package com.heima.wemedia;

    import com.heima.common.aliyun.GreenImageScan;
    import com.heima.common.aliyun.GreenTextScan;
    import com.heima.file.service.FileStorageService;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    import java.util.Arrays;
    import java.util.Map;

    @SpringBootTest(classes = WemediaApplication.class)
    @RunWith(SpringRunner.class)
    public class AliyunTest {

        @Autowired
        private GreenTextScan greenTextScan;

        @Autowired
        private GreenImageScan greenImageScan;

        @Autowired
        private FileStorageService fileStorageService;

        /** Text review: the sample sentence contains a sensitive word. */
        @Test
        public void testScanText() throws Exception {
            Map map = greenTextScan.greeTextScan("我是一个好人,冰毒");
            System.out.println(map);
        }

        /** Image review: download the image from MinIO first, then submit its bytes. */
        @Test
        public void testScanImage() throws Exception {
            byte[] bytes = fileStorageService.downLoadFile("http://192.168.200.130:9000/leadnews/2021/04/26/ef3cbe458db249f7bd6fb4339e593e55.jpg");
            Map map = greenImageScan.imageScan(Arrays.asList(bytes));
            System.out.println(map);
        }
    }

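3) App-side article save interface
3.1) Table structure
ap_article: article information table
ap_article_config: article configuration table
ap_article_content: article content table
3.2) Distributed IDs
As the business grows, the article table may take up a large amount of physical storage. To address this, database sharding will be used later on: the database is split into several databases that are accessed through database middleware. If the table used an auto-increment ID strategy, different shards could generate the same ID, so a distributed ID generation strategy should be used instead.
snowflake is the distributed ID generation algorithm open-sourced by Twitter; it produces a 64-bit long ID. The layout is: 1 sign bit (the highest bit, always 0), 41 bits for the millisecond timestamp, 10 bits for the machine ID (5 bits for the data center and 5 bits for the worker), and 12 bits for a sequence number within the millisecond (so each node can generate 4096 IDs per millisecond).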
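The bit layout described above can be sketched in a few lines of Java. This is only an illustration of how the four fields are packed into one long, not the generator that mybatis-plus ships; the class name `SnowflakeSketch`, the method `compose` and the custom epoch are assumptions made for the example.

```java
/** Illustrative sketch of the snowflake bit layout (not the mybatis-plus implementation). */
final class SnowflakeSketch {
    static final long SEQUENCE_BITS = 12L;
    static final long WORKER_BITS = 5L;
    static final long DATACENTER_BITS = 5L;

    // Largest value that fits in each field.
    static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BITS);        // 4095
    static final long MAX_WORKER_ID = ~(-1L << WORKER_BITS);         // 31
    static final long MAX_DATACENTER_ID = ~(-1L << DATACENTER_BITS); // 31

    // How far each field is shifted to the left inside the 64-bit value.
    static final long WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    static final long DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;       // 17
    static final long TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    // Hypothetical custom epoch (2020-01-01T00:00:00Z); real generators pick their own.
    static final long EPOCH = 1577836800000L;

    /** Packs timestamp, data center ID, worker ID and sequence into a single long. */
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        if (timestampMillis < EPOCH) {
            throw new IllegalArgumentException("timestamp is before the epoch");
        }
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER_ID) {
            throw new IllegalArgumentException("datacenterId must be in 0.." + MAX_DATACENTER_ID);
        }
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in 0.." + MAX_WORKER_ID);
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence must be in 0.." + MAX_SEQUENCE);
        }
        return ((timestampMillis - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        // One millisecond after the epoch, data center 1, worker 1, first ID in that millisecond.
        System.out.println(compose(EPOCH + 1, 1, 1, 0)); // prints 4329472
    }
}
```

Because the timestamp occupies the high bits, IDs generated later sort after earlier ones, and two nodes with different data center or worker IDs can never produce the same value in the same millisecond.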
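All article-side tables use the snowflake algorithm to generate IDs: ap_article, ap_article_config and ap_article_content.
mybatis-plus already integrates the snowflake algorithm; the following two steps enable it in the project.
First: add the following configuration to the id field of the entity class, setting the type to ID_WORKER (newer mybatis-plus releases deprecate ID_WORKER in favour of IdType.ASSIGN_ID; check the version you use)

    @TableId(value = "id", type = IdType.ID_WORKER)
    private Long id;

Second: configure the data center ID and the worker ID in application.yml

    mybatis-plus:
      mapper-locations: classpath*:mapper/*.xml
      type-aliases-package: com.heima.model.article.pojos
      global-config:
        datacenter-id: 1
        workerId: 1

datacenter-id: data center ID (range: 0-31)
workerId: worker (machine) ID (range: 0-31)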
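3.3) Approach
Once an article passes review, its data must be inserted into the app's article database:
1. Save the article information to ap_article
2. Save the article configuration to ap_article_config
3. Save the article content to ap_article_content
Implementation approach: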
3.4) Feign interface
Interface path: /api/v1/article/save
Request method: POST
Parameter: ArticleDto
Response: ResponseResult

ArticleDto

    package com.heima.model.article.dtos;

    import com.heima.model.article.pojos.ApArticle;
    import lombok.Data;

    @Data
    public class ArticleDto extends ApArticle {

        /** Article content */
        private String content;
    }

Success:

    {
        "code": 200,
        "errorMessage": "Operation succeeded",
        "data": "1302864436297442242"
    }

Failure:

    {
        "code": 501,
        "errorMessage": "Invalid parameter"
    }

    {
        "code": 501,
        "errorMessage": "Article not found"
    }

Implementation:
①:Add the interface in heima-leadnews-feign-api
First: import the feign dependency

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-openfeign</artifactId>
    </dependency>

Second: define the article-side interface

    package com.heima.apis.article;

    import com.heima.model.article.dtos.ArticleDto;
    import com.heima.model.common.dtos.ResponseResult;
    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;

    @FeignClient(value = "leadnews-article")
    public interface IArticleClient {

        @PostMapping("/api/v1/article/save")
        ResponseResult saveArticle(@RequestBody ArticleDto dto);
    }

②:Implement the method in heima-leadnews-article

    package com.heima.article.feign;

    import com.heima.apis.article.IArticleClient;
    import com.heima.article.service.ApArticleService;
    import com.heima.model.article.dtos.ArticleDto;
    import com.heima.model.common.dtos.ResponseResult;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class ArticleClient implements IArticleClient {

        @Autowired
        private ApArticleService apArticleService;

        @Override
        @PostMapping("/api/v1/article/save")
        public ResponseResult saveArticle(@RequestBody ArticleDto dto) {
            return apArticleService.saveArticle(dto);
        }
    }

③:Copy the mapper
Copy the ApArticleConfigMapper class from the course materials folder into the mapper folder.
Also modify the ApArticleConfig class and add the following constructor:

    package com.heima.model.article.pojos;

    import com.baomidou.mybatisplus.annotation.IdType;
    import com.baomidou.mybatisplus.annotation.TableField;
    import com.baomidou.mybatisplus.annotation.TableId;
    import com.baomidou.mybatisplus.annotation.TableName;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    import java.io.Serializable;

    /** Configuration of an app-side article. */
    @Data
    @NoArgsConstructor
    @TableName("ap_article_config")
    public class ApArticleConfig implements Serializable {

        /** Creates the default configuration for a newly saved article. */
        public ApArticleConfig(Long articleId) {
            this.articleId = articleId;
            this.isComment = true;
            this.isForward = true;
            this.isDelete = false;
            this.isDown = false;
        }

        @TableId(value = "id", type = IdType.ID_WORKER)
        private Long id;

        /** Article ID */
        @TableField("article_id")
        private Long articleId;

        /** Whether comments are allowed (true: yes, false: no) */
        @TableField("is_comment")
        private Boolean isComment;

        /** Whether forwarding is allowed (true: yes, false: no) */
        @TableField("is_forward")
        private Boolean isForward;

        /** Whether the article has been taken down (true: taken down, false: online) */
        @TableField("is_down")
        private Boolean isDown;

        /** Whether the article has been deleted (true: deleted, false: not deleted) */
        @TableField("is_delete")
        private Boolean isDelete;
    }

④:Add a method to ApArticleService

    /**
     * Saves the app-side article data.
     */
    ResponseResult saveArticle(ArticleDto dto);

Implementation class:

    @Autowired
    private ApArticleConfigMapper apArticleConfigMapper;

    @Autowired
    private ApArticleContentMapper apArticleContentMapper;

    /**
     * Saves the app-side article data.
     */
    @Override
    public ResponseResult saveArticle(ArticleDto dto) {
        // 1. Validate the parameter
        if (dto == null) {
            return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);
        }

        ApArticle apArticle = new ApArticle();
        BeanUtils.copyProperties(dto, apArticle);

        // 2. Decide between insert and update based on whether an id is present
        if (dto.getId() == null) {
            // 2.1 No id: save the article, its configuration and its content
            save(apArticle);

            ApArticleConfig apArticleConfig = new ApArticleConfig(apArticle.getId());
            apArticleConfigMapper.insert(apArticleConfig);

            ApArticleContent apArticleContent = new ApArticleContent();
            apArticleContent.setArticleId(apArticle.getId());
            apArticleContent.setContent(dto.getContent());
            apArticleContentMapper.insert(apArticleContent);
        } else {
            // 2.2 Id present: update the article and its content
            updateById(apArticle);

            ApArticleContent apArticleContent = apArticleContentMapper.selectOne(
                    Wrappers.<ApArticleContent>lambdaQuery().eq(ApArticleContent::getArticleId, dto.getId()));
            apArticleContent.setContent(dto.getContent());
            apArticleContentMapper.updateById(apArticleContent);
        }

        // 3. Return the article id
        return ResponseResult.okResult(apArticle.getId());
    }

⑤:Test
Write a JUnit unit test, or test with Postman:

    {
        "id": 1390209114747047938,
        "title": "Heima Toutiao project background22222222222222",
        "authorId": 1102,
        "layout": 1,
        "labels": "Heima Toutiao",
        "publishTime": "2028-03-14T11:35:49.000Z",
        "images": "http://192.168.200.130:9000/leadnews/2021/04/26/5ddbdb5c68094ce393b08a47860da275.jpg",
        "content": "22222222222222222Heima Toutiao project background, Heima Toutiao project background, Heima Toutiao project background, Heima Toutiao project background, Heima Toutiao project background"
    }

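4) Implementing automatic review of we-media articles
4.1) Table structure
wm_news: we-media article table
status field: 0 draft, 1 pending review, 2 review failed, 3 manual review, 4 manual review passed, 8 review passed (waiting to be published), 9 published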
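To make the status codes easier to follow in the review code, here is a small sketch that models them as an enum and maps the third-party review suggestion onto the next status. The enum and its constant names are hypothetical (the project has its own WmNews.Status), and the "block" and "review" suggestion strings are the ones the review code in this section checks for; any other suggestion is treated as a pass, mirroring that code.

```java
/** Hypothetical model of the wm_news status codes; names are illustrative only. */
final class ReviewStatusSketch {

    enum Status {
        DRAFT(0),
        SUBMITTED(1),
        REVIEW_FAILED(2),
        MANUAL_REVIEW(3),
        MANUAL_REVIEW_PASSED(4),
        APPROVED(8),
        PUBLISHED(9);

        final short code;

        Status(int code) {
            this.code = (short) code;
        }
    }

    /**
     * Maps a review suggestion to the status the article should move to.
     * "block" fails the review, "review" escalates to a human,
     * anything else is treated as a pass and the article is published.
     */
    static Status nextStatus(String suggestion) {
        if ("block".equals(suggestion)) {
            return Status.REVIEW_FAILED;
        }
        if ("review".equals(suggestion)) {
            return Status.MANUAL_REVIEW;
        }
        return Status.PUBLISHED;
    }

    public static void main(String[] args) {
        System.out.println(nextStatus("block").code);  // prints 2
        System.out.println(nextStatus("review").code); // prints 3
        System.out.println(nextStatus("pass").code);   // prints 9
    }
}
```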
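4.2) Implementation
Add a new interface to the service package of heima-leadnews-wemedia

    package com.heima.wemedia.service;

    public interface WmNewsAutoScanService {

        /**
         * Automatically reviews a we-media article.
         * @param id the we-media article id
         */
        void autoScanWmNews(Integer id);
    }

Implementation class: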

    package com.heima.wemedia.service.impl;

    import com.alibaba.fastjson.JSONArray;
    import com.heima.apis.article.IArticleClient;
    import com.heima.common.aliyun.GreenImageScan;
    import com.heima.common.aliyun.GreenTextScan;
    import com.heima.file.service.FileStorageService;
    import com.heima.model.article.dtos.ArticleDto;
    import com.heima.model.common.dtos.ResponseResult;
    import com.heima.model.wemedia.pojos.WmChannel;
    import com.heima.model.wemedia.pojos.WmNews;
    import com.heima.model.wemedia.pojos.WmUser;
    import com.heima.wemedia.mapper.WmChannelMapper;
    import com.heima.wemedia.mapper.WmNewsMapper;
    import com.heima.wemedia.mapper.WmUserMapper;
    import com.heima.wemedia.service.WmNewsAutoScanService;
    import lombok.extern.slf4j.Slf4j;
    import org.apache.commons.lang3.StringUtils;
    import org.springframework.beans.BeanUtils;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    import java.util.*;
    import java.util.stream.Collectors;

    @Service
    @Slf4j
    @Transactional
    public class WmNewsAutoScanServiceImpl implements WmNewsAutoScanService {

        @Autowired
        private WmNewsMapper wmNewsMapper;

        /**
         * Automatically reviews a we-media article.
         * @param id the we-media article id
         */
        @Override
        public void autoScanWmNews(Integer id) {
            // 1. Load the we-media article
            WmNews wmNews = wmNewsMapper.selectById(id);
            if (wmNews == null) {
                throw new RuntimeException("WmNewsAutoScanServiceImpl - article does not exist");
            }

            if (wmNews.getStatus().equals(WmNews.Status.SUBMIT.getCode())) {
                // Extract the plain text and the images from the content
                Map<String, Object> textAndImages = handleTextAndImages(wmNews);

                // 2. Review the text through the Alibaba Cloud API
                boolean isTextScan = handleTextScan((String) textAndImages.get("content"), wmNews);
                if (!isTextScan) return;

                // 3. Review the images through the Alibaba Cloud API
                boolean isImageScan = handleImageScan((List<String>) textAndImages.get("images"), wmNews);
                if (!isImageScan) return;

                // 4. Review passed: save the app-side article data
                ResponseResult responseResult = saveAppArticle(wmNews);
                if (!responseResult.getCode().equals(200)) {
                    throw new RuntimeException("WmNewsAutoScanServiceImpl - article review: failed to save the app-side article data");
                }
                // Write the app-side article id back
                wmNews.setArticleId((Long) responseResult.getData());
                updateWmNews(wmNews, (short) 9, "Review passed");
            }
        }

        @Autowired
        private IArticleClient articleClient;

        @Autowired
        private WmChannelMapper wmChannelMapper;

        @Autowired
        private WmUserMapper wmUserMapper;

        /** Saves the app-side article data through the Feign client. */
        private ResponseResult saveAppArticle(WmNews wmNews) {
            ArticleDto dto = new ArticleDto();
            // Copy the common properties
            BeanUtils.copyProperties(wmNews, dto);
            // Article layout
            dto.setLayout(wmNews.getType());
            // Channel
            WmChannel wmChannel = wmChannelMapper.selectById(wmNews.getChannelId());
            if (wmChannel != null) {
                dto.setChannelName(wmChannel.getName());
            }
            // Author
            dto.setAuthorId(wmNews.getUserId().longValue());
            WmUser wmUser = wmUserMapper.selectById(wmNews.getUserId());
            if (wmUser != null) {
                dto.setAuthorName(wmUser.getName());
            }
            // Article id (present when the article was saved before)
            if (wmNews.getArticleId() != null) {
                dto.setId(wmNews.getArticleId());
            }
            dto.setCreatedTime(new Date());

            return articleClient.saveArticle(dto);
        }

        @Autowired
        private FileStorageService fileStorageService;

        @Autowired
        private GreenImageScan greenImageScan;

        /** Reviews the images; returns false when the article must not be published. */
        private boolean handleImageScan(List<String> images, WmNews wmNews) {
            boolean flag = true;
            if (images == null || images.size() == 0) {
                return flag;
            }

            // Remove duplicate images, then download each one from MinIO
            images = images.stream().distinct().collect(Collectors.toList());
            List<byte[]> imageList = new ArrayList<>();
            for (String image : images) {
                byte[] bytes = fileStorageService.downLoadFile(image);
                imageList.add(bytes);
            }

            try {
                Map map = greenImageScan.imageScan(imageList);
                if (map != null) {
                    // Review failed
                    if ("block".equals(map.get("suggestion"))) {
                        flag = false;
                        updateWmNews(wmNews, (short) 2, "The article contains non-compliant content");
                    }
                    // Uncertain: needs manual review
                    if ("review".equals(map.get("suggestion"))) {
                        flag = false;
                        updateWmNews(wmNews, (short) 3, "The article contains uncertain content");
                    }
                }
            } catch (Exception e) {
                flag = false;
                log.error("Image review failed", e);
            }
            return flag;
        }

        @Autowired
        private GreenTextScan greenTextScan;

        /** Reviews the title and the plain text; returns false when the article must not be published. */
        private boolean handleTextScan(String content, WmNews wmNews) {
            boolean flag = true;
            // Nothing to review when both the title and the body are empty
            if (StringUtils.isBlank(wmNews.getTitle()) && StringUtils.isBlank(content)) {
                return flag;
            }

            try {
                Map map = greenTextScan.greeTextScan(wmNews.getTitle() + "-" + content);
                if (map != null) {
                    // Review failed
                    if ("block".equals(map.get("suggestion"))) {
                        flag = false;
                        updateWmNews(wmNews, (short) 2, "The article contains non-compliant content");
                    }
                    // Uncertain: needs manual review
                    if ("review".equals(map.get("suggestion"))) {
                        flag = false;
                        updateWmNews(wmNews, (short) 3, "The article contains uncertain content");
                    }
                }
            } catch (Exception e) {
                flag = false;
                log.error("Text review failed", e);
            }
            return flag;
        }

        /** Updates the status and the reason of the we-media article. */
        private void updateWmNews(WmNews wmNews, short status, String reason) {
            wmNews.setStatus(status);
            wmNews.setReason(reason);
            wmNewsMapper.updateById(wmNews);
        }

        /**
         * Extracts the text and the images of the article:
         * 1. from the article content, 2. from the cover images.
         */
        private Map<String, Object> handleTextAndImages(WmNews wmNews) {
            StringBuilder stringBuilder = new StringBuilder();
            List<String> images = new ArrayList<>();

            // 1. Text and images inside the content
            if (StringUtils.isNotBlank(wmNews.getContent())) {
                List<Map> maps = JSONArray.parseArray(wmNews.getContent(), Map.class);
                for (Map map : maps) {
                    if ("text".equals(map.get("type"))) {
                        stringBuilder.append(map.get("value"));
                    }
                    if ("image".equals(map.get("type"))) {
                        images.add((String) map.get("value"));
                    }
                }
            }

            // 2. Cover images
            if (StringUtils.isNotBlank(wmNews.getImages())) {
                String[] split = wmNews.getImages().split(",");
                images.addAll(Arrays.asList(split));
            }

            Map<String, Object> resultMap = new HashMap<>();
            resultMap.put("content", stringBuilder.toString());
            resultMap.put("images", images);
            return resultMap;
        }
    }

4.3) Unit test

    package com.heima.wemedia.service;

    import com.heima.wemedia.WemediaApplication;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    @SpringBootTest(classes = WemediaApplication.class)
    @RunWith(SpringRunner.class)
    public class WmNewsAutoScanServiceTest {

        @Autowired
        private WmNewsAutoScanService wmNewsAutoScanService;

        @Test
        public void autoScanWmNews() {
            // 6238 is the id of an existing we-media article in the test database
            wmNewsAutoScanService.autoScanWmNews(6238);
        }
    }

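4.4) Calling the remote Feign interface
The heima-leadnews-wemedia service already depends on the heima-leadnews-feign-api project, so you only need to enable Feign remote calls on the we-media bootstrap class.
The annotation is: @EnableFeignClients(basePackages = "com.heima.apis")
It must point at the apis package.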
4.5) Service fallback (degradation)
Implementation steps:
①:Write the fallback logic in heima-leadnews-feign-api

    package com.heima.apis.article.fallback;

    import com.heima.apis.article.IArticleClient;
    import com.heima.model.article.dtos.ArticleDto;
    import com.heima.model.common.dtos.ResponseResult;
    import com.heima.model.common.enums.AppHttpCodeEnum;
    import org.springframework.stereotype.Component;

    /** Fallback used when the Feign call fails. */
    @Component
    public class IArticleClientFallback implements IArticleClient {

        @Override
        public ResponseResult saveArticle(ArticleDto dto) {
            return ResponseResult.errorResult(AppHttpCodeEnum.SERVER_ERROR, "Failed to fetch data");
        }
    }

Add a class to the we-media microservice that scans the package containing the fallback class

    package com.heima.wemedia.config;

    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @ComponentScan("com.heima.apis.article.fallback")
    public class InitConfig {
    }

②:Point the remote interface at the fallback class

    package com.heima.apis.article;

    import com.heima.apis.article.fallback.IArticleClientFallback;
    import com.heima.model.article.dtos.ArticleDto;
    import com.heima.model.common.dtos.ResponseResult;
    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;

    @FeignClient(value = "leadnews-article", fallback = IArticleClientFallback.class)
    public interface IArticleClient {

        @PostMapping("/api/v1/article/save")
        ResponseResult saveArticle(@RequestBody ArticleDto dto);
    }

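③:Enable fallback on the client, heima-leadnews-wemedia
Add the following to the wemedia configuration in the nacos configuration center to enable service fallback; the response timeouts can be set here as well

    feign:
      # enable fallback support for feign
      hystrix:
        enabled: true
      # connection and read timeouts, in milliseconds
      client:
        config:
          default:
            connectTimeout: 2000
            readTimeout: 2000

④:Test
Add the following code to the saveArticle method of ApArticleServiceImpl

    // Sleep longer than the 2000 ms read timeout to trigger the fallback
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }

Run a review from the we-media side; the call now falls back.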
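The timeout-then-fallback behaviour can be reproduced without Feign or Hystrix. The sketch below uses only the JDK: a call that takes longer than its time budget is abandoned and a fallback value is returned instead. It illustrates the idea only; `FallbackSketch`, `callWithFallback` and `slowCall` are hypothetical names, and the real mechanism in the project is the Feign fallback class configured in this section.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** JDK-only illustration of "time out, then fall back"; not how Hystrix is implemented. */
final class FallbackSketch {

    /** Runs the call on a worker thread and returns the fallback if it fails or exceeds the timeout. */
    static String callWithFallback(Callable<String> remoteCall, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Too slow or failed: give up on the call and degrade gracefully.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Stands in for a remote call that needs the given number of milliseconds. */
    static String slowCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "saved";
    }

    public static void main(String[] args) {
        // Budget of 100 ms, call needs 300 ms: the fallback is returned.
        System.out.println(callWithFallback(() -> slowCall(300), 100, "fallback"));
        // Budget of 500 ms, call needs 10 ms: the real result is returned.
        System.out.println(callWithFallback(() -> slowCall(10), 500, "fallback"));
    }
}
```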
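5) Integrating review into article publishing
5.1) Synchronous and asynchronous calls
Synchronous: once a call is issued, it does not return until the result is available (processed in real time).
Asynchronous: once a call is issued, it returns immediately without a result; the work is completed later (processed in the background).
Articles are reviewed on an asynchronous thread.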
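The difference can be seen in a plain-Java sketch, independent of Spring. `AsyncSketch` and its methods are hypothetical names for illustration; the 200 ms sleep stands in for a slow review. The synchronous version returns only after the review is finished, while the asynchronous version returns at once and the review completes later on another thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-Java illustration of synchronous versus asynchronous invocation. */
final class AsyncSketch {

    /** Stands in for a slow review; marks the flag when it is done. */
    static void review(AtomicBoolean reviewed) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        reviewed.set(true);
    }

    /** Synchronous: the caller is blocked until the review has finished. */
    static void submitSync(AtomicBoolean reviewed) {
        review(reviewed);
    }

    /** Asynchronous: returns immediately; the review runs on a pool thread. */
    static CompletableFuture<Void> submitAsync(AtomicBoolean reviewed) {
        return CompletableFuture.runAsync(() -> review(reviewed));
    }

    public static void main(String[] args) {
        AtomicBoolean first = new AtomicBoolean(false);
        submitSync(first);
        System.out.println("sync, reviewed on return: " + first.get());

        AtomicBoolean second = new AtomicBoolean(false);
        CompletableFuture<Void> pending = submitAsync(second);
        System.out.println("async, reviewed on return: " + second.get());
        pending.join(); // wait here only so the demo can show the final state
        System.out.println("async, reviewed after join: " + second.get());
    }
}
```

This is why publishing can respond to the author right away: the review result is written to the database later instead of being returned to the caller.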
5.2) Asynchronous calls with Spring Boot
①:Add the @Async annotation to the automatic review method (marking it for asynchronous invocation)

    @Override
    @Async  // marks this method as asynchronous
    public void autoScanWmNews(Integer id) {
        // code omitted
    }

②:Call the review method after the article has been published successfully

    @Autowired
    private WmNewsAutoScanService wmNewsAutoScanService;

    /**
     * Publishes, modifies, or saves an article as a draft.
     */
    @Override
    public ResponseResult submitNews(WmNewsDto dto) {

        // code omitted

        // Review the article (asynchronously)
        wmNewsAutoScanService.autoScanWmNews(wmNews.getId());

        return ResponseResult.okResult(AppHttpCodeEnum.SUCCESS);
    }

③:Enable asynchronous calls with the @EnableAsync annotation on the we-media bootstrap class

    @SpringBootApplication
    @EnableDiscoveryClient
    @MapperScan("com.heima.wemedia.mapper")
    @EnableFeignClients(basePackages = "com.heima.apis")
    @EnableAsync  // enables asynchronous calls
    public class WemediaApplication {

        public static void main(String[] args) {
            SpringApplication.run(WemediaApplication.class, args);
        }

        @Bean
        public MybatisPlusInterceptor mybatisPlusInterceptor() {
            MybatisPlusInterceptor interceptor = new MybatisPlusInterceptor();
            interceptor.addInnerInterceptor(new PaginationInnerInterceptor(DbType.MYSQL));
            return interceptor;
        }
    }

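6) Article review: end-to-end test
6.1) Services to start
1, the nacos server
2, the article microservice
3, the wemedia microservice
4, the wemedia gateway microservice
5, the wemedia front-end system
6.2) Test cases
1, Publish a normal article from the we-media front end
After the review succeeds, check that the app-side article data is saved correctly and that the we-media article's status and the app-side article id are written back.
2, Publish an article containing sensitive words from the we-media front end
The review should fail; check that the status in the wm_news table changes and that the failure reason is saved.
3, Publish an article containing a sensitive image from the we-media front end
The review should fail; check that the status in the wm_news table changes and that the failure reason is saved.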
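7) New requirement: self-managed sensitive words
7.1) Requirements analysis
The article review feature has been delivered, and articles are being published and reviewed normally. Then the product manager calls a meeting.
The core outcome of the meeting is the following.
Feature to implement:
The team must maintain its own list of sensitive words, and the review must check whether an article contains any of them.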
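7.2) Sensitive-word filtering
Technology selection (option: notes)
Database fuzzy query: too inefficient
String.indexOf("") search: also slow when the data volume is large
Full-text search: tokenize, then match
DFA algorithm: deterministic finite automaton (a data structure)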
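7.3) How the DFA works
DFA stands for Deterministic Finite Automaton.
Storage: all sensitive words are loaded once into a set of nested maps, the structure the original figure illustrates.
Sensitive words: 冰毒, 大麻, 大坏蛋
Lookup process: the text is scanned character by character; starting from each position, the characters are followed through the nested maps, and a sensitive word is found when a node marked as the end of a word is reached.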
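As a rough illustration of the storage and lookup just described, the sketch below builds the nested-map structure for the three example words and scans a text against it. It is not the project's SensitiveWordUtil; the class `DfaSketch`, its `Node` type and the longest-match rule are assumptions made for this example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative DFA-style matcher built from nested maps (not the project's SensitiveWordUtil). */
final class DfaSketch {

    /** One state: the next characters that may follow, and whether a word ends here. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    /** Storage: every word is inserted once, character by character, sharing common prefixes. */
    DfaSketch(Collection<String> words) {
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.endOfWord = true;
        }
    }

    /** Returns the length of the longest sensitive word starting at the given index, or 0. */
    private int matchLength(String text, int start) {
        Node node = root;
        int longest = 0;
        for (int i = start; i < text.length(); i++) {
            node = node.next.get(text.charAt(i));
            if (node == null) {
                break; // no word continues with this character
            }
            if (node.endOfWord) {
                longest = i - start + 1;
            }
        }
        return longest;
    }

    /** Lookup: scans the text once and counts every sensitive word that occurs in it. */
    Map<String, Integer> matchWords(String text) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        int i = 0;
        while (i < text.length()) {
            int length = matchLength(text, i);
            if (length > 0) {
                hits.merge(text.substring(i, i + length), 1, Integer::sum);
                i += length; // continue after the matched word
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

For example, `new DfaSketch(Arrays.asList("冰毒", "大麻", "大坏蛋")).matchWords("我是一个好人,冰毒")` reports one hit for 冰毒. The words 大麻 and 大坏蛋 share the first node 大, which is what keeps the structure compact; each position in the text is checked by walking the maps rather than by comparing against every word in the list.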
7.4) Integrating self-managed sensitive words into article review
①:Create the sensitive-word table by importing wm_sensitive from the course materials into the leadnews_wemedia database

    package com.heima.model.wemedia.pojos;

    import com.baomidou.mybatisplus.annotation.IdType;
    import com.baomidou.mybatisplus.annotation.TableField;
    import com.baomidou.mybatisplus.annotation.TableId;
    import com.baomidou.mybatisplus.annotation.TableName;
    import lombok.Data;

    import java.io.Serializable;
    import java.util.Date;

    /** Sensitive-word table. */
    @Data
    @TableName("wm_sensitive")
    public class WmSensitive implements Serializable {

        private static final long serialVersionUID = 1L;

        /** Primary key */
        @TableId(value = "id", type = IdType.AUTO)
        private Integer id;

        /** Sensitive word */
        @TableField("sensitives")
        private String sensitives;

        /** Creation time */
        @TableField("created_time")
        private Date createdTime;
    }

②:Copy the mapper for wm_sensitive into the project

    package com.heima.wemedia.mapper;

    import com.baomidou.mybatisplus.core.mapper.BaseMapper;
    import com.heima.model.wemedia.pojos.WmSensitive;
    import org.apache.ibatis.annotations.Mapper;

    @Mapper
    public interface WmSensitiveMapper extends BaseMapper<WmSensitive> {
    }

③:Add the self-managed sensitive-word check to the article review code
First: add the following code to the autoScanWmNews method of WmNewsAutoScanServiceImpl

    // Extract the plain text and the images from the content
    // ..... omitted

    // Check against the self-managed sensitive words
    boolean isSensitive = handleSensitiveScan((String) textAndImages.get("content"), wmNews);
    if (!isSensitive) return;

    // 2. Review the text through the Alibaba Cloud API
    // ..... omitted

Then add the self-managed sensitive-word check itself

    @Autowired
    private WmSensitiveMapper wmSensitiveMapper;

    /**
     * Checks the text against the self-managed sensitive words.
     */
    private boolean handleSensitiveScan(String content, WmNews wmNews) {
        boolean flag = true;

        // Load all sensitive words
        List<WmSensitive> wmSensitives = wmSensitiveMapper.selectList(
                Wrappers.<WmSensitive>lambdaQuery().select(WmSensitive::getSensitives));
        List<String> sensitiveList = wmSensitives.stream()
                .map(WmSensitive::getSensitives)
                .collect(Collectors.toList());

        // Initialise the sensitive-word dictionary
        SensitiveWordUtil.initMap(sensitiveList);

        // Check whether the article contains sensitive words
        Map<String, Integer> map = SensitiveWordUtil.matchWords(content);
        if (map.size() > 0) {
            updateWmNews(wmNews, (short) 2, "The article contains non-compliant content" + map);
            flag = false;
        }
        return flag;
    }

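8) New requirement: checking text inside images for sensitive words
8.1) Requirements analysis
The product manager calls another meeting. The article review feature has been delivered and articles are being published and reviewed normally, and the self-managed sensitive words requested last time are working well. The core request this time is:
Text contained in an article's images must be recognised and checked for sensitive words.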
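8.2) Recognising text in images
What is OCR?
OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using character recognition methods.
Options (option: notes)
Baidu OCR: paid
Tesseract-OCR: open-source OCR engine maintained by Google; can be called from Java, Python and other languages
Tess4J: a wrapper around Tesseract-OCR that can be called from Java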
8.3) Tess4j example
①:Create a project and import the tess4j dependency

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.1.1</version>
    </dependency>

②:Import the Chinese language data: copy the tessdata folder from the course materials into your own workspace
③:Write a test class

    package com.heima.tess4j;

    import net.sourceforge.tess4j.ITesseract;
    import net.sourceforge.tess4j.Tesseract;

    import java.io.File;

    public class Application {

        public static void main(String[] args) {
            try {
                // Load the local image
                File file = new File("D:\\26.png");
                // Create the Tesseract instance
                ITesseract tesseract = new Tesseract();
                // Path of the language data
                tesseract.setDatapath("D:\\workspace\\tessdata");
                // Chinese (simplified) recognition
                tesseract.setLanguage("chi_sim");
                // Run OCR
                String result = tesseract.doOCR(file);
                // Replace line breaks with "-" and remove spaces
                result = result.replaceAll("\\r|\\n", "-").replaceAll(" ", "");
                System.out.println("Recognition result: " + result);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

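The clean-up applied to the OCR result can be tried on its own, without installing Tesseract. The sketch below isolates that step in a small helper; `OcrTextSketch` and `normalize` are hypothetical names, while the two replaceAll calls are the same ones used in the example. Each carriage return or line feed becomes a "-" and every space is removed, so a word that OCR splits with stray spaces is joined again before it is matched against the sensitive words.

```java
/** Isolates the post-processing applied to raw OCR output; names are illustrative. */
final class OcrTextSketch {

    /** Replaces each line-break character with "-" and strips all spaces. */
    static String normalize(String rawOcrText) {
        return rawOcrText.replaceAll("\\r|\\n", "-").replaceAll(" ", "");
    }

    public static void main(String[] args) {
        // "\r\n" is two characters, so it becomes two dashes.
        System.out.println(normalize("冰 毒\r\n大 麻")); // prints 冰毒--大麻
    }
}
```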
8.4) Integrating sensitive-word management and image text recognition into article review
①:Create a utility class in heima-leadnews-common that wraps tess4j
First import the dependency in the pom

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.1.1</version>
    </dependency>

Utility class

    package com.heima.common.tess4j;

    import lombok.Getter;
    import lombok.Setter;
    import net.sourceforge.tess4j.ITesseract;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;
    import org.springframework.boot.context.properties.ConfigurationProperties;
    import org.springframework.stereotype.Component;

    import java.awt.image.BufferedImage;

    @Getter
    @Setter
    @Component
    @ConfigurationProperties(prefix = "tess4j")
    public class Tess4jClient {

        private String dataPath;
        private String language;

        public String doOCR(BufferedImage image) throws TesseractException {
            // Create the Tesseract instance
            ITesseract tesseract = new Tesseract();
            // Path of the language data
            tesseract.setDatapath(dataPath);
            // Recognition language (Chinese)
            tesseract.setLanguage(language);
            // Run OCR
            String result = tesseract.doOCR(image);
            // Replace line breaks with "-" and remove spaces
            result = result.replaceAll("\\r|\\n", "-").replaceAll(" ", "");
            return result;
        }
    }

Add the class to the spring.factories configuration; the complete file is:

    org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
      com.heima.common.exception.ExceptionCatch,\
      com.heima.common.swagger.SwaggerConfiguration,\
      com.heima.common.swagger.Swagger2Configuration,\
      com.heima.common.aliyun.GreenTextScan,\
      com.heima.common.aliyun.GreenImageScan,\
      com.heima.common.tess4j.Tess4jClient

②:Add two properties to the heima-leadnews-wemedia configuration

    tess4j:
      data-path: D:\workspace\tessdata
      language: chi_sim

③:Add the following code to the handleImageScan method of WmNewsAutoScanServiceImpl

    try {
        for (String image : images) {
            byte[] bytes = fileStorageService.downLoadFile(image);

            // Image text recognition: convert the byte[] to a BufferedImage
            ByteArrayInputStream in = new ByteArrayInputStream(bytes);
            BufferedImage imageFile = ImageIO.read(in);
            // Recognise the text in the image
            String result = tess4jClient.doOCR(imageFile);

            // Check the recognised text against the self-managed sensitive words
            boolean isSensitive = handleSensitiveScan(result, wmNews);
            if (!isSensitive) {
                return isSensitive;
            }

            imageList.add(bytes);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

最后附上文章审核的完整代码如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 package com.heima.wemedia.service.impl;import com.alibaba.fastjson.JSONArray;import com.baomidou.mybatisplus.core.toolkit.Wrappers;import com.heima.apis.article.IArticleClient;import com.heima.common.aliyun.GreenImageScan;import com.heima.common.aliyun.GreenTextScan;import com.heima.common.tess4j.Tess4jClient;import com.heima.file.service.FileStorageService;import com.heima.model.article.dtos.ArticleDto;import com.heima.model.common.dtos.ResponseResult;import com.heima.model.wemedia.pojos.WmChannel;import com.heima.model.wemedia.pojos.WmNews;import com.heima.model.wemedia.pojos.WmSensitive;import com.heima.model.wemedia.pojos.WmUser;import com.heima.utils.common.SensitiveWordUtil;import 
com.heima.wemedia.mapper.WmChannelMapper;import com.heima.wemedia.mapper.WmNewsMapper;import com.heima.wemedia.mapper.WmSensitiveMapper;import com.heima.wemedia.mapper.WmUserMapper;import com.heima.wemedia.service.WmNewsAutoScanService;import lombok.extern.slf4j.Slf4j;import org.apache.commons.lang3.StringUtils;import org.springframework.beans.BeanUtils;import org.springframework.beans.factory.annotation.Autowired;import org.springframework.scheduling.annotation.Async;import org.springframework.stereotype.Service;import org.springframework.transaction.annotation.Transactional;import javax.imageio.ImageIO;import java.awt.image.BufferedImage;import java.io.ByteArrayInputStream;import java.util.*;import java.util.stream.Collectors;@Service @Slf4j @Transactional public class WmNewsAutoScanServiceImpl implements WmNewsAutoScanService { @Autowired private WmNewsMapper wmNewsMapper; @Override @Async public void autoScanWmNews (Integer id) { WmNews wmNews = wmNewsMapper.selectById(id); if (wmNews == null ) { throw new RuntimeException("WmNewsAutoScanServiceImpl-文章不存在" ); } if (wmNews.getStatus().equals(WmNews.Status.SUBMIT.getCode())) { Map<String, Object> textAndImages = handleTextAndImages(wmNews); boolean isSensitive = handleSensitiveScan((String) textAndImages.get("content" ), wmNews); if (!isSensitive) return ; boolean isTextScan = handleTextScan((String) textAndImages.get("content" ), wmNews); if (!isTextScan) return ; boolean isImageScan = handleImageScan((List<String>) textAndImages.get("images" ), wmNews); if (!isImageScan) return ; ResponseResult responseResult = saveAppArticle(wmNews); if (!responseResult.getCode().equals(200 )) { throw new RuntimeException("WmNewsAutoScanServiceImpl-文章审核,保存app端相关文章数据失败" ); } wmNews.setArticleId((Long) responseResult.getData()); updateWmNews(wmNews, (short ) 9 , "审核成功" ); } } @Autowired private WmSensitiveMapper wmSensitiveMapper; private boolean handleSensitiveScan (String content, WmNews wmNews) { boolean flag = true ; 
        // Load all sensitive words (only the sensitives column is selected)
        List<WmSensitive> wmSensitives = wmSensitiveMapper.selectList(
                Wrappers.<WmSensitive>lambdaQuery().select(WmSensitive::getSensitives));
        List<String> sensitiveList = wmSensitives.stream()
                .map(WmSensitive::getSensitives)
                .collect(Collectors.toList());

        // Initialize the sensitive-word dictionary
        SensitiveWordUtil.initMap(sensitiveList);

        // Check whether the content contains any sensitive words
        Map<String, Integer> map = SensitiveWordUtil.matchWords(content);
        if (map.size() > 0) {
            // status 2 = audit failed
            updateWmNews(wmNews, (short) 2, "当前文章中存在违规内容" + map);
            flag = false;
        }
        return flag;
    }

    @Autowired
    private IArticleClient articleClient;

    @Autowired
    private WmChannelMapper wmChannelMapper;

    @Autowired
    private WmUserMapper wmUserMapper;

    /**
     * Saves the app-side article through the article microservice.
     */
    private ResponseResult saveAppArticle(WmNews wmNews) {
        ArticleDto dto = new ArticleDto();
        // Copy properties with matching names
        BeanUtils.copyProperties(wmNews, dto);
        // Article layout
        dto.setLayout(wmNews.getType());
        // Channel
        WmChannel wmChannel = wmChannelMapper.selectById(wmNews.getChannelId());
        if (wmChannel != null) {
            dto.setChannelName(wmChannel.getName());
        }
        // Author
        dto.setAuthorId(wmNews.getUserId().longValue());
        WmUser wmUser = wmUserMapper.selectById(wmNews.getUserId());
        if (wmUser != null) {
            dto.setAuthorName(wmUser.getName());
        }
        // Reuse the existing app article id, if there is one
        if (wmNews.getArticleId() != null) {
            dto.setId(wmNews.getArticleId());
        }
        dto.setCreatedTime(new Date());

        ResponseResult responseResult = articleClient.saveArticle(dto);
        return responseResult;
    }

    @Autowired
    private FileStorageService fileStorageService;

    @Autowired
    private GreenImageScan greenImageScan;

    @Autowired
    private Tess4jClient tess4jClient;

    /**
     * Audits the images. Returns true only if every image passes.
     */
    private boolean handleImageScan(List<String> images, WmNews wmNews) {
        boolean flag = true;
        if (images == null || images.isEmpty()) {
            return flag;
        }

        // Remove duplicate image URLs
        images = images.stream().distinct().collect(Collectors.toList());

        List<byte[]> imageList = new ArrayList<>();
        try {
            for (String image : images) {
                // Download the image from MinIO
                byte[] bytes = fileStorageService.downLoadFile(image);

                // Run OCR on the image and check the recognized text for sensitive words
                ByteArrayInputStream in = new ByteArrayInputStream(bytes);
                BufferedImage imageFile = ImageIO.read(in);
                String result = tess4jClient.doOCR(imageFile);
                // handleSensitiveScan returns true when no sensitive word is found
                boolean passed = handleSensitiveScan(result, wmNews);
                if (!passed) {
                    return passed;
                }
                imageList.add(bytes);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Audit the images with the third-party service
        try {
            Map map = greenImageScan.imageScan(imageList);
            if (map != null) {
                // Constant-first equals avoids an NPE when "suggestion" is missing
                if ("block".equals(map.get("suggestion"))) {
                    flag = false;
                    updateWmNews(wmNews, (short) 2, "当前文章中存在违规内容");
                }
                // Uncertain result: hand over to manual review (status 3)
                if ("review".equals(map.get("suggestion"))) {
                    flag = false;
                    updateWmNews(wmNews, (short) 3, "当前文章中存在不确定内容");
                }
            }
        } catch (Exception e) {
            flag = false;
            e.printStackTrace();
        }
        return flag;
    }

    @Autowired
    private GreenTextScan greenTextScan;

    /**
     * Audits the text (title and content). Returns true if it passes.
     */
    private boolean handleTextScan(String content, WmNews wmNews) {
        boolean flag = true;
        // The original check (title + "-" + content).length() == 0 could never be true
        // because of the "-" separator, so test the two parts instead.
        if (StringUtils.isBlank(wmNews.getTitle()) && StringUtils.isBlank(content)) {
            return flag;
        }
        try {
            Map map = greenTextScan.greeTextScan(wmNews.getTitle() + "-" + content);
            if (map != null) {
                // Audit failed
                if ("block".equals(map.get("suggestion"))) {
                    flag = false;
                    updateWmNews(wmNews, (short) 2, "当前文章中存在违规内容");
                }
                // Uncertain result: hand over to manual review
                if ("review".equals(map.get("suggestion"))) {
                    flag = false;
                    updateWmNews(wmNews, (short) 3, "当前文章中存在不确定内容");
                }
            }
        } catch (Exception e) {
            flag = false;
            e.printStackTrace();
        }
        return flag;
    }

    /**
     * Updates the status and reason of the we-media article.
     */
    private void updateWmNews(WmNews wmNews, short status, String reason) {
        wmNews.setStatus(status);
        wmNews.setReason(reason);
        wmNewsMapper.updateById(wmNews);
    }

    /**
     * Extracts the text and the images from the article:
     * 1. from the article content;
     * 2. from the cover images.
     */
    private Map<String, Object> handleTextAndImages(WmNews wmNews) {
        // Collected plain text
        StringBuilder stringBuilder = new StringBuilder();
        // Collected image URLs
        List<String> images = new ArrayList<>();

        // 1. Text and images from the article content
        if (StringUtils.isNotBlank(wmNews.getContent())) {
            List<Map> maps = JSONArray.parseArray(wmNews.getContent(), Map.class);
            for (Map map : maps) {
                if ("text".equals(map.get("type"))) {
                    stringBuilder.append(map.get("value"));
                }
                if ("image".equals(map.get("type"))) {
                    images.add((String) map.get("value"));
                }
            }
        }

        // 2. Cover images (comma-separated)
        if (StringUtils.isNotBlank(wmNews.getImages())) {
            String[] split = wmNews.getImages().split(",");
            images.addAll(Arrays.asList(split));
        }

        Map<String, Object> resultMap = new HashMap<>();
        resultMap.put("content", stringBuilder.toString());
        resultMap.put("images", images);
        return resultMap;
    }
}
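Both handleImageScan and handleTextScan above translate the "suggestion" field of the scan result into an article status in the same way. The following is a minimal, self-contained sketch of that mapping, not the project's code: the class name AuditDecisionSketch, the method decide, and the STATUS_UNCHANGED marker are hypothetical names introduced only for illustration, while the status codes 2 and 3 and the "block" and "review" values are taken from this section.

```java
import java.util.HashMap;
import java.util.Map;

class AuditDecisionSketch {

    // Status codes as used in this section: 2 = audit failed, 3 = manual review.
    static final short STATUS_FAIL = 2;
    static final short STATUS_MANUAL_REVIEW = 3;
    // Hypothetical marker meaning "leave the article status as it is".
    static final short STATUS_UNCHANGED = -1;

    // Maps a scan result to the status the article should be moved to.
    static short decide(Map<String, ?> scanResult) {
        if (scanResult == null) {
            return STATUS_UNCHANGED;
        }
        Object suggestion = scanResult.get("suggestion");
        // Constant-first equals is null-safe if "suggestion" is absent.
        if ("block".equals(suggestion)) {
            return STATUS_FAIL;
        }
        if ("review".equals(suggestion)) {
            return STATUS_MANUAL_REVIEW;
        }
        return STATUS_UNCHANGED;
    }

    public static void main(String[] args) {
        Map<String, Object> result = new HashMap<>();
        result.put("suggestion", "review");
        System.out.println(decide(result)); // prints 3
    }
}
```

Keeping the mapping in one place like this would let the image and text audits share it instead of repeating the two if blocks, though whether that refactoring suits the project is a judgment call.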
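9) Article details: static file generation
9.1) Approach
When the article service creates the app-side article, it also generates a static article-detail page and uploads it to MinIO.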
9.2) Implementation steps
1. Create ArticleFreemarkerService, which generates the static file and uploads it to MinIO.
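package com.heima.article.service;

import com.heima.model.article.pojos.ApArticle;

public interface ArticleFreemarkerService {

    /**
     * Generates the static article-detail page and uploads it to MinIO.
     *
     * @param apArticle the saved article; its id is used as the file name
     * @param content   the article content as a JSON array string
     */
    void buildArticleToMinIO(ApArticle apArticle, String content);
}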
Implementation:
package com.heima.article.service.impl;

import com.alibaba.fastjson.JSONArray;
import com.baomidou.mybatisplus.core.toolkit.Wrappers;
import com.heima.article.mapper.ApArticleContentMapper;
import com.heima.article.service.ApArticleService;
import com.heima.article.service.ArticleFreemarkerService;
import com.heima.file.service.FileStorageService;
import com.heima.model.article.pojos.ApArticle;
import freemarker.template.Configuration;
import freemarker.template.Template;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

@Service
@Slf4j
@Transactional
public class ArticleFreemarkerServiceImpl implements ArticleFreemarkerService {

    @Autowired
    private ApArticleContentMapper apArticleContentMapper;

    @Autowired
    private Configuration configuration;

    @Autowired
    private FileStorageService fileStorageService;

    @Autowired
    private ApArticleService apArticleService;

    /**
     * Generates the static article-detail page and uploads it to MinIO.
     */
    @Async
    @Override
    public void buildArticleToMinIO(ApArticle apArticle, String content) {
        // Nothing to render without content
        if (StringUtils.isNotBlank(content)) {
            // 1. Render the content through the FreeMarker template into an HTML string
            StringWriter out = new StringWriter();
            try {
                Template template = configuration.getTemplate("article.ftl");
                // Data model
                Map<String, Object> contentDataModel = new HashMap<>();
                contentDataModel.put("content", JSONArray.parseArray(content));
                template.process(contentDataModel, out);
            } catch (Exception e) {
                // Do not upload an empty or partial page if rendering failed
                log.error("Failed to render the static page for article {}", apArticle.getId(), e);
                return;
            }

            // 2. Upload the HTML file to MinIO (explicit charset instead of the platform default)
            InputStream in = new ByteArrayInputStream(out.toString().getBytes(StandardCharsets.UTF_8));
            String path = fileStorageService.uploadHtmlFile("", apArticle.getId() + ".html", in);

            // 3. Save the static page URL in the static_url field of ap_article
            apArticleService.update(Wrappers.<ApArticle>lambdaUpdate()
                    .eq(ApArticle::getId, apArticle.getId())
                    .set(ApArticle::getStaticUrl, path));
        }
    }
}
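Because buildArticleToMinIO is annotated with @Async, the intent is that the caller does not wait for the page to be rendered and uploaded. The sketch below imitates that fire-and-forget behavior with plain java.util.concurrent instead of Spring's @Async, so it is an illustration of the idea rather than what Spring does internally. The names AsyncPageBuildSketch, buildPage, and buildPageAsync are hypothetical, and buildPage only returns the object name in place of real rendering and uploading.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncPageBuildSketch {

    // Hypothetical stand-in for "render the page and upload it"; returns the object name.
    static String buildPage(long articleId) {
        return articleId + ".html";
    }

    // Hands the work to the executor and returns immediately with a future.
    static CompletableFuture<String> buildPageAsync(long articleId, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> buildPage(articleId), executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> future = buildPageAsync(1001L, executor);
            // A real caller could return its response here without waiting;
            // join() is used only to show the result of the background task.
            System.out.println(future.join()); // prints 1001.html
        } finally {
            executor.shutdown();
        }
    }
}
```

One consequence of this design is that a failure in the background task is not visible to the caller, which is why logging inside the asynchronous method matters.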
2. In the saveArticle implementation of ApArticleService, add a call to the file-generation method.
/**
 * Saves the app-side article.
 */
@Override
public ResponseResult saveArticle(ArticleDto dto) {
    // 1. Validate the parameter
    if (dto == null) {
        return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);
    }

    ApArticle apArticle = new ApArticle();
    BeanUtils.copyProperties(dto, apArticle);

    // 2. Decide between insert and update by the presence of an id
    if (dto.getId() == null) {
        // 2.1 No id: save the article, its config, and its content

        // Save the article
        save(apArticle);

        // Save the config
        ApArticleConfig apArticleConfig = new ApArticleConfig(apArticle.getId());
        apArticleConfigMapper.insert(apArticleConfig);

        // Save the content
        ApArticleContent apArticleContent = new ApArticleContent();
        apArticleContent.setArticleId(apArticle.getId());
        apArticleContent.setContent(dto.getContent());
        apArticleContentMapper.insert(apArticleContent);
    } else {
        // 2.2 Id present: update the article and its content

        // Update the article
        updateById(apArticle);

        // Update the content
        ApArticleContent apArticleContent = apArticleContentMapper.selectOne(
                Wrappers.<ApArticleContent>lambdaQuery().eq(ApArticleContent::getArticleId, dto.getId()));
        apArticleContent.setContent(dto.getContent());
        apArticleContentMapper.updateById(apArticleContent);
    }

    // Asynchronous call: generate the static page and upload it to MinIO
    articleFreemarkerService.buildArticleToMinIO(apArticle, dto.getContent());

    // 3. Return the id of the app-side article
    return ResponseResult.okResult(apArticle.getId());
}
3. Enable asynchronous calls in the article microservice.