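Cleaning Nginx Logs with Flink
Background and Requirements
In an advertising DSP/DMP system, Nginx logs need to be collected both to troubleshoot problems and to serve as a data source for data analysis.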
Overall Architecture
Component versions:
nginx 1.24
Nginx Log Output
The default nginx log module can only record request information:
http {
Resulting log line:
129.80.59.27 - app_data09 [2025-06-03T03:52:23+00:00] file-upload.data-oci.qiliangjia.com 22 da52888533afb04512bf8c55044a4816 "GET /data_v2/data_v1/ip_geo_info/1.1.1.1 HTTP/1.1" /data_v2/data_v1/ip_geo_info/1.1.1.1 306 200 [application/json] 343 "" "curl/8.5.0" "{\"name\":\"hello_world\"}" ""
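To make the layout of a line like the one above concrete, here is a minimal Python sketch that splits it into fields. The original log_format definition is not shown, so the tokenizer only assumes the three shapes visible in the sample: bare fields, fields in square brackets, and double-quoted fields with backslash-escaped quotes. The names `TOKEN_RE`, `tokenize`, and `SAMPLE_LINE` are illustrative, not part of the original setup.

```python
import json
import re

# One alternative per field shape seen in the sample line (an assumption,
# since the actual log_format is not shown in the article).
TOKEN_RE = re.compile(
    r'"((?:[^"\\]|\\.)*)"'   # double-quoted field, may contain \" escapes
    r'|\[([^\]]*)\]'         # field wrapped in square brackets
    r'|(\S+)'                # bare field without spaces
)


def tokenize(line):
    """Split one access-log line into a list of field strings."""
    tokens = []
    for m in TOKEN_RE.finditer(line):
        if m.lastindex == 1:
            # Quoted field: undo the \" and \\ escapes.
            tokens.append(re.sub(r'\\(["\\])', r'\1', m.group(1)))
        else:
            # Bracketed or bare field: take the captured text as-is.
            tokens.append(m.group(m.lastindex))
    return tokens


SAMPLE_LINE = (
    r'129.80.59.27 - app_data09 [2025-06-03T03:52:23+00:00] '
    r'file-upload.data-oci.qiliangjia.com 22 da52888533afb04512bf8c55044a4816 '
    r'"GET /data_v2/data_v1/ip_geo_info/1.1.1.1 HTTP/1.1" '
    r'/data_v2/data_v1/ip_geo_info/1.1.1.1 306 200 [application/json] 343 '
    r'"" "curl/8.5.0" "{\"name\":\"hello_world\"}" ""'
)

if __name__ == "__main__":
    fields = tokenize(SAMPLE_LINE)
    for index, value in enumerate(fields):
        print(index, repr(value))
    # The quoted JSON field decodes cleanly once the escapes are removed.
    print(json.loads(fields[15]))
```

Working with positions rather than field names keeps the sketch honest about what the sample alone reveals; once the real log_format is known, the indexes can be mapped to its variable names.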
Recording more than that requires the nginx Lua module (libnginx-mod-http-lua):
sudo apt install libnginx-mod-http-lua -y
The response body is printed mainly with body_filter_by_lua_block:
https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#body_filter_by_lua
Log content:
Syncing Data to MQ with Filebeat
Cleaning the Data with Flink
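The cleaning logic itself is not shown here, so the following is only a sketch of what a per-record cleaning function might look like, written in plain Python rather than as an actual Flink job. It assumes each MQ message is a Filebeat JSON event carrying the raw log line in its `message` field, and that the field positions match the sample log line from the Nginx section. The index constants, the function name `clean_event`, and the output keys are all hypothetical.

```python
import json
import shlex
from datetime import datetime

# Hypothetical field positions, inferred from the sample log line only.
IDX_CLIENT_IP = 0
IDX_TIME = 3
IDX_STATUS = 10
IDX_BODY = 15


def clean_event(event_json):
    """Return a cleaned record as a dict, or None if the event is malformed."""
    try:
        event = json.loads(event_json)
        message = event["message"]
        if not isinstance(message, str):
            return None
        # shlex handles the double-quoted fields and their \" escapes.
        fields = shlex.split(message)
        body = fields[IDX_BODY]
        return {
            "client_ip": fields[IDX_CLIENT_IP],
            # The timestamp is logged as [ISO 8601], so strip the brackets.
            "time": datetime.fromisoformat(fields[IDX_TIME].strip("[]")),
            "status": int(fields[IDX_STATUS]),
            "body": json.loads(body) if body else None,
        }
    except (KeyError, IndexError, TypeError, ValueError):
        # Invalid JSON, missing fields, or unbalanced quotes: drop the record.
        return None


SAMPLE_LINE = (
    r'129.80.59.27 - app_data09 [2025-06-03T03:52:23+00:00] '
    r'file-upload.data-oci.qiliangjia.com 22 da52888533afb04512bf8c55044a4816 '
    r'"GET /data_v2/data_v1/ip_geo_info/1.1.1.1 HTTP/1.1" '
    r'/data_v2/data_v1/ip_geo_info/1.1.1.1 306 200 [application/json] 343 '
    r'"" "curl/8.5.0" "{\"name\":\"hello_world\"}" ""'
)

if __name__ == "__main__":
    sample_event = json.dumps({"message": SAMPLE_LINE})
    print(clean_event(sample_event))
    print(clean_event("not a json event"))
```

In a streaming job, a function of this shape would typically sit in a map step followed by a filter that discards the None results. Note that `shlex.split` treats single quotes as quoting characters too, so lines containing a stray apostrophe are dropped by this sketch rather than parsed.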
References
lua-nginx-module https://github.com/openresty/lua-nginx-module
openresty https://github.com/openresty/openresty https://openresty.org/en/download.html
openresty install https://openresty.org/cn/installation.html
nginx: logging the full request and response https://www.hujingnb.com/archives/934 https://github.com/hujingnb/docker_composer/blob/master/openresty/nginx/config/test.conf
access log to Clickhouse https://clickhouse.com/blog/nginx-logs-to-clickhouse-fluent-bit
- Title: Cleaning Nginx Logs with Flink
- Author: Ordiy
- Created at : 2025-01-01 00:00:00
- Updated at : 2025-07-10 02:49:06
- Link: https://ordiy.github.io/posts/2025-nginx-log-flink-etl/
- License: This work is licensed under CC BY 4.0.