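I noticed that my WordPress site was not producing a wp-sitemap.xml file. Posts online also say that Baidu does not recognize the wp-sitemap.xml that WordPress generates by default, so it needs further processing. Following an online tutorial, I wrote a PHP script and added it to the Linux crontab so that it runs once a day. The script is as follows: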

<?php
require('./wp-blog-header.php');
header("Content-type: text/xml");
header('HTTP/1.1 200 OK');
$posts_to_show = 9999;
echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.baidu.com/schemas/sitemap-mobile/1/">'
?>
<url>
<loc><?php echo get_home_url(); ?></loc>
<lastmod><?php $ltime = get_lastpostmodified('GMT');$ltime = gmdate('Y-m-d\TH:i:s+00:00', strtotime($ltime)); echo $ltime; ?></lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<?php
/* Output regular posts */
$myposts = get_posts("numberposts=" . $posts_to_show );
foreach( $myposts as $post ) { ?>
<url>
<loc><?php the_permalink(); ?></loc>
<lastmod><?php the_time('c') ?></lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<?php } /* end of post loop */ ?>
<?php
/* Output pages */
$mypages = get_pages();
if(count($mypages) > 0) {
foreach($mypages as $page) { ?>
<url>
<loc><?php echo get_page_link($page->ID); ?></loc>
<lastmod><?php /* post_modified_gmt is UTC, so it matches the +00:00 suffix */ echo str_replace(" ", "T", $page->post_modified_gmt); ?>+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.6</priority>
</url>
<?php }} /* end of page loop */ ?>
<?php
/* Output post categories */
$terms = get_terms('category', 'orderby=name&hide_empty=0' );
$count = count($terms);
if($count > 0){
foreach ($terms as $term) { ?>
<url>
<loc><?php echo get_term_link($term); ?></loc>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
<?php }} /* end of category loop */ ?>
<?php
/* Output post tags (optional) */
$tags = get_terms("post_tag");
foreach ( $tags as $key => $tag ) {
$link = get_term_link( intval($tag->term_id), "post_tag" );
if ( is_wp_error( $link ) )
continue; /* skip this tag; a return here would stop the script and truncate the XML */
$tags[ $key ]->link = $link;
?>
<url>
<loc><?php echo $link ?></loc>
<changefreq>monthly</changefreq>
<priority>0.4</priority>
</url>
<?php } /* end of tag loop */ ?>
</urlset>
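
Before scheduling the script, it can help to confirm that its output is well-formed XML and to see how many entries carry a <lastmod>. In the script above, category and tag entries are written without one. The following is a minimal sketch using only the Python standard library; the function name summarize_sitemap and the example.com sample are illustrative assumptions, not part of the original setup.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def summarize_sitemap(xml_text):
    """Return (number of <url> entries, number of entries without <lastmod>).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML,
    and ValueError if an entry has no <loc>.
    """
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    missing_lastmod = 0
    for url in urls:
        loc = url.find("sm:loc", NS)
        if loc is None or not (loc.text or "").strip():
            raise ValueError("found a <url> entry without a <loc>")
        if url.find("sm:lastmod", NS) is None:
            missing_lastmod += 1
    return len(urls), missing_lastmod

if __name__ == "__main__":
    # Hypothetical sample: one post with <lastmod>, one tag without it
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/1.html</loc>"
        "<lastmod>2025-06-18T17:32:27+08:00</lastmod></url>"
        "<url><loc>https://example.com/tag/linux</loc></url>"
        "</urlset>"
    )
    print(summarize_sitemap(sample))
```

A truncated or interrupted response would fail at the ET.fromstring step, which makes this a cheap sanity check to run against the fetched file.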

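Add the script to the scheduled tasks. Note that ln needs the -f flag: without it, the command fails on every run after the first because the link already exists.

0 1 * * * wget -q -O /data/wwwroot/lgh/sitemap.xml --no-check-certificate https://www.liuguohua.com/lgh-sitemap.php && ln -sf /data/wwwroot/lgh/sitemap.xml /data/wwwroot/lgh/wp-sitemap.xml >/dev/null 2>&1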
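
One caveat with writing straight to the target file via wget -O: the file is truncated before the download starts, so a failed request can leave an empty sitemap.xml in place. A possible alternative, sketched below under assumptions, is to fetch into memory, check that the body looks complete, and then rename a temporary file over the target. The function names publish_atomically and fetch_and_publish are hypothetical, and unlike the --no-check-certificate flag in the cron line, urllib verifies TLS certificates by default.

```python
import os
import tempfile
import urllib.request

def publish_atomically(data, target_path):
    """Write bytes to a temp file beside target_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file with mode 0600; the web server must be able to read it
        os.chmod(tmp_path, 0o644)
        # os.replace is atomic on POSIX when both paths are on the same filesystem
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def fetch_and_publish(url, target_path):
    """Download the sitemap and publish it only if it looks complete."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    if b"</urlset>" not in data:
        raise ValueError("response does not look like a complete sitemap")
    publish_atomically(data, target_path)

# Example call (paths and URL taken from the cron entry):
# fetch_and_publish("https://www.liuguohua.com/lgh-sitemap.php",
#                   "/data/wwwroot/lgh/sitemap.xml")
```

With this approach readers of sitemap.xml see either the previous complete file or the new complete file, never a partial one.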

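After monitoring it for a few days, I confirmed that the script runs properly, and Baidu does appear to come and fetch the wp-sitemap.xml file.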
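
One way to check whether Baidu really fetches the file is to count matching requests in the web server's access log. The sketch below assumes the common "combined" log format and that the crawler identifies itself with "Baiduspider" in its user agent; the function name and the log path in the comment are hypothetical. A user agent can be spoofed, so treat the count as a hint rather than proof.

```python
def count_baidu_sitemap_hits(log_lines, path="/wp-sitemap.xml"):
    """Count log lines where a Baiduspider user agent requested `path`."""
    hits = 0
    for line in log_lines:
        # In the combined format the request appears as "GET /path HTTP/1.1",
        # so the path is surrounded by single spaces.
        if "Baiduspider" in line and f" {path} " in line:
            hits += 1
    return hits

# Example call (the log location depends on the server configuration):
# with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as f:
#     print(count_baidu_sitemap_hits(f))
```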

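However, as the site grows, emitting the full list of URLs on every run becomes wasteful. I therefore post-process the generated wp-sitemap.xml file so that it keeps only entries from the last 30 days. I wrote a Python 3 program for this: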

import sys
from datetime import datetime, timedelta, timezone
from dateutil import parser

def extract_recent_url_blocks(xml_file, days=30):
    # Cutoff: current UTC time minus `days` days
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    with open(xml_file, 'r', encoding='utf-8') as f:
        content = f.readlines()

    # In this sitemap the first 6 lines hold the XML declaration,
    # the <urlset> tag, and the home page entry; keep them as-is
    header = ''.join(content[:6])

    # Join the remaining lines and split on <url>
    body = ''.join(content[6:])
    url_blocks = body.split('<url>')[1:]  # drop the text before the first <url>

    recent_blocks = []
    for block in url_blocks:
        # Skip incomplete blocks and entries without <lastmod> (categories, tags)
        if '</url>' not in block or '<lastmod>' not in block:
            continue

        # Extract the <lastmod> timestamp
        lastmod_start = block.find('<lastmod>') + len('<lastmod>')
        lastmod_end = block.find('</lastmod>')
        lastmod_str = block[lastmod_start:lastmod_end].strip()

        try:
            lastmod = parser.parse(lastmod_str)
            if lastmod.tzinfo is None:
                # No offset given: assume UTC
                lastmod = lastmod.replace(tzinfo=timezone.utc)
            if lastmod >= cutoff:
                # Keep only the entry itself so a trailing </urlset> is not duplicated
                entry = block[:block.find('</url>')]
                recent_blocks.append(f"<url>{entry}</url>\n")
        except (ValueError, OverflowError) as e:
            print(f"Failed to parse time: {lastmod_str}, error: {e}", file=sys.stderr)

    # Assemble the result: header + filtered <url> entries + </urlset>
    result = header + ''.join(recent_blocks) + "</urlset>"
    return result

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 filter_sitemap.py sitemap.xml", file=sys.stderr)
        sys.exit(1)

    xml_file = sys.argv[1]
    output = extract_recent_url_blocks(xml_file)

    # Write to standard output (can be redirected to a file)
    print(output.strip())
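
The script above relies on the exact line layout of the file (the first 6 lines) and on string splitting. If the layout ever changes, a parser-based variant may be more robust. The following is a sketch of that alternative, not the method used in this post: it uses only the standard library, the name filter_recent and the optional now parameter are my own, and it assumes every <lastmod> is in ISO 8601 form with a numeric offset such as +08:00.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def filter_recent(xml_text, days=30, now=None):
    """Return sitemap XML keeping only <url> entries modified in the last `days` days."""
    # Serialize the sitemap namespace as the default xmlns (no ns0: prefix)
    ET.register_namespace("", SM_NS)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)

    root = ET.fromstring(xml_text)
    # Iterate over a copy because entries are removed from root inside the loop
    for url in list(root.findall(f"{{{SM_NS}}}url")):
        lastmod = url.find(f"{{{SM_NS}}}lastmod")
        keep = False
        if lastmod is not None and lastmod.text:
            try:
                stamp = datetime.fromisoformat(lastmod.text.strip())
                if stamp.tzinfo is None:
                    # No offset given: assume UTC
                    stamp = stamp.replace(tzinfo=timezone.utc)
                keep = stamp >= cutoff
            except ValueError:
                keep = False  # unparseable timestamp: drop the entry
        if not keep:
            root.remove(url)

    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Two behavioral differences are worth knowing. Entries without <lastmod> are dropped explicitly, and ElementTree omits namespace declarations that no element uses, so the xmlns:mobile attribute would not appear in the output unless it is added back.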

Install the Python dependency:

pip install python-dateutil

Run the program (the second form redirects the output to a file):

python3 filter_sitemap.py sitemap.xml
python3 filter_sitemap.py sitemap.xml > recent_sitemap.xml

The resulting file meets my needs:

# more recent_sitemap.xml 
<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.baidu.com/schemas/sitemap-mobile/1/"><url>
<loc>https://www.liuguohua.com</loc>
<lastmod>2025-06-22T05:35:08+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3899.html</loc>
<lastmod>2025-06-18T17:32:27+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3895.html</loc>
<lastmod>2025-06-18T15:36:33+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3892.html</loc>
<lastmod>2025-06-18T10:33:23+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3887.html</loc>
<lastmod>2025-06-17T20:28:36+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3883.html</loc>
<lastmod>2025-06-17T16:51:02+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3880.html</loc>
<lastmod>2025-06-14T17:22:26+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3877.html</loc>
<lastmod>2025-06-12T12:08:43+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3872.html</loc>
<lastmod>2025-06-12T11:14:35+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3869.html</loc>
<lastmod>2025-06-12T08:43:38+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3858.html</loc>
<lastmod>2025-06-11T19:05:55+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3850.html</loc>
<lastmod>2025-06-10T20:08:03+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3846.html</loc>
<lastmod>2025-06-10T10:50:25+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3842.html</loc>
<lastmod>2025-06-06T15:52:05+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3836.html</loc>
<lastmod>2025-06-06T15:22:04+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3833.html</loc>
<lastmod>2025-06-06T15:18:07+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3830.html</loc>
<lastmod>2025-06-06T15:08:33+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3827.html</loc>
<lastmod>2025-06-06T15:03:35+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3822.html</loc>
<lastmod>2025-06-05T12:58:09+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3818.html</loc>
<lastmod>2025-06-04T17:15:58+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3808.html</loc>
<lastmod>2025-06-04T16:44:59+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3805.html</loc>
<lastmod>2025-06-04T16:40:53+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3802.html</loc>
<lastmod>2025-06-04T16:36:27+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3798.html</loc>
<lastmod>2025-06-04T16:29:49+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3794.html</loc>
<lastmod>2025-06-04T16:13:56+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3758.html</loc>
<lastmod>2025-06-02T16:11:56+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3763.html</loc>
<lastmod>2025-06-02T16:21:17+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3769.html</loc>
<lastmod>2025-06-02T16:25:25+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3772.html</loc>
<lastmod>2025-06-02T16:28:12+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3775.html</loc>
<lastmod>2025-06-02T16:29:35+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3778.html</loc>
<lastmod>2025-06-02T16:32:59+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3781.html</loc>
<lastmod>2025-06-02T16:35:23+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3785.html</loc>
<lastmod>2025-06-02T16:37:01+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3788.html</loc>
<lastmod>2025-06-02T16:38:11+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3751.html</loc>
<lastmod>2025-05-29T11:30:08+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>https://www.liuguohua.com/3748.html</loc>
<lastmod>2025-05-28T22:21:18+08:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
</urlset>


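Notice: Welcome to this site for learning IT operations. If you republish content from this site, please credit it as "Source: 刘国华教育". If any content on this site infringes the legitimate rights of the original author, please contact us so that we can handle it.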