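I noticed that my WordPress site was not generating a wp-sitemap.xml file. People online also say that Baidu does not recognize the wp-sitemap.xml that WordPress generates by default, so it needs some extra processing. Following a tutorial I found online, I wrote a PHP script and added a Linux crontab entry that fetches it once a day. The script is as follows: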
<?php
require('./wp-blog-header.php');
header("Content-type: text/xml");
header('HTTP/1.1 200 OK');
$posts_to_show = 9999;
echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.baidu.com/schemas/sitemap-mobile/1/">';
?>
<url>
<loc><?php echo get_home_url(); ?></loc>
<lastmod><?php $ltime = get_lastpostmodified('GMT'); $ltime = gmdate('Y-m-d\TH:i:s+00:00', strtotime($ltime)); echo $ltime; ?></lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<?php
/* Output regular posts */
$myposts = get_posts("numberposts=" . $posts_to_show);
foreach ($myposts as $post) { ?>
<url>
<loc><?php the_permalink(); ?></loc>
<lastmod><?php the_time('c') ?></lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>
<?php } /* End of post loop */ ?>
<?php
/* Output pages */
$mypages = get_pages();
if (count($mypages) > 0) {
    foreach ($mypages as $page) { ?>
<url>
<loc><?php echo get_page_link($page->ID); ?></loc>
<?php /* Use the GMT column so that the +00:00 suffix is accurate */ ?>
<lastmod><?php echo str_replace(" ", "T", $page->post_modified_gmt); ?>+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.6</priority>
</url>
<?php }} /* End of page loop */ ?>
<?php
/* Output post categories */
$terms = get_terms('category', 'orderby=name&hide_empty=0');
$count = count($terms);
if ($count > 0) {
    foreach ($terms as $term) { ?>
<url>
<loc><?php echo get_term_link($term); ?></loc>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
<?php }} /* End of category loop */ ?>
<?php
/* Output post tags (optional) */
$tags = get_terms("post_tag");
foreach ($tags as $key => $tag) {
    $link = get_term_link(intval($tag->term_id), "post_tag");
    /* Skip a tag whose link cannot be resolved instead of aborting the script */
    if (is_wp_error($link)) continue;
    $tags[$key]->link = $link; ?>
<url>
<loc><?php echo $link ?></loc>
<changefreq>monthly</changefreq>
<priority>0.4</priority>
</url>
<?php } /* End of tag loop */ ?>
</urlset>
Add the script to the scheduled tasks:
0 1 * * * wget -q -O /data/wwwroot/lgh/sitemap.xml --no-check-certificate https://www.liuguohua.com/lgh-sitemap.php && ln -sf /data/wwwroot/lgh/sitemap.xml /data/wwwroot/lgh/wp-sitemap.xml >/dev/null 2>&1
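The symlink step in this job runs every day, so it has to cope with a link that already exists from the previous run. As a minimal sketch of that step in Python (the function name `refresh_symlink` and the `.tmp` suffix are illustrative assumptions, not part of the original setup), the link can be created under a temporary name and then renamed over the old one:

```python
import os


def refresh_symlink(target, link_path):
    """Point link_path at target, replacing any link left by an earlier run."""
    tmp_path = link_path + ".tmp"  # hypothetical temporary name next to the link
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    # Renaming over the old link swaps it in a single step on POSIX systems.
    os.replace(tmp_path, link_path)
```

Calling `refresh_symlink("/data/wwwroot/lgh/sitemap.xml", "/data/wwwroot/lgh/wp-sitemap.xml")` repeatedly should leave exactly one link pointing at the sitemap.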
After monitoring it for a few days, I confirmed that the job runs normally, and Baidu also appears to be fetching the wp-sitemap.xml file.
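To make that kind of check repeatable, the generated file can be parsed and summarized rather than inspected by eye. The sketch below is only an illustration using Python's standard library; the function name `summarize_sitemap` is an assumption, and it counts `<url>` entries and how many of them carry no `<lastmod>` (as the category and tag entries from the PHP script do not).

```python
import xml.etree.ElementTree as ET

# Namespace used by the <urlset> element in the generated sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def summarize_sitemap(xml_text):
    """Return (total_urls, urls_without_lastmod) for a sitemap document."""
    # Parsing fails loudly (ET.ParseError) if the XML is not well-formed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    urls = root.findall("sm:url", SITEMAP_NS)
    missing = [u for u in urls if u.find("sm:lastmod", SITEMAP_NS) is None]
    return len(urls), len(missing)
```

A total of zero, or a parse error, would suggest that the PHP script did not run as expected.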
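However, as the site grows, emitting the full list on every run is not ideal. So I post-process the generated wp-sitemap.xml file to keep only the entries from the last 30 days. I wrote a Python 3 program for this: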
import sys
from datetime import datetime, timedelta, timezone

from dateutil import parser


def extract_recent_url_blocks(xml_file, days=30):
    # Cutoff: current UTC time minus `days` days
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    with open(xml_file, 'r', encoding='utf-8') as f:
        content = f.read()

    # Everything before the first <url> is the header (XML declaration and <urlset>)
    header, _, body = content.partition('<url>')
    url_blocks = body.split('<url>')

    recent_blocks = []
    for block in url_blocks:
        if '</url>' not in block:
            continue
        # Drop anything after </url>, such as the closing </urlset> on the last block
        block = block[:block.find('</url>') + len('</url>')]

        # Entries without <lastmod> (categories, tags) cannot be dated, so skip them
        if '<lastmod>' not in block:
            continue
        lastmod_start = block.find('<lastmod>') + len('<lastmod>')
        lastmod_end = block.find('</lastmod>')
        lastmod_str = block[lastmod_start:lastmod_end].strip()

        try:
            lastmod = parser.parse(lastmod_str)
        except (ValueError, OverflowError) as e:
            print(f"Failed to parse time: {lastmod_str}, error: {e}", file=sys.stderr)
            continue
        # Treat timestamps without an offset as UTC, then compare offset-aware values
        if lastmod.tzinfo is None:
            lastmod = lastmod.replace(tzinfo=timezone.utc)
        if lastmod >= cutoff:
            recent_blocks.append(f"<url>{block}")

    # Assemble the result: header + filtered <url> blocks + </urlset>
    return header + '\n'.join(recent_blocks) + "\n</urlset>"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 filter_sitemap.py sitemap.xml", file=sys.stderr)
        sys.exit(1)

    xml_file = sys.argv[1]
    output = extract_recent_url_blocks(xml_file)
    # Write to standard output (can be redirected to a file)
    print(output.strip())
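The script above works on the text directly by splitting on `<url>`. As an alternative sketch, not the method used in this post, the same filter can be written with an XML parser from Python's standard library, using `datetime.fromisoformat` in place of `dateutil`. The function name `filter_recent` and the `now` parameter (added so the cutoff can be fixed for testing) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_recent(xml_text, days=30, now=None):
    """Return the sitemap with only <url> entries modified in the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Serialize the sitemap namespace as the default one (no ns0: prefixes).
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text.encode("utf-8"))
    for url in list(root.findall(f"{{{NS}}}url")):
        lastmod = url.find(f"{{{NS}}}lastmod")
        keep = False
        if lastmod is not None and lastmod.text:
            try:
                stamp = datetime.fromisoformat(lastmod.text.strip())
            except ValueError:
                stamp = None  # unparseable timestamps are dropped
            if stamp is not None:
                # Assume UTC when the timestamp carries no offset.
                if stamp.tzinfo is None:
                    stamp = stamp.replace(tzinfo=timezone.utc)
                keep = stamp >= cutoff
        if not keep:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

Two differences from the text-based script are worth noting: the returned string has no XML declaration, and namespace declarations that no element uses (such as `xmlns:mobile`) are not written back out.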
Install the Python dependency:
pip install python-dateutil
Run the program:
python3 filter_sitemap.py sitemap.xml
python3 filter_sitemap.py sitemap.xml > recent_sitemap.xml
The final output meets my needs:
# more recent_sitemap.xml
<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.baidu.com/schemas/sitemap-mobile/1/">
<url> <loc>https://www.liuguohua.com</loc> <lastmod>2025-06-22T05:35:08+00:00</lastmod> <changefreq>daily</changefreq> <priority>1.0</priority> </url>
<url> <loc>https://www.liuguohua.com/3899.html</loc> <lastmod>2025-06-18T17:32:27+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3895.html</loc> <lastmod>2025-06-18T15:36:33+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3892.html</loc> <lastmod>2025-06-18T10:33:23+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3887.html</loc> <lastmod>2025-06-17T20:28:36+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3883.html</loc> <lastmod>2025-06-17T16:51:02+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3880.html</loc> <lastmod>2025-06-14T17:22:26+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3877.html</loc> <lastmod>2025-06-12T12:08:43+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3872.html</loc> <lastmod>2025-06-12T11:14:35+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3869.html</loc> <lastmod>2025-06-12T08:43:38+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3858.html</loc> <lastmod>2025-06-11T19:05:55+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3850.html</loc> <lastmod>2025-06-10T20:08:03+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3846.html</loc> <lastmod>2025-06-10T10:50:25+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3842.html</loc> <lastmod>2025-06-06T15:52:05+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3836.html</loc> <lastmod>2025-06-06T15:22:04+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3833.html</loc> <lastmod>2025-06-06T15:18:07+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3830.html</loc> <lastmod>2025-06-06T15:08:33+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3827.html</loc> <lastmod>2025-06-06T15:03:35+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3822.html</loc> <lastmod>2025-06-05T12:58:09+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3818.html</loc> <lastmod>2025-06-04T17:15:58+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3808.html</loc> <lastmod>2025-06-04T16:44:59+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3805.html</loc> <lastmod>2025-06-04T16:40:53+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3802.html</loc> <lastmod>2025-06-04T16:36:27+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3798.html</loc> <lastmod>2025-06-04T16:29:49+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3794.html</loc> <lastmod>2025-06-04T16:13:56+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3758.html</loc> <lastmod>2025-06-02T16:11:56+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3763.html</loc> <lastmod>2025-06-02T16:21:17+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3769.html</loc> <lastmod>2025-06-02T16:25:25+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3772.html</loc> <lastmod>2025-06-02T16:28:12+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3775.html</loc> <lastmod>2025-06-02T16:29:35+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3778.html</loc> <lastmod>2025-06-02T16:32:59+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3781.html</loc> <lastmod>2025-06-02T16:35:23+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3785.html</loc> <lastmod>2025-06-02T16:37:01+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3788.html</loc> <lastmod>2025-06-02T16:38:11+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3751.html</loc> <lastmod>2025-05-29T11:30:08+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
<url> <loc>https://www.liuguohua.com/3748.html</loc> <lastmod>2025-05-28T22:21:18+08:00</lastmod> <changefreq>monthly</changefreq> <priority>0.6</priority> </url>
</urlset>
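One way to confirm that a filtered file really stays inside the 30-day window is to compute the oldest and newest `<lastmod>` values it contains. The following is a small sketch under the assumption that every `<lastmod>` is in ISO 8601 form, as in the output above; the function name `lastmod_range` is illustrative.

```python
import re
from datetime import datetime, timezone

# Captures the text between <lastmod> and </lastmod>, ignoring surrounding spaces.
LASTMOD_RE = re.compile(r"<lastmod>\s*([^<]+?)\s*</lastmod>")


def lastmod_range(xml_text):
    """Return (oldest, newest) <lastmod> values as timezone-aware datetimes, or None."""
    stamps = []
    for raw in LASTMOD_RE.findall(xml_text):
        stamp = datetime.fromisoformat(raw)
        # Assume UTC when the timestamp carries no offset.
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)
        stamps.append(stamp)
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

If the difference between the two returned values is larger than 30 days, the filter did not behave as intended.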