[Lucene] FastVectorHighlighter Example

Published: 2025-08-04

Below is a complete, copy-and-run example for Lucene 8.5.0 with FastVectorHighlighter (JDK 8+). It covers the whole flow: building the index, querying, and highlighting.  

 

> Key point: the field to be highlighted must  

1. Store the original content (`setStored(true)`)  

2. Enable term vectors with positions and offsets (`setStoreTermVectors(true)` + `setStoreTermVectorPositions(true)` + `setStoreTermVectorOffsets(true)`)
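
Why the offsets matter: once the start and end character offsets of each matched term are stored in the term vector, the highlighter can mark matches directly in the stored text without running the analyzer again. The following is a minimal plain-Java sketch of that idea only. `OffsetTagSketch` and `applyTags` are hypothetical names for illustration and are not part of Lucene.

```java
class OffsetTagSketch {
    // Wraps each [start, end) range of the text in pre/post tags.
    // Assumption: offsets are sorted by start and do not overlap.
    static String applyTags(String text, int[][] offsets, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int[] o : offsets) {
            out.append(text, cursor, o[0]);                           // untouched text before the match
            out.append(pre).append(text, o[0], o[1]).append(post);   // tagged match
            cursor = o[1];
        }
        out.append(text, cursor, text.length());                      // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "fast vector highlighter";
        int[][] offsets = { {5, 11} };  // character range of "vector"
        System.out.println(applyTags(text, offsets, "<b>", "</b>"));
        // prints: fast <b>vector</b> highlighter
    }
}
```

Without stored offsets, a highlighter has to re-tokenize the text to find these ranges, which is the cost FastVectorHighlighter avoids.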

 

---

 

1. Maven dependencies (Lucene 8.5.0)

 

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>8.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>8.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-highlighter</artifactId>
        <version>8.5.0</version>
    </dependency>
</dependencies>
```

 

---

 

2. Java example code

 

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.vectorhighlight.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.QueryBuilder;

public class FastVectorHighlighterDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        Analyzer analyzer = new StandardAnalyzer();

        // 1. Field type: stored + tokenized + term vectors
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);                   // keep the original text
        fieldType.setTokenized(true);                // run the analyzer
        fieldType.setStoreTermVectors(true);         // required
        fieldType.setStoreTermVectorPositions(true); // required
        fieldType.setStoreTermVectorOffsets(true);   // required
        // The field must be indexed; positions are needed for phrase queries.
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        fieldType.freeze();

        // 2. Add a document (closing the writer commits the changes)
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new Field("title", "Lucene 8.5.0 FastVectorHighlighter示例", fieldType));
            doc.add(new Field("body",
                    "Lucene是一个高效的全文检索库。FastVectorHighlighter利用TermVector实现高速高亮。", fieldType));
            writer.addDocument(doc);
        }

        // 3. Search and highlight
        try (IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);

            // StandardAnalyzer indexes each Chinese character as its own token, so a
            // TermQuery on "全文检索" would match nothing. Build the queries through
            // the same analyzer instead; each one becomes a phrase of single characters.
            QueryBuilder qb = new QueryBuilder(analyzer);
            Query query = new BooleanQuery.Builder()
                    .add(qb.createPhraseQuery("body", "全文检索"), BooleanClause.Occur.SHOULD)
                    .add(qb.createPhraseQuery("body", "高亮"), BooleanClause.Occur.SHOULD)
                    .build();

            TopDocs topDocs = searcher.search(query, 10);
            if (topDocs.scoreDocs.length == 0) {
                System.out.println("No matching documents.");
                return;
            }
            int docId = topDocs.scoreDocs[0].doc;

            // 4. FastVectorHighlighter: phraseHighlight = true, fieldMatch = true
            FastVectorHighlighter highlighter = new FastVectorHighlighter(true, true,
                    new SimpleFragListBuilder(5),
                    new ScoreOrderFragmentsBuilder(
                            BaseFragmentsBuilder.COLORED_PRE_TAGS,
                            BaseFragmentsBuilder.COLORED_POST_TAGS));

            FieldQuery fieldQuery = highlighter.getFieldQuery(query);
            // Fragments of up to 100 characters, at most 3 of them
            String[] frags = highlighter.getBestFragments(fieldQuery, reader, docId,
                    "body", 100, 3);

            // 5. Print the result
            System.out.println("Title: " + reader.document(docId).get("title"));
            if (frags != null) {
                for (String f : frags) {
                    System.out.println("Fragment: " + f);
                }
            }
        }
    }
}
```
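
A note on the analyzer: `StandardAnalyzer` emits every Han character as a separate token, so a query for a multi-character Chinese word has to match those single-character tokens in sequence (a phrase), not one combined term. The toy sketch below illustrates this in plain Java. `UnigramSketch`, `tokenize`, and `containsPhrase` are hypothetical names, and the tokenizer is a rough imitation for illustration, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnigramSketch {
    // Toy tokenizer: each Han character becomes its own token; runs of other
    // letters and digits are lowercased and kept together; the rest is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            if (run.length() > 0) {          // flush the pending non-Han run
                tokens.add(run.toString());
                run.setLength(0);
            }
            if (han) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    // Phrase match: the query tokens must appear consecutively in the document tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize("Lucene是一个高效的全文检索库。");
        System.out.println(doc);
        System.out.println(doc.contains("全文检索"));                 // false: no such single token
        System.out.println(containsPhrase(doc, tokenize("全文检索"))); // true: 全, 文, 检, 索 in sequence
    }
}
```

If whole-word or bigram tokens are preferred, a CJK-aware analyzer is the usual alternative, as long as the same analyzer is used for indexing and for building the query.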

 

---

 

3. Output (example)

 

```
Title: Lucene 8.5.0 FastVectorHighlighter示例
Fragment: Lucene是一个高效的<b style="background:yellow">全文检索</b>库。FastVectorHighlighter利用TermVector实现高速<b style="background:lawngreen">高亮</b>。
```

 

---

 

4. Common pitfalls

 

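| Problem | Cause |
| --- | --- |
| Highlighting returns `null` | Term vectors are not enabled on the field, or the field is not stored (`setStored(true)`) |
| MultiPhraseQuery / SpanQuery is not highlighted | Not supported by FastVectorHighlighter; switch to UnifiedHighlighter in its re-analysis mode |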
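
Because the highlighter may hand back `null` or no fragments, calling code usually wants a fallback. Here is a minimal sketch of one option, which shows the beginning of the stored text instead. `FragmentFallback` and `orFallback` are hypothetical helper names, and the truncation rule is an assumption chosen for illustration.

```java
class FragmentFallback {
    // Returns the highlighter's fragments when there are any; otherwise falls
    // back to the first maxChars characters of the stored field value.
    static String[] orFallback(String[] frags, String stored, int maxChars) {
        if (frags != null && frags.length > 0) {
            return frags;
        }
        if (stored == null) {
            return new String[0];   // nothing stored, nothing to show
        }
        String head = stored.length() <= maxChars
                ? stored
                : stored.substring(0, maxChars) + "...";
        return new String[] { head };
    }

    public static void main(String[] args) {
        String[] shown = orFallback(null, "Lucene is a full-text search library.", 9);
        System.out.println(shown[0]);   // prints: Lucene is...
    }
}
```

In the demo this would wrap the result of `getBestFragments`, with `reader.document(docId).get("body")` as the stored text.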

 

---

 

Copy it straight into your IDE and run it. Happy coding!

