-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpublicly_available.yaml
40 lines (39 loc) · 2.38 KB
/
publicly_available.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# helix test -f publicly_available.yaml --rsync ./pdfs2
apiVersion: app.aispec.org/v1alpha1
kind: AIApp
metadata:
name: Failure tests for Helix
spec:
assistants:
- name: Default Assistant
model: microsoft/phi-4
type: text
system_prompt: You are a financial expert who only replies with the information
from the provided context. If not relevant information was provided in
the context, indicate that.
knowledge:
- name: publicly_available
source:
filestore:
path: publicly_available
tests:
- name: Test 1
steps:
- prompt: What preprocessing technique helps standardize word forms to their base or dictionary form?
expected_output: A correct answer should identify lemmatization as the technique that reduces word inflection forms to their common base form.
- name: Test 2
steps:
- prompt: What preprocessing technique mentioned in the methodology helps standardize word forms to their base or dictionary form?
expected_output: A correct answer should identify lemmatization as the technique that reduces word inflection forms to their common base form.
- name: Test 3
steps:
- prompt: How does the performance of the strategy differ between small-cap and large-cap companies, and what explanation could there be for this difference?
expected_output: A correct answer should state that performance is stronger in the small-cap space compared to the large-cap segment, and explain that this is because large-cap companies are more widely followed by analysts and receive more news coverage, so information is more quickly incorporated into their stock prices.
- name: Test 4
steps:
- prompt: What specific preprocessing steps can we employ before calculating textual similarity between documents?
expected_output: A correct answer must list all four preprocessing steps - 1) tag removal, 2) numerical value removal, 3) punctuation removal, and 4) lemmatization. Missing any of these steps or adding incorrect steps would make the answer incomplete or incorrect.
- name: Test 5
steps:
- prompt: How can we measure changes in risk disclosures?
expected_output: A correct answer should identify - Minimum Edit Distance, Jaccard Similarity, and Cosine Similarity.