Unlocking AI's Potential: Understanding Inner Voices Through Scoring
Chapter 1: Introduction to AI Mind Reading
Over the past few months, I have been using large language models to evaluate the quality of text samples. I started with a numeric scale from 1 to 5, but the model almost always returned one of the two extremes, the lowest or the highest score, with little gradation in between.
To address this, I switched to descriptive labels ranging from “very poor” to “excellent.” With labels, the model used more of the available range and produced more nuanced evaluations. I then tried a 100-point scale (1 to 100), which, in my tests, allowed finer-grained distinctions in sample quality.
Through these trials, I found that the design of the scoring scale significantly affects how useful the model's ratings are. The rest of this article walks through the prompt template I used for each scoring technique.
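One way to make the "extreme scores" observation measurable is to compute how many ratings land on the endpoints of a scale and how evenly the ratings are spread. The sketch below is only an illustration: the function names are my own, and the two rating lists are made-up placeholders, not results from the experiments described here.

```python
from collections import Counter
from math import log


def extreme_fraction(ratings, low, high):
    """Share of ratings that sit on either endpoint of the scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(1 for r in ratings if r in (low, high)) / len(ratings)


def normalized_entropy(ratings, num_levels):
    """Shannon entropy of the rating distribution, scaled to [0, 1].

    0 means every rating is identical; 1 means all levels are used equally.
    """
    counts = Counter(ratings)
    total = len(ratings)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(num_levels)


# Placeholder ratings for illustration only (not measured data).
polarized = [1, 5, 5, 1, 5, 1, 1, 5]
spread = [2, 3, 4, 3, 5, 2, 4, 1]

print(extreme_fraction(polarized, 1, 5), normalized_entropy(polarized, 5))
print(extreme_fraction(spread, 1, 5), normalized_entropy(spread, 5))
```

A high extreme fraction combined with low entropy would indicate the polarized behavior described above; comparing these two numbers across scoring scales gives a rough, scale-independent view of how much of the range the model actually uses.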
The first video, Mind Reading Is Here: AI Can Now Decode the Human Brain's Deepest Secrets!, explores how advances in AI are enabling the interpretation of human thoughts, showcasing the transformative potential of technology in understanding the mind.
Section 1.1: Likert Scale Evaluation
The Likert scale offers five levels of agreement, from 1 (“strongly disagree”) to 5 (“strongly agree”). I wrote a prompt template aligned with this scale:
To assess the quality of a novel excerpt, please provide a rating based on the following criteria:
- 1 point — Strongly Disagree: Confusing plot, vague characters, uninspiring language.
- 2 points — Disagree: Lacks coherence, shallow character development, poor expression.
- 3 points — Neutral: Acceptable plot, stereotypical characters, mediocre language.
- 4 points — Agree: Well-connected plot, vivid characters, elegant language.
- 5 points — Strongly Agree: Original plot, profound character development, exceptional language.
Please review the following excerpt carefully and share your rating along with a brief comment:
[Insert Novel Excerpt]
Evaluation Criteria:
- Plot: Is it engaging and logical?
- Characters: Are they well-developed and memorable?
- Language: Is it precise and impactful?
- Theme: Does it convey meaningful insights?
If the rating is below 4 points, please rewrite the excerpt to elevate it to at least a 4-point quality, focusing on enhancing the plot, characters, and language.
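The template above can be assembled and its reply checked programmatically. The following is a minimal sketch, assuming the model is additionally told to answer with a line of the form "Rating: N"; that reply format, the shortened prompt wording, and the helper names are my assumptions, not part of the original template.

```python
import re

# Rubric text taken from the template above; the dict layout is an assumption.
LIKERT_RUBRIC = {
    1: "Strongly Disagree: Confusing plot, vague characters, uninspiring language.",
    2: "Disagree: Lacks coherence, shallow character development, poor expression.",
    3: "Neutral: Acceptable plot, stereotypical characters, mediocre language.",
    4: "Agree: Well-connected plot, vivid characters, elegant language.",
    5: "Strongly Agree: Original plot, profound character development, exceptional language.",
}

_RATING_RE = re.compile(r"Rating:\s*([1-5])\b")


def build_likert_prompt(excerpt: str) -> str:
    """Assemble the rubric, the excerpt, and a reply-format instruction."""
    rubric = "\n".join(f"- {score}: {text}" for score, text in LIKERT_RUBRIC.items())
    return (
        "Rate the following novel excerpt from 1 to 5 using this rubric:\n"
        f"{rubric}\n\n"
        f"Excerpt:\n{excerpt}\n\n"
        # Assumed reply format so the score can be parsed reliably.
        "Reply with 'Rating: <1-5>' on the first line, then a brief comment."
    )


def parse_likert_rating(reply: str):
    """Return the 1-5 rating found in the reply, or None if there is none."""
    match = _RATING_RE.search(reply)
    return int(match.group(1)) if match else None


def needs_rewrite(rating: int) -> bool:
    """The template asks for a rewrite when the rating is below 4."""
    return rating < 4
```

Sending the prompt to a model is left out on purpose, since it depends on whichever client library is in use.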
Section 1.2: Descriptive Word Assessment
I have also evaluated quality using descriptive terms: “very poor,” “poor,” “average,” “good,” and “excellent.” Here is a simplified rubric based on these terms:
- Very Poor: Lacks merits, numerous flaws, unacceptable.
- Poor: Flaws outweigh merits, unsatisfactory.
- Average: Acceptable but unremarkable.
- Good: Merits outweigh flaws, good quality.
- Excellent: Outstanding merits, exceptional value.
Please provide your rating of the following novel excerpt based on these standards:
[Insert Novel Excerpt]
Evaluation Focus:
- Storyline: Is it coherent and engaging?
- Character Development: Are the characters well-developed?
- Language Use: Is the language impactful?
- Thematic Depth: Are the themes meaningful?
If your rating is average or below, feel free to revise the excerpt to improve its quality.
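Descriptive labels still need to be turned into something comparable before they can be aggregated or checked against the "average or below" rule. The sketch below shows one possible mapping; it assumes the model answers with "Rating: <label>", and the ordinal values 1 to 5 are my own convention rather than something stated in the template.

```python
import re

# Ordered from worst to best; "very poor" must precede "poor" in the regex
# alternation so the longer label is matched first.
LABELS = ["very poor", "poor", "average", "good", "excellent"]

_LABEL_RE = re.compile(r"Rating:\s*(" + "|".join(LABELS) + r")\b", re.IGNORECASE)


def parse_label(reply: str):
    """Return the lower-cased label found in the reply, or None."""
    match = _LABEL_RE.search(reply)
    return match.group(1).lower() if match else None


def label_to_ordinal(label: str) -> int:
    """Map a label to its 1-based position on the five-level scale."""
    return LABELS.index(label) + 1


def should_revise(label: str) -> bool:
    """The template invites a revision when the rating is average or below."""
    return label_to_ordinal(label) <= label_to_ordinal("average")
```

Keeping the labels in a single ordered list means the prompt, the parser, and the revision rule cannot drift apart.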
Chapter 2: 100-Point Assessment Methodology
In another approach, I assess quality on a scale of 1 to 100.
Please evaluate the overall quality of the novel excerpt based on the following standards:
- 1–20 points (Very Poor): Chaotic plot, vague characters, uninspiring language.
- 21–40 points (Poor): Lacks coherence, poor character development.
- 41–60 points (Average): Reasonably smooth but unoriginal.
- 61–80 points (Good): Engaging storyline, well-developed characters.
- 81–100 points (Excellent): Ingeniously crafted with rich themes.
Read the following excerpt and provide your rating and comment:
[Insert Novel Excerpt]
Evaluation Focus:
- Storyline: Is it engaging and logical?
- Character Development: Are characters distinctive?
- Language Use: Is it beautiful and impactful?
- Thematic Depth: Are themes thought-provoking?
If your score is 60 or below, feel free to modify and enhance the excerpt.
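To compare 100-point scores with the five descriptive levels, each score can be mapped back to its band. This is a small sketch using the band boundaries listed above; the function names are hypothetical.

```python
from bisect import bisect_left

# Upper bound of each band, taken from the rubric above.
_UPPER_BOUNDS = [20, 40, 60, 80, 100]
_BANDS = ["Very Poor", "Poor", "Average", "Good", "Excellent"]


def score_to_band(score: int) -> str:
    """Return the rubric band that a 1-100 score falls into."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    # bisect_left finds the first band whose upper bound is >= score.
    return _BANDS[bisect_left(_UPPER_BOUNDS, score)]


def should_revise(score: int) -> bool:
    """The template invites a revision when the score is 60 or below."""
    return score <= 60
```

For example, `score_to_band(61)` falls into the "Good" band, while `score_to_band(60)` is still "Average" and would trigger the revision rule.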
The second video, AI is Getting Better At Mind Reading…!, discusses the advancements in AI technologies that allow for deeper understanding and analysis of human emotions, showcasing the evolving landscape of AI.
Conclusion
These experiments made it clear that scoring design directly influences the output of large language models. Moving from numeric scores to descriptive phrases and then to a 100-point scale, each method brought me closer to getting useful evaluations from the model.
After testing all three, I recommend descriptive words: in my experience, the model grasps the sentiment carried by words better than bare numbers, and its evaluations improve as a result. I observed the same effect in my emotion-driven stock trading experiments, where the model produced more precise stock recommendations by combining fundamental analysis with emotional insights.
Have you seen similar improvements after changing the scoring mechanism in your prompts? I invite your thoughts on this topic.
I am Li Meng, an independent software developer and creator of SolidUI, with a keen interest in new technologies, particularly in the realms of AI and data. If you find my insights valuable, please follow, like, and share. Thank you!