A review of prompt engineering methods in large language models

Document Type: Original Article

Authors

1 Department of Information Science and Epistemology, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran

2 Assistant Prof., Department of Computer Engineering, National University of Skills (NUS), Tehran, Iran.

Abstract

Prompt engineering is the process of structuring input text for large language models (LLMs). It is essential for optimizing the performance of these models, yet it remains a challenging task. The purpose of this research is to investigate the fundamentals of prompt engineering, basic and advanced methods of writing prompts, and the evaluation methods and applications of prompt engineering in natural language processing. This research shows how an understanding of prompt engineering can improve the results of intelligent content-creation tools and reduce the machine hallucination phenomenon, and it provides valuable insights for researchers in this field. Research into the fundamentals, techniques, and applications of prompt engineering can support information and knowledge management by improving the accuracy and quality of information, reducing errors caused by machine hallucination, aiding data analysis, increasing the efficiency of content production, and facilitating access to information, thereby supporting decision-making processes.
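Several of the prompting techniques this review surveys (zero-shot, few-shot, and chain-of-thought prompting) amount to different ways of constructing the input string. The minimal sketch below illustrates the distinction; the function names and prompt wording are illustrative assumptions, not taken from the article, and a real application would send the resulting strings to an LLM API.

```python
# Minimal sketch of three basic prompting techniques (illustrative, not the
# article's own code). Each helper only builds the prompt string.

def build_zero_shot(question: str) -> str:
    """Zero-shot: the task is stated directly, with no worked examples."""
    return f"Answer the question concisely.\nQ: {question}\nA:"

def build_few_shot(question: str, examples: list) -> str:
    """Few-shot: a handful of (question, answer) pairs precede the query,
    letting the model infer the task format from the examples."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

def build_chain_of_thought(question: str) -> str:
    """Chain-of-thought cue: ask the model to reason step by step
    before giving its final answer."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = build_few_shot(
    "What is 7 * 6?",
    examples=[("What is 2 * 3?", "6"), ("What is 4 * 5?", "20")],
)
print(prompt)
```

Few-shot prompting prepends worked examples so the model can infer the expected answer format, while the chain-of-thought cue encourages the model to emit intermediate reasoning, which the cited work links to better performance on multi-step problems.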



[1] Dwivedi, Y.K., et al., (2023). So what if ChatGPT wrote it? Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy, Int. J. Inf. Manag., 71, https://doi.org/10.1016/j.ijinfomgt.2023.102642.
[2] Harrer, S., (2023). Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine, eBioMedicine, 90, https://doi.org/10.1016/j.ebiom.2023.104512.
[3] Andrew, G., (2023). Implications of ChatGPT and Large Language Models for Environmental Policymaking. Social Science Research Network, https://doi.org/10.2139/ssrn.4499643.
[4] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez AN., et al., (2017). Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17; p. 6000–6010. https://doi.org/10.48550/arXiv.1706.03762
[5] Bender, EM., Gebru, T., McMillan-Major, A., Shmitchell, S., (2021). On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency; p. 610–623. https://doi.org/10.1145/3442188.3445922
[6] Giray, L., (2023). Prompt engineering with ChatGPT: a guide for academic writers, Ann. Biomed. Eng., DOI:10.1007/s10439-023-03272-4
[7] White, J. et al., (2023). A prompt pattern catalog to enhance prompt engineering with chatgpt, ArXiv Prepr. ArXiv230211382, https://doi.org/10.48550/arXiv.2302.11382
[8] Eager, B., and Brunton, R., (2023). Prompting higher education towards AI-Augmented teaching and learning practice, J. Univ. Teach. Learn. Pract., 20(5), https://doi.org/10.53761/1.20.5.02.
[9] Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R., McHardy, R. (2023). Challenges and applications of large language models; ArXiv:2307.10169. https://doi.org/10.48550/arXiv.2307.10169
[10] Lu Y, Bartolo M, Moore A, Riedel S, Stenetorp P., (2022). Fantastically ordered prompts and where to find them: overcoming few-shot prompt order sensitivity. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics; p. 8086–8098. https://doi.org/10.48550/arXiv.2104.08786
[11] Webson. A, Pavlick E., (2022). Do prompt-based models really understand the meaning of their prompts? In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. p. 2300–2344. https://doi.org/10.48550/arXiv.2109.01247
[12] Maynez. J, Narayan. S, Bohnet. B, McDonald. R., (2020). On faithfulness and factuality in abstractive summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. p. 1906–1919. https://doi.org/10.48550/arXiv.2005.00661
[13] Bubeck. S, Chandrasekaran. V, Eldan. R, Gehrke. J, Horvitz. E, Kamar E, et al., (2023). Sparks of artificial general intelligence: early experiments with GPT-4. ArXiv:2303.12712. https://doi.org/10.48550/arXiv.2303.12712
[14] Lo, L.S., (2023). The art and science of prompt engineering: a new literacy in the information age, Internet Ref. Serv. Q., https://doi.org/10.1080/10875301.2023.2227621.
[15] Jha, S., Jha, S.K., Lincoln, P., Bastian, N.D., Velasquez, A., and Neema, S., (2023). Dehallucinating large language models using formal methods guided iterative prompting, in: 2023 IEEE International Conference on Assured Autonomy (ICAA), IEEE. pp. 149-152. DOI:10.1109/ICAA58325.2023.00029
[18] Shanahan. M, McDonell K, Reynolds. L. (2023). Role-play with large language models; ArXiv:2305.16367. https://doi.org/10.48550/arXiv.2305.16367
[19] Wei. J, Wang X, Schuurmans. D, Bosma. M, Ichter. B, Xia F, et al., (2022). Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems. vol. 35; p. 24824–24837. https://doi.org/10.48550/arXiv.2201.11903
[20] Lecler, A., Duron, L., and Soyer, P., (2023). Revolutionizing radiology with GPT-based models: current applications, future possibilities and limitations of ChatGPT, Diagn. Interv. Imaging, 104(6), pp. 269-274, https://doi.org/10.1016/j.diii.2023.02.003.
[21] Epstein, R.H. and Dexter, F., (2023). Variability in large language models' responses to medical licensing and certification examinations. Comment on "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment", JMIR Med. Educ., 9, https://doi.org/10.2196/48305.
[22] Cooper, G., (2023). Examining science education in ChatGPT: An exploratory study of generative artificial intelligence, J. Sci. Educ. Technol., 32(3), pp. 444-452, https://doi.org/10.1007/s10956-023-10039-y.
[23] Chang, E.Y., (2023). Prompting large language models with the socratic method, in: 2023 IEEE 13th Annual Computing and Communication Workshop and Conference, CCWC, pp. 351-360. https://doi.org/10.1109/CCWC57344.2023.10099179.
[24] White, J. et al., (2023). A prompt pattern catalog to enhance prompt engineering with chatgpt, ArXiv Prepr. ArXiv230211382, https://doi.org/10.48550/arXiv.2302.11382
[25] Shieh, J., (2023). Best practices for prompt engineering with OpenAI API, OpenAI, [online]. Accessed: October 3, 2023. https://doi.org/10.15446/dyna.v90n230.111700
[26] Yao, S. et al., (2023). Tree of thoughts: deliberate problem solving with large language models, ArXiv Prepr. ArXiv230510601, https://doi.org/10.48550/arXiv.2305.10601
[27] Spasic, A.J., and Jankovic, D.S., (2023). Using ChatGPT standard prompt engineering techniques in lesson preparation: role, instructions and seed-word prompts, in: 2023 58th International Scientific Conference on Information, Communication and Energy Systems and Technologies, ICEST 2023 - Proceedings, pp. 47-50. https://doi.org/10.1109/ICEST58410.2023.10187269.
[28] Lo, L.S., (2023). The CLEAR path: a framework for enhancing information literacy through prompt engineering, J. Acad. Librariansh., 49(4), https://doi.org/10.1016/j.acalib.2023.102720.
[29] Zhang Z, Gao J, Dhaliwal RS, Jia-Jun. Li T. (2023). VISAR: a human-AI argumentative writing assistant with visual programming and rapid draft prototyping; ArXiv:2304.07810. https://doi.org/10.48550/arXiv.2304.07810
[30] Buren, DV., (2023). Guided scenarios with simulated expert personae: a remarkable strategy to perform cognitive work; ArXiv:2306.03104. https://doi.org/10.48550/arXiv.2306.03104
[31] Learn Prompting, (2023). Learn Prompting: Your Guide to Communicating with AI. learnprompting.org/docs/basics/roles. https://doi.org/10.3390/ime2030019
[32] OpenAI, (2023). Tactic: use delimiters to clearly indicate distinct parts of the input. Accessed: 2023-09-01. https://platform.openai.com/docs/guides/gpt-best-practices/tactic-use-delimiters-to-clearly-indicate-distinct-parts-of-the-input.
[33] Chen, B., Zhang, Z., Langrené, N., Zhu, S., (2024). Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review. arXiv:2310.14735 [cs.CL] (submitted 23 Oct 2023 (v1); this version, v4). https://doi.org/10.48550/arXiv.2310.14735
[34] Wu S, Shen EM, Badrinath C, Ma J, Lakkaraju H., (2023). Analyzing chain-of-thought prompting in large language models via gradient-based feature Attributions; ArXiv:2307.13339. https://doi.org/10.48550/arXiv.2307.13339
[35] Lewkowycz. A, Andreassen. A, Dohan D, Dyer E, Michalewski H, Ramasesh V, et al., (2022). Solving quantitative reasoning problems with language models. Advances in Neural Information Processing Systems. 35:3843–3857. https://doi.org/10.48550/arXiv.2206.14858
[36] Zhou H, Nova A, Larochelle H, Courville A, Neyshabur B, Sedghi H. (2022). Teaching Algorithmic Reasoning via In-context Learning; ArXiv:2211.09066. https://doi.org/10.48550/arXiv.2211.09066
[37] Lee N, Sreenivasan K, Lee JD, Lee K, Papailiopoulos D. (2023). Teaching arithmetic to small transformers; ArXiv:2307.03381. https://doi.org/10.48550/arXiv.2307.03381
[38] Wang, Boshi, et al., (2022). Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. ArXiv:2212.10001 [Cs], Dec. arxiv.org/abs/2212.10001. https://doi.org/10.48550/arXiv.2212.10001
[39] Gao, Andrew., (2023). Prompt Engineering for Large Language Models: A brief guide with examples for non-technical readers. Available at SSRN: https://ssrn.com/abstract=4504303 or http://dx.doi.org/10.2139/ssrn.4504303
[40] Kojima. Takeshi, et al., (2022). Large Language Models Are Zero-Shot Reasoners. ArXiv:2205.11916 [Cs], arxiv.org/abs/2205.11916. https://doi.org/10.48550/arXiv.2205.11916
[41] Prystawski. Ben, et al., (2023). Why Think Step by Step? Reasoning Emerges from the Locality of Experience. ArXiv.org, https://doi.org/10.48550/arXiv.2304.03843.
[42] Del M, Fishel M., (2023). True detective: a deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. In: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023); https://doi.org/10.48550/arXiv.2212.10114
[43] Yao S, Yu D, Zhao J, Shafran I, Griffiths TL, Cao Y, et al., (2023). Tree of thoughts: deliberate problem solving with large language models; ArXiv:2305.10601. https://doi.org/10.48550/arXiv.2305.10601
[44] Long. J., (2023). Large language model guided tree-of-thought; ArXiv:2305.08291. https://doi.org/10.48550/arXiv.2305.08291
[45] Hulbert. D., (2023). Tree of knowledge: ToK aka Tree of Knowledge dataset for Large Language Models LLM. Accessed: 2023-8-15. figshare https://github.com/dave1010/tree-of-thought-prompting.
[46] Besta. M, Blach N, Kubicek A, Gerstenberger R, Gianinazzi L, Gajda J, et al., (2023). Graph of thoughts: solving elaborate problems with large language models; ArXiv:2308.09687. https://doi.org/10.1609/aaai.v38i16.29720
[47] Wang. L, Ma C, Feng X, Zhang Z, Yang H, Zhang J, et al., (2023). A survey on large language model based autonomous agents; ArXiv:2308.11432. https://doi.org/10.1007/s11704-024-40231-1
[48] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Gianinazzi, L., et al., (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv:2308.09687v4 [cs.CL]. https://github.com/spcl/graph-of-thoughts. https://doi.org/10.1609/aaai.v38i16.29720
[49] Logan IV R, Balažević I, Wallace E, Petroni F, Singh S, Riedel S., (2022). Cutting down on prompts and parameters: simple few-shot learning with language models. In: Findings of the Association for Computational Linguistics: ACL 2022; p. 2824–2835. https://doi.org/10.48550/arXiv.2106.13353
[50] Shyr. C, Hu Y, Harris PA, Xu H., (2023). Identifying and extracting rare disease phenotypes with large language models; ArXiv:2306.12656. https://doi.org/10.1007/s41666-023-00155-0
[51] Brown, Tom, et al., (2020). Language Models Are Few-Shot Learners. Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html. https://doi.org/10.48550/arXiv.2005.14165
[52] Reynolds. L, McDonell K., (2021). Prompt programming for large language models: beyond the few-shot paradigm. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems; p. 1–7. https://doi.org/10.48550/arXiv.2102.07350
[53] Brown. TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al., (2020). Language Models are Few-Shot Learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20; https://doi.org/10.48550/arXiv.2005.14165
[54] Liu. J, Gardner M, Cohen SB, Lapata M., (2020). Multi-step inference for reasoning over paragraphs; ArXiv:2004.02995. https://doi.org/10.18653/v1/2020.emnlp-main.245
[55] Wang. Boshi, et al., (2022). Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. ArXiv:2212.10001 [Cs], Dec. arxiv.org/abs/2212.10001. https://doi.org/10.48550/arXiv.2212.10001
[56] Wei. J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al., (2022). Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems. vol. 35; p. 24824–24837. https://doi.org/10.48550/arXiv.2201.11903
[57] Wang X, Wei J, Schuurmans D, Le QV, Chi EH, Narang S, et al. (2023). Self-consistency improves chain of thought reasoning in language models. In: Eleventh International Conference on Learning Representations; https://doi.org/10.48550/arXiv.2203.11171
[58] Bender EM, Gebru T, McMillan-Major A, Shmitchell S., (2021). On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency; p. 610–623. https://doi.org/10.1145/3442188.3445922
[59] Shum. K, Diao S, Zhang. T., (2023). Automatic prompt augmentation and selection with chain-of-thought from labeled data; ArXiv:2302.12822. https://doi.org/10.48550/arXiv.2302.12822
[60] Khalifa. M, Logeswaran L, Lee M, Lee H, Wang. L., (2023). Discriminator-guided multi-step reasoning with language models; ArXiv:2305.14934. https://doi.org/10.48550/arXiv.2305.14934
[61] Zhou D, Schärli N, Hou L, Wei J, Scales N, Wang X, et al., (2023). Least-to-most prompting enables complex reasoning in large language models. In: Eleventh International Conference on Learning Representations; https://doi.org/10.48550/arXiv.2205.10625
[62] Liu. J, Liu A, Lu X, Welleck S, West P, Le Bras R, et al., (2022). Generated knowledge prompting for commonsense reasoning. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); p. 3154–3169. https://doi.org/10.48550/arXiv.2110.08387
[63] Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu, A., Si, C., ... & Resnik, P. (2024). The Prompt Report: A Systematic Survey of Prompting Techniques. arXiv preprint arXiv:2406.06608. https://arxiv.org/abs/2406.06608
[64] Deng M, Wang J, Hsieh C-P, Wang Y, Guo H, Shu T, et al., (2022). RLPrompt: Optimizing discrete text prompts with reinforcement learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3369–3391, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. https://doi.org/10.48550/arXiv.2205.12548
[65] Xu H, Chen Y, Du Y, Shao N, Wang Y, Li H, and Yang Z., (2022). Gps: Genetic prompt search for efficient few-shot learning. 2022. arXiv preprint arXiv:2210.17041. https://doi.org/10.48550/arXiv.2210.17041
[66] Wan. X, Sun R, Nakhost. H, Dai H, Eisenschlos. J, Arik. S, and Pfister. T., (2023b). Universal self-adaptive prompting. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7437–7462, Singapore. Association for Computational Linguistics. https://doi.org/10.48550/arXiv.2305.14926
[67] Ye. Q, Axmed. M, Pryzant. R, Khani. F., (2024). Prompt Engineering a Prompt Engineer. Submitted on 9 Nov 2023 (v1), last revised 3 Jul 2024 (this version, v3). arXiv:2311.05661 [cs.CL]. https://doi.org/10.48550/arXiv.2311.05661
[68] Holtzman. A, Buys J, Du L, Forbes. M, Choi Y., (2020). The curious case of neural text degeneration. In: International Conference on Learning Representations; https://doi.org/10.48550/arXiv.1904.09751
[69] Ray, P.P., and Majumder, P., (2023). Assessing the Accuracy of responses by the language model ChatGPT to questions regarding bariatric surgery: a critical appraisal, Obes. Surg., 33(8), pp. 2588-2589, https://doi.org/10.1007/s11695-023-06664-6.
[70] Gupta, R., Herzog, I., Weisberger, J., Chao, J., Chaiyasate, K., and Lee, E.S., (2023). Utilization of ChatGPT for plastic surgery research: friend or foe, J. Plast. Reconstr. Aesthet. Surg., 80, pp. 145-147, https://doi.org/10.1016/j.bjps.2023.03.004.
[71] Deiana, G., Dettori, M., Arghittu, A., Azara, A., Gabutti, G., and Castiglia, P., (2023). Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions, Vaccines, 11(7), art. 11071217, https://doi.org/10.3390/vaccines11071217.
[72] Lo, L.S., (2023). The CLEAR path: a framework for enhancing information literacy through prompt engineering, J. Acad. Librariansh., 49(4), https://doi.org/10.1016/j.acalib.2023.102720.
[73] Sai AB, Mohankumar AK, Khapra MM. (2022). A survey of evaluation metrics used for NLG systems. ACM Computing Surveys (CSUR). 55(2):1–39. https://doi.org/10.48550/arXiv.2008.12009
[74] Papineni K, Roukos S, Ward T, Zhu WJ., (2002). BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics; p. 311–318. https://doi.org/10.3115/1073083.1073135
[75] Lin, C.Y., (2004) Rouge: A Package for Automatic Evaluation of Summaries. Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL, Barcelona, 25 July 2004.; p. 74–8.
[76] Banerjee. S, Lavie. A., (2005). METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization; p. 65–72.
[77] Zhang. T, Kishore V, Wu F, Weinberger. KQ, Artzi Y., (2020). BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations; https://doi.org/10.48550/arXiv.1904.09675
[78] Stent. A, Marge. M, Singhai. M., (2005). Evaluating evaluation methods for generation in the presence of variation. In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer; p. 341–351. DOI:10.1007/978-3-540-30586-6_38
[79] Deng M, Wang. J, Hsieh CP, Wang Y, Guo. H, Shu. T, et al., (2022). RLPrompt: optimizing discrete text prompts with reinforcement learning. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; p. 3369–3391. https://doi.org/10.48550/arXiv.2205.12548
[80] Zhou. Y, Muresanu AI, Han Z, Paster. K, Pitis S, Chan H, et al. (2022). Large language models are human-level prompt engineers. In: Eleventh International Conference on Learning Representations; https://doi.org/10.48550/arXiv.2211.01910
[81] Ajith. A, Pan C, Xia. M, Deshpande A, Narasimhan. K., (2023). InstructEval: systematic evaluation of instruction selection methods; ArXiv:2307.00259. https://doi.org/10.48550/arXiv.2307.00259
[82] Maynez J, Narayan. S, Bohnet B, McDonald. R., (2020). On faithfulness and factuality in abstractive summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 1906–1919. https://doi.org/10.48550/arXiv.2005.00661
[83] Lee. K, Firat. O, Agarwal A, Fannjiang. C, Sussillo. D., (2018). Hallucinations in neural machine translation; https://doi.org/10.48550/arXiv.2301.07779
[84] Ji. Z, Lee. N, Frieske. R, Yu T, Su D, Xu Y, et al., (2023). Survey of hallucination in natural language generation. ACM Computing Surveys. 55(12):1–38. https://doi.org/10.1145/3571730
[85] Ram. O, Levine Y, Dalmedigos. I, Muhlgay. D, Shashua. A, Leyton-Brown K, et al., (2023). In-context retrieval-augmented language models; https://doi.org/10.48550/arXiv.2302.00083
[86] Lazaridou. A, Gribovskaya. E, Stokowiec. W, Grigorev N., (2022). Internet-augmented language models through few-shot prompting for open-domain question answering; ArXiv:2203.05115. https://doi.org/10.48550/arXiv.2203.05115
[87] Jiang. Z, Xu FF, Gao. L, Sun Z, Liu Q, Dwivedi-Yu J, et al., (2023). Active retrieval augmented generation; ArXiv:2305.06983. https://doi.org/10.48550/arXiv.2305.06983
[88] Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., and Weston, J., (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv preprint arXiv:2309.11495. https://doi.org/10.48550/arXiv.2309.11495
[89] Ji, Z., Yu, T., Xu, Y., Lee, N., Ishii, E., and Fung, P., (2023). Towards Mitigating Hallucination in Large Language Models via Self-Reflection. EMNLP Findings. https://doi.org/10.48550/arXiv.2310.06271
[90] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., and Iwasawa, Y., (2022). Large Language Models are Zero-Shot Reasoners. https://arxiv.org/abs/2205.11916. https://doi.org/10.48550/arXiv.2205.11916
[91] Lei, D., Li, Y., Wang, M., Yun, V., Ching, E., Kamal, E., et al., (2023). Chain of natural language inference for reducing large language model ungrounded hallucinations. arXiv preprint arXiv:2310.03951. https://doi.org/10.48550/arXiv.2310.03951
[92] Shinn, N., Labash, B., and Gopinath, A., (2023). Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366. https://doi.org/10.48550/arXiv.2303.11366
[93] Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F., and Ji, H., (2023). Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration. arXiv preprint arXiv:2307.05300. https://doi.org/10.48550/arXiv.2307.05300
[94] Shuster. K, Poff S, Chen M, Kiela. D, Weston. J., (2021). Retrieval augmentation reduces hallucination in conversation; ArXiv:2104.07567. https://doi.org/10.48550/arXiv.2104.07567
[95] Lewis. P, Perez. E, Piktus. A, Petroni F, Karpukhin. V, Goyal. N, et al., (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems.33:9459–9474. https://doi.org/10.48550/arXiv.2005.11401
[96] Izacard. G, Grave. E., (2020). Leveraging passage retrieval with generative models for open domain question answering; ArXiv:2007.01282. https://doi.org/10.48550/arXiv.2007.01282
[97] Lewis. M, Liu Y, Goyal. N, Ghazvininejad. M, Mohamed A, Levy O, et al., (2020). BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 7871–7880. https://doi.org/10.48550/arXiv.1910.13461
[98] Roller. S, Dinan. E, Goyal. N, Ju D, Williamson. M, Liu Y, et al., (2021). Recipes for building an open-domain chatbot. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume; p. 300–325. https://doi.org/10.48550/arXiv.2004.13637
[99] Dhuliawala. S, Komeili M, Xu J, Raileanu. R, Li X, Celikyilmaz. A, et al., (2023). Chain-of-verification reduces hallucination in large language models; ArXiv:2309.11495. https://doi.org/10.48550/arXiv.2309.11495
[100] Sahoo. P, Kumar Singh. A, Saha S, Jain V, Mondal. S, and Chadha. A., (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927, https://doi.org/10.48550/arXiv.2402.07927
[101] Vatsal, S., and Dubey, H., (2024). A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks. arXiv:2407.12994 [cs.CL] [Submitted on 17 Jul 2024 (v1), last revised 24 Jul 2024 (this version, v2)]. https://doi.org/10.48550/arXiv.2407.12994
[102] Besta. M, Blach. N, Kubicek. A, Gerstenberger. R, Podstawski. M, Gianinazzi. L, Gajda J, Lehmann. T, Niewiadomski. H, Nyczyk. P, Hoefler. T., (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. https://doi.org/10.1609/aaai.v38i16.29720
[103] Petroni. F, Rocktäschel. T, Riedel. S, Lewis. P, Bakhtin. A, Wu. Y, & Miller. A., (2019). Language models as knowledge bases? In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 2463–2473. https://doi.org/10.18653/v1/D19-1250
[104] Schick. T, & Schütze. H., (2021). It’s not just size that matters: small language models are also few-shot learners. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2339–2352. https://doi.org/10.18653/v1/2021.naacl-main.185
[105] Gao. T, Fisch. A, and Chen. D., (2021). Making pre-trained language models better few-shot learners. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’21). https://doi.org/10.48550/arXiv.2012.15723
[106] Hambardzumyan. K, Khachatrian. H, and May. J., (2021). WARP: Word-level adversarial reprogramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 4921–4933. https://doi.org/10.18653/v1/2021.acl-long.381
[107] Lester. B, Al-Rfou. R, and Constant. N., (2021). The power of scale for parameter-efficient prompt tuning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 3045–3059. https://doi.org/10.18653/v1/2021.emnlp-main.243
[108] Schick, T. Schmid, H. & Schütze, H., (2020). Automatically identifying words that can serve as labels for few-shot text classification. In Proceedings of the 28th International Conference on Computational Linguistics (COLING’20), Donia Scott, Núria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, 5569–5578. https://doi.org/10.18653/v1/2020.coling-main.488
[109] Schick, T. & Schütze, H. (2021). Exploiting Cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL’21), Paola Merlo, Jörg Tiedemann, and Reut Tsarfaty (Eds.). Association for Computational Linguistics, 255–269. https://doi.org/10.18653/v1/2021.eacl-main.20
[110] Cui. L, Wu. Y, Liu. J, Yang. S, and Zhang. Y., (2021). Template-based named entity recognition using BART. arXiv:2106.01760 [cs.CL]. Retrieved from https://doi.org/10.48550/arXiv.2106.01760
[112] Shin. R, Lin. CH, Thomson. S, Chen. C, Roy. S, Platanios. EA, et al., (2021). Constrained language models yield few-shot semantic parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 7699–7715. https://doi.org/10.18653/v1/2021.emnlp-main.608
[113] Radford. A, Wu J, Child. R, Luan. D, Amodei. D, and Sutskever. I., (2019). Language models are unsupervised multitask learners.
[114] Brown. T, Mann. B, Ryder. N, Subbiah. M, D. Kaplan. J, Dhariwal. P, & et.al., (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877–1901. https://doi.org/10.48550/arXiv.2005.14165
[115] Schick. T. and Schütze. H., (2021). Few-shot text generation with natural language instructions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 390–402. https://doi.org/10.18653/v1/2021.emnlp-main.32
[116] Li, X.L. and Liang. P., (2021). Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP’21), Volume 1: Long Papers, Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, 4582–4597. https://doi.org/10.18653/v1/2021.acl-long.353
[117] Ajwani, R.D., Zhu, Z., Rose, J., Rudzicz, F., (2024). Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation. arXiv:2404.05143 [cs.CL]. https://doi.org/10.48550/arXiv.2404.05143
[118] Dou. Z-Y, Liu. P, Hayashi. H, Jiang. H, and Neubig. G., (2021). GSum: A general framework for guided neural abstractive summarization. Association for Computational Linguistics, Online, 4830–4842. arXiv:2010.08014 [cs.CL]. https://doi.org/10.48550/arXiv.2010.08014.
[120] Tsimpoukelli. M, Menick. J, Cabi. S, S. M. Ali Eslami, Vinyals. O, and Hill. F., (2021). Multimodal few-shot learning with frozen language models. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (NeurIPS’21), Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 200–212. https://doi.org/10.48550/arXiv.2106.13884
[121] Alqifari. R., (2019). Question Answering Systems Approaches and Challenges. Conference Paper. DOI:10.26615/issn.2603-2821.2019_011
[122] Rajpurkar. P, Zhang. J, Lopyrev. K, and Liang. P., (2016). SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2383–2392. https://doi.org/10.18653/v1/D16-1264
[123] Lai. G, Xie. Q, Liu. H, Yang. Y, and Hovy. E., (2017). RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 785–794. https://doi.org/10.18653/v1/D17-1082
[124] Khashabi. D, Min. S, Khot. T, Sabharwal. A, Tafjord. O, Clark. P, and Hajishirzi. H., (2020). UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics: EMNLP. Association for Computational Linguistics, 1896–1907. https://doi.org/10.18653/v1/2020.findings-emnlp.171.
[125] Jiang. Z, Araki. J, Ding. H, and Neubig. G., (2021). How can we know when language models know? On the calibration of language models for question answering. Trans. Assoc. Comput. Ling. 9 (09 2021), 962–977. https://doi.org/10.1162/tacl_a_00407
[126] Schick. T., and Schütze. H., (2021). Generating datasets with pretrained language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). DOI: 10.5282/ubm/epub.92195
[127] Çallı. E, Sogancioglu. E, Ginneken. B.v, G. van Leeuwen. K, Murphy. K., (2021). Deep learning for chest X-ray analysis: A survey, Medical Image Analysis. Volume 72, p102-125. ISSN 1361-8415. https://www.sciencedirect.com/science/article/pii/S1361841521001717. https://doi.org/10.1016/j.media.2021.102125
[128] Ben-David. E, Oved. N, and Reichart. R., (2022). PADA: Example-based prompt learning for on-the-fly adaptation to unseen domains. Trans. Assoc. Comput. Linguist. 10 (4 2022), 414–433. https://doi.org/10.1162/tacl_a_00468