[1] Dwivedi, Y.K., et al., (2023). So what if ChatGPT wrote it? Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy, Int. J. Inf. Manag., 71, https://doi.org/10.1016/j.ijinfomgt.2023.102642.
[3] Andrew, G., (2023). Implications of ChatGPT and Large Language Models for Environmental Policymaking. Social Science Research Network, https://doi.org/10.2139/ssrn.4499643.
[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al., (2017). Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17; p. 6000–6010.
https://doi.org/10.48550/arXiv.1706.03762
[5] Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S., (2021). On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency; p. 610–623. https://doi.org/10.1145/3442188.3445922
[6] Giray, L., (2023). Prompt engineering with ChatGPT: a guide for academic writers, Ann. Biomed. Eng., DOI: 10.1007/s10439-023-03272-4
[8] Eager, B., and Brunton, R., (2023). Prompting higher education towards AI-Augmented teaching and learning practice, J. Univ. Teach. Learn. Pract., 20(5),
https://doi.org/10.53761/1.20.5.02.
[9] Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R., McHardy, R. (2023). Challenges and applications of large language models; ArXiv:2307.10169.
https://doi.org/10.48550/arXiv.2307.10169
[10] Lu, Y., Bartolo, M., Moore, A., Riedel, S., Stenetorp, P., (2022). Fantastically ordered prompts and where to find them: overcoming few-shot prompt order sensitivity. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics; p. 8086–8098.
https://doi.org/10.48550/arXiv.2104.08786
[11] Webson, A., Pavlick, E., (2022). Do prompt-based models really understand the meaning of their prompts? In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; p. 2300–2344.
https://doi.org/10.48550/arXiv.2109.01247
[12] Maynez, J., Narayan, S., Bohnet, B., McDonald, R., (2020). On faithfulness and factuality in abstractive summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 1906–1919.
https://doi.org/10.48550/arXiv.2005.00661
[13] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., et al., (2023). Sparks of artificial general intelligence: early experiments with GPT-4. ArXiv:2303.12712.
https://doi.org/10.48550/arXiv.2303.12712
[15] Jha, S., Jha, S.K., Lincoln, P., Bastian, N.D., Velasquez, A., and Neema, S., (2023). Dehallucinating large language models using formal methods guided iterative prompting, in: 2023 IEEE International Conference on Assured Autonomy (ICAA), IEEE. pp. 149-152. DOI: 10.1109/ICAA58325.2023.00029
[19] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., et al., (2022). Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems. vol. 35; p. 24824–24837.
https://doi.org/10.48550/arXiv.2201.11903
[20] Lecler, A., Duron, L., and Soyer, P., (2023). Revolutionizing radiology with GPT-based models: current applications, future possibilities and limitations of ChatGPT, Diagn. Interv. Imaging, 104(6), pp. 269-274,
https://doi.org/10.1016/j.diii.2023.02.003.
[21] Epstein, R.H. and Dexter, F., (2023). Variability in large language models' responses to medical licensing and certification examinations. Comment on "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment", JMIR Med. Educ., 9,
https://doi.org/10.2196/48305.
[25] Shieh, J., (2023). Best practices for prompt engineering with OpenAI API, OpenAI, [online]. Accessed: October 3, 2023.
[27] Spasic, A.J., and Jankovic, D.S., (2023). Using ChatGPT standard prompt engineering techniques in lesson preparation: role, instructions and seed-word prompts, in: 2023 58th International Scientific Conference on Information, Communication and Energy Systems and Technologies, ICEST 2023 - Proceedings, pp. 47-50.
https://doi.org/10.1109/ICEST58410.2023.10187269.
[29] Zhang, Z., Gao, J., Dhaliwal, R.S., Li, T.J.-J., (2023). VISAR: a human-AI argumentative writing assistant with visual programming and rapid draft prototyping; ArXiv:2304.07810.
https://doi.org/10.48550/arXiv.2304.07810
[34] Wu, S., Shen, E.M., Badrinath, C., Ma, J., Lakkaraju, H., (2023). Analyzing chain-of-thought prompting in large language models via gradient-based feature attributions; ArXiv:2307.13339.
https://doi.org/10.48550/arXiv.2307.13339
[35] Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., et al., (2022). Solving quantitative reasoning problems with language models. Advances in Neural Information Processing Systems, 35:3843–3857.
https://doi.org/10.48550/arXiv.2206.14858
[38] Wang, B., et al., (2022). Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. ArXiv:2212.10001 [cs.CL].
https://doi.org/10.48550/arXiv.2212.10001
[41] Prystawski, B., et al., (2023). Why Think Step by Step? Reasoning Emerges from the Locality of Experience. ArXiv:2304.03843.
https://doi.org/10.48550/arXiv.2304.03843
[42] Del, M., Fishel, M., (2023). True detective: a deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. In: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023);
https://doi.org/10.48550/arXiv.2212.10114
[46] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., et al., (2023). Graph of thoughts: solving elaborate problems with large language models; ArXiv:2308.09687.
https://doi.org/10.1609/aaai.v38i16.29720
[49] Logan IV, R., Balažević, I., Wallace, E., Petroni, F., Singh, S., Riedel, S., (2022). Cutting down on prompts and parameters: simple few-shot learning with language models. In: Findings of the Association for Computational Linguistics: ACL 2022; p. 2824–2835.
https://doi.org/10.48550/arXiv.2106.13353
[51] Brown, T., et al., (2020). Language Models Are Few-Shot Learners. Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
https://doi.org/10.48550/arXiv.2005.14165
[52] Reynolds, L., McDonell, K., (2021). Prompt programming for large language models: beyond the few-shot paradigm. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems; p. 1–7.
https://doi.org/10.48550/arXiv.2102.07350
[53] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., et al., (2020). Language Models are Few-Shot Learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20;
https://doi.org/10.48550/arXiv.2005.14165
[55] Wang, B., et al., (2022). Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. ArXiv:2212.10001 [cs.CL].
https://doi.org/10.48550/arXiv.2212.10001
[56] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., et al., (2022). Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems. vol. 35; p. 24824–24837.
https://doi.org/10.48550/arXiv.2201.11903
[57] Wang, X., Wei, J., Schuurmans, D., Le, Q.V., Chi, E.H., Narang, S., et al., (2023). Self-consistency improves chain of thought reasoning in language models. In: Eleventh International Conference on Learning Representations;
https://doi.org/10.48550/arXiv.2203.11171
[58] Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S., (2021). On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency; p. 610–623. https://doi.org/10.1145/3442188.3445922
[61] Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., et al., (2023). Least-to-most prompting enables complex reasoning in large language models. In: Eleventh International Conference on Learning Representations;
https://doi.org/10.48550/arXiv.2205.10625
[62] Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Le Bras, R., et al., (2022). Generated knowledge prompting for commonsense reasoning. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); p. 3154–3169.
https://doi.org/10.48550/arXiv.2110.08387
[63] Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu, A., Si, C., ... & Resnik, P. (2024). The Prompt Report: A Systematic Survey of Prompting Techniques. arXiv preprint arXiv:2406.06608.
https://arxiv.org/abs/2406.06608
[64] Deng, M., Wang, J., Hsieh, C.-P., Wang, Y., Guo, H., Shu, T., et al., (2022). RLPrompt: Optimizing discrete text prompts with reinforcement learning. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3369–3391, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
https://doi.org/10.48550/arXiv.2205.12548
[65] Xu, H., Chen, Y., Du, Y., Shao, N., Wang, Y., Li, H., and Yang, Z., (2022). GPS: Genetic prompt search for efficient few-shot learning. arXiv preprint arXiv:2210.17041.
https://doi.org/10.48550/arXiv.2210.17041
[66] Wan, X., Sun, R., Nakhost, H., Dai, H., Eisenschlos, J., Arik, S., and Pfister, T., (2023). Universal self-adaptive prompting. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7437–7462, Singapore. Association for Computational Linguistics.
https://doi.org/10.48550/arXiv.2305.14926
[68] Holtzman, A., Buys, J., Du, L., Forbes, M., Choi, Y., (2020). The curious case of neural text degeneration. In: International Conference on Learning Representations;
https://doi.org/10.48550/arXiv.1904.09751
[69] Ray, P.P., and Majumder, P., (2023). Assessing the accuracy of responses by the language model ChatGPT to questions regarding bariatric surgery: a critical appraisal, Obes. Surg., 33(8), pp. 2588-2589, https://doi.org/10.1007/s11695-023-06664-6.
[70] Gupta, R., Herzog, I., Weisberger, J., Chao, J., Chaiyasate, K., and Lee, E.S., (2023). Utilization of ChatGPT for plastic surgery research: friend or foe, J. Plast. Reconstr. Aesthet. Surg., 80, pp. 145-147, https://doi.org/10.1016/j.bjps.2023.03.004.
[71] Deiana, G., Dettori, M., Arghittu, A., Azara, A., Gabutti, G., and Castiglia, P., (2023). Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions, Vaccines, 11(7), art. 11071217, https://doi.org/10.3390/vaccines11071217.
[74] Papineni, K., Roukos, S., Ward, T., Zhu, W.J., (2002). BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics; p. 311–318.
https://doi.org/10.3115/1073083.1073135
[75] Lin, C.Y., (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In: Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL, Barcelona, 25 July 2004; p. 74–81.
[76] Banerjee, S., Lavie, A., (2005). METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization; p. 65–72.
[77] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y., (2020). BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations;
https://doi.org/10.48550/arXiv.1904.09675
[78] Stent, A., Marge, M., Singhai, M., (2005). Evaluating evaluation methods for generation in the presence of variation. In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer; p. 341–351. DOI: 10.1007/978-3-540-30586-6_38
[79] Deng, M., Wang, J., Hsieh, C.-P., Wang, Y., Guo, H., Shu, T., et al., (2022). RLPrompt: optimizing discrete text prompts with reinforcement learning. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; p. 3369–3391.
https://doi.org/10.48550/arXiv.2205.12548
[80] Zhou, Y., Muresanu, A.I., Han, Z., Paster, K., Pitis, S., Chan, H., et al., (2022). Large language models are human-level prompt engineers. In: Eleventh International Conference on Learning Representations;
https://doi.org/10.48550/arXiv.2211.01910
[82] Maynez, J., Narayan, S., Bohnet, B., McDonald, R., (2020). On faithfulness and factuality in abstractive summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 1906–1919.
https://doi.org/10.48550/arXiv.2005.00661
[84] Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., et al., (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
https://doi.org/10.1145/3571730
[86] Lazaridou, A., Gribovskaya, E., Stokowiec, W., Grigorev, N., (2022). Internet-augmented language models through few-shot prompting for open-domain question answering; ArXiv:2203.05115.
https://doi.org/10.48550/arXiv.2203.05115
[88] Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., and Weston, J., (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv preprint arXiv:2309.11495.
https://doi.org/10.48550/arXiv.2309.11495
[89] Ji, Z., Yu, T., Xu, Y., Lee, N., Ishii, E., and Fung, P., (2023). Towards Mitigating Hallucination in Large Language Models via Self-Reflection. In: Findings of EMNLP 2023.
https://doi.org/10.48550/arXiv.2310.06271
[90] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., and Iwasawa, Y., (2022). Large Language Models are Zero-Shot Reasoners. ArXiv:2205.11916.
https://doi.org/10.48550/arXiv.2205.11916
[91] Lei, D., Li, Y., Wang, M., Yun, V., Ching, E., Kamal, E., et al., (2023). Chain of natural language inference for reducing large language model ungrounded hallucinations. arXiv preprint arXiv:2310.03951.
https://doi.org/10.48550/arXiv.2310.03951
[93] Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F., and Ji, H., (2023). Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration. arXiv preprint arXiv:2307.05300.
https://doi.org/10.48550/arXiv.2307.05300
[95] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al., (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
https://doi.org/10.48550/arXiv.2005.11401
[97] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., et al., (2020). BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 7871–7880.
https://doi.org/10.48550/arXiv.1910.13461
[98] Roller, S., Dinan, E., Goyal, N., Ju, D., Williamson, M., Liu, Y., et al., (2021). Recipes for building an open-domain chatbot. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume; p. 300–325.
https://doi.org/10.48550/arXiv.2004.13637
[99] Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., et al., (2023). Chain-of-verification reduces hallucination in large language models; ArXiv:2309.11495.
https://doi.org/10.48550/arXiv.2309.11495
[100] Sahoo, P., Singh, A.K., Saha, S., Jain, V., Mondal, S., and Chadha, A., (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927,
https://doi.org/10.48550/arXiv.2402.07927
[102] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Gianinazzi, L., Gajda, J., Lehmann, T., Niewiadomski, H., Nyczyk, P., Hoefler, T., (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models.
https://doi.org/10.1609/aaai.v38i16.29720
[103] Petroni, F., Rocktäschel, T., Riedel, S., Lewis, P., Bakhtin, A., Wu, Y., and Miller, A., (2019). Language models as knowledge bases? In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 2463–2473.
https://doi.org/10.18653/v1/D19-1250
[104] Schick, T., and Schütze, H., (2021). It’s not just size that matters: small language models are also few-shot learners. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2339–2352. https://doi.org/10.18653/v1/2021.naacl-main.185
[105] Gao, T., Fisch, A., and Chen, D., (2021). Making pre-trained language models better few-shot learners. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’21).
https://doi.org/10.48550/arXiv.2012.15723
[106] Hambardzumyan, K., Khachatrian, H., and May, J., (2021). WARP: Word-level adversarial reprogramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 4921–4933.
https://doi.org/10.18653/v1/2021.acl-long.381
[107] Lester, B., Al-Rfou, R., and Constant, N., (2021). The power of scale for parameter-efficient prompt tuning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 3045–3059.
https://doi.org/10.18653/v1/2021.emnlp-main.243
[108] Schick, T., Schmid, H., and Schütze, H., (2020). Automatically identifying words that can serve as labels for few-shot text classification. In Proceedings of the 28th International Conference on Computational Linguistics (COLING’20), Donia Scott, Núria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, 5569–5578.
https://doi.org/10.18653/v1/2020.coling-main.488
[109] Schick, T., and Schütze, H., (2021). Exploiting Cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL’21), Paola Merlo, Jörg Tiedemann, and Reut Tsarfaty (Eds.). Association for Computational Linguistics, 255–269.
https://doi.org/10.18653/v1/2021.eacl-main.20
[111] Dev, C., Biyani, N., Suthar, N., Kumar, P., Agarwal, P., (2021). Structured Prediction in NLP - A survey. arXiv:2110.02057 [cs.CL].
https://doi.org/10.48550/arXiv.2110.02057
[112] Shin, R., Lin, C.H., Thomson, S., Chen, C., Roy, S., Platanios, E.A., et al., (2021). Constrained language models yield few-shot semantic parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 7699–7715.
https://doi.org/10.18653/v1/2021.emnlp-main.608
[113] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I., (2019). Language models are unsupervised multitask learners.
[114] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., et al., (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877–1901.
https://doi.org/10.48550/arXiv.2005.14165
[115] Schick, T., and Schütze, H., (2021). Few-shot text generation with natural language instructions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 390–402.
https://doi.org/10.18653/v1/2021.emnlp-main.32
[116] Li, X.L., and Liang, P., (2021). Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP’21), Volume 1: Long Papers, Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, 4582–4597. https://doi.org/10.18653/v1/2021.acl-long.353
[118] Dou, Z.-Y., Liu, P., Hayashi, H., Jiang, H., and Neubig, G., (2021). GSum: A general framework for guided neural abstractive summarization. Association for Computational Linguistics, Online, 4830–4842. arXiv:2010.08014 [cs.CL]. https://doi.org/10.48550/arXiv.2010.08014.
[120] Tsimpoukelli, M., Menick, J., Cabi, S., Eslami, S.M.A., Vinyals, O., and Hill, F., (2021). Multimodal few-shot learning with frozen language models. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (NeurIPS’21), Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 200–212.
https://doi.org/10.48550/arXiv.2106.13884
[122] Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P., (2016). SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2383–2392. https://doi.org/10.18653/v1/D16-1264
[123] Lai, G., Xie, Q., Liu, H., Yang, Y., and Hovy, E., (2017). RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 785–794. https://doi.org/10.18653/v1/D17-1082
[124] Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., and Hajishirzi, H., (2020). UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics: EMNLP. Association for Computational Linguistics, 1896–1907. https://doi.org/10.18653/v1/2020.findings-emnlp.171.
[125] Jiang, Z., Araki, J., Ding, H., and Neubig, G., (2021). How can we know when language models know? On the calibration of language models for question answering. Trans. Assoc. Comput. Linguist., 9, 962–977.
https://doi.org/10.1162/tacl_a_00407
[126] Schick, T., and Schütze, H., (2021). Generating datasets with pretrained language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’21), Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). DOI: 10.5282/ubm/epub.92195
[128] Ben-David, E., Oved, N., and Reichart, R., (2022). PADA: Example-based prompt learning for on-the-fly adaptation to unseen domains. Trans. Assoc. Comput. Linguist., 10, 414–433. https://doi.org/10.1162/tacl_a_00468