Labeling Guideline

Thank you for participating in this labeling project!

You will review interview transcripts of two different AI interrogators (A and B) interacting with the same interviewee. Your task is to evaluate the questioning capabilities of these AI interrogators.

There are two main tasks to complete:

Comparison: Determine which of the two interrogators (A or B) asks better questions.
Rating: Evaluate the quality of each interrogator on a 5-point scale.

Evaluation Criteria

The purpose of this interview is to check whether the interviewee gives consistent accounts of their personal information, memories, and experiences, and whether those accounts are free of contradictions with the external world.

Consequently, effective questioning should focus on extracting highly detailed and verifiable information from the interviewee.

Criteria for Good Questions

Depth & Persistence: Did the interrogator keep asking follow-up questions until the answers on a topic were sufficiently specific?
- If the interrogator has to ask again because a question went unanswered, the question should be paraphrased rather than repeated as-is.
- Exception: If the interviewee keeps refusing to answer even after paraphrasing, the interrogator may move on to a different topic.
Verifiability: Did the questions focus on extracting verifiable information (i.e., information that can expose contradictions or be checked through an external search)?
- Examples: Questions about dates, addresses, affiliation IDs, organization names, email addresses, or the names of relevant people such as a supervisor at the interviewee's workplace.
Personalization: Are the questions tailored to the interviewee (i.e., closely tied to the interviewee's specific experiences and previous answers)?
Cohesion: Are the questions closely connected to one another?
Addressing Contradictions: If a contradiction or doubtful point came up earlier in the dialogue, did the interrogator ask many questions related to it?

Criteria for Poor Questions

Conversely, the following cases indicate poor questioning performance:

Premature Topic Shifts: Moving to a completely different topic before the current subject has been sufficiently detailed.
Repetition without Paraphrasing: Repeating the exact same question without changing the phrasing.
Abstract/Unverifiable Questions: Asking abstract questions that make it difficult to judge contradictions or verify facts.
- Examples: "What are your hobbies?", "What is the most important value in your life?"
External Knowledge Over Personal Experience: Asking questions that have little to do with the interviewee's own information or experience and must be answered from external knowledge.
- Example: "I work at Google." → "What year was Google founded?"
- Exception: Questions closely tied to events or experiences the interviewee took part in directly are allowed (e.g., "I am the founder of Google." → "What year was Google founded?"). Judge each question in the context of the preceding questions and answers.
Fact-Checking External Knowledge: Using the main questioning phase to verify external facts rather than focusing on the interviewee.
- Question format: "Would you confirm that ..." (e.g., "Would you confirm that the 'KAIST' you mentioned is the research-oriented science and engineering university in South Korea?")
- External facts are checked in a separate process, so the main questioning phase should contain only questions about the interviewee.
Low Correlation: Questions that lack relevance to each other, making it difficult to identify mutual contradictions.
Ignoring Inconsistencies: Moving to an unrelated question even though a contradiction was detected in the previous conversation.

Important Notes

When evaluating the interrogator, do not judge the interviewee's answers. Focus solely on the pattern and quality of the interrogator's questions.
Consider the overall questioning strategy as a whole, rather than just looking at individual questions in isolation.

Reference

You may use the Chrome translation tool to view and evaluate the content in your language of choice.
If you forget the labeling criteria or get confused, you can click the GUIDELINES button in the bottom-left corner of the screen to review them.

📧 minskim010203@gmail.com, imsujeong2190@gmail.com