Submission Guidelines for the NTCIR-5 CLIR Task
How to Submit Search Results
1. Files to Be Submitted
All participants have to submit (a) a document-list file for each run and (b) a file of system descriptions. Please use XML-style tags to describe your system according to the instructions in Section 5. Examples are as follows.
(a) example of document list (search results): See Section 6 for details.
030 0 cts_cec_19991118596 1 4238 LIPS-C-CJE-T-01
030 0 cts_cec_19991118596 2 3211 LIPS-C-CJE-T-01
030 0 cts_cec_19991118596 1000 1116 LIPS-C-CJE-T-01
(b) example of system description: See Section 5 for details.
<TRANS>dictionary-based query translation</TRANS>
<QEXP>pre- and post-translation expansion by Rocchio</QEXP>
<CORPUS>using NTCIR-1 Japanese document collections for expansion</CORPUS>
<SPTECH>searching translations of unknown terms automatically from Web pages</SPTECH>
2. Type of Runs
Mandatory Runs: T-run and D-run
Each participant must submit two types of runs for each combination of topic language and document language(s): a T-run, which uses only the <TITLE> field of the topic, and a D-run, which uses only the <DESC> field.
The purpose of these mandatory runs is to make research findings clear by allowing systems and methods to be compared under a unified condition.
Recommended Runs: DN-run
The task organizers also strongly recommend submitting a DN-run, i.e., a run using both the <DESC> and <NARR> fields.
Any other combination of fields may be submitted as an optional run according to each participant's research interests, e.g., a TDN-run, DC-run, TDNC-run, and so on.
3. Number of Runs
Each participant can submit up to 5 runs in total for each language pair,
regardless of the type of run; at most two T-runs and at most two D-runs
may be included among the 5 runs. A language pair means the combination
of topic language and document language(s).
Language combination -> Topic: C and Docs: CJE (C->CJE)
Submission -> two T-runs, a D-run, a DN-run, and a TDNC-run (5 runs in total)
4. Identification and Priority of Runs
Each run has to be associated with a RunID, an identifier unique to that run. A RunID consists of the group ID, topic language, document language(s), run type, and a two-digit priority 'pp', in the following format:
GroupID-TopicLang-DocLang(s)-RunType-pp (e.g., LIPS-C-CJE-T-01)
The 'pp' is a two-digit number representing the priority of the run. It will be used as a parameter for pooling (see below). Participants have to decide the priority of each submitted run on the basis of each language pair; "01" means the highest priority. For example, suppose a participating group, LIPS, submits 3 runs for C->CJE: the first is a T-run, the second a D-run, and the third a DN-run. The RunIDs are then LIPS-C-CJE-T-01, LIPS-C-CJE-D-02, and LIPS-C-CJE-DN-03, respectively. Alternatively, if the group uses different ranking techniques in two T-runs for C->CJE, the RunIDs have to be LIPS-C-CJE-T-01, LIPS-C-CJE-T-02, and LIPS-C-CJE-D-03.
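As an illustration, the RunID construction described above can be sketched in Python. The function and argument names are our own, not part of the guideline; only the component order and the two-digit priority are fixed by the rules.

```python
def make_run_id(group, topic_lang, doc_langs, run_type, priority):
    """Build a RunID of the form GroupID-TopicLang-DocLang(s)-RunType-pp.

    The priority is rendered as two digits; "01" is the highest priority.
    """
    return f"{group}-{topic_lang}-{doc_langs}-{run_type}-{priority:02d}"

# The three runs from the guideline's C->CJE example:
print(make_run_id("LIPS", "C", "CJE", "T", 1))   # LIPS-C-CJE-T-01
print(make_run_id("LIPS", "C", "CJE", "D", 2))   # LIPS-C-CJE-D-02
print(make_run_id("LIPS", "C", "CJE", "DN", 3))  # LIPS-C-CJE-DN-03
```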
Note: The top X documents in each of the submitted runs will be collected and put into the document pool. Only documents in the pool will be judged by human assessors. If the number of submitted runs is too large, the runs to be put in the pool may be selected based on the priority that you assign to each run.
5. System Description
5.1 Descriptive Information
In addition to search results, every participating group has to give us a concise description of each run. This description should contain the following information.
<INDEXUNIT>: unit of indexing, e.g., character, bi-character, bi-word, phrase, etc.
<INDEXTECH>: techniques for indexing, e.g., morphology, stemming, POS tagging, etc.
<INDEXSTRUC>: index structure, e.g., inverted file, signature file, PAT, etc.
<QUERYUNIT>: character, word, phrase, etc.
<MODEL>: vector space model, probabilistic model (Okapi, INQUERY, logistic regression), etc.
<RANK>: ranking factors for weighting each term, e.g., tf, tf/idf, mutual information, word association, document length, etc.
<TRANS>: translation technique used for cross-lingual information retrieval, e.g., dictionary-based, corpus-based, MT, etc. Detailed information is welcome, e.g., select-all, select-top-N, translation disambiguation, etc.
<QEXP>: techniques used to expand the query, or "no query expansion".
<CORPUS>: information about any special corpus used for translation, expansion, etc.
<PIVOT>: language used for the pivot approach, e.g., English.
<SPTECH>: special techniques for improving the performance of CLIR runs.
<COMMENT>: any other comments.
5.2 Root tags
Please pack the system descriptions for all runs into a single file using the two root tags, <TECHDESC> and <RUN>, as follows:
<TECHDESC>
<RUN>
...description of run 1...
</RUN>
<RUN>
...description of run 2...
</RUN>
</TECHDESC>
Please copy and use this template when writing your descriptions.
5.4 File name and format
Please store the system descriptions in a single plain-text file (.txt) with your group name as its file name, e.g., LIPS.txt.
6. Document List
Since TREC's evaluation program is used to carry out the relevance assessment, each participating group has to submit its retrieval results in the designated format. The result file is a list of tuples of the following form:
001 0 cts_cec_19991118596 1 9999 LIPS-C-CJE-T-01
001 0 cts_cec_19991118596 2 9998 LIPS-C-CJE-T-01
001 0 cts_cec_19991118596 1000 1116 LIPS-C-CJE-T-01
002 0 cts_cec_19991118596 1 9997 LIPS-C-CJE-T-01
002 0 cts_cec_19991118596 2 9994 LIPS-C-CJE-T-01
050 0 cts_cec_19991118596 1000 1994 LIPS-C-CJE-T-01
Each line of the submitted result file must follow the format below:
Topic-ID Dummy-field Document-ID Rank Similarity-value Run-ID
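A minimal sketch of producing lines in this six-column format, in Python. The helper function and variable names are our own; the guideline fixes only the column order and the constant dummy field.

```python
def format_result_line(topic_id, doc_id, rank, similarity, run_id):
    """One line of the result file: Topic-ID, dummy field (always 0),
    Document-ID, Rank, Similarity-value, and Run-ID, separated by spaces."""
    return f"{topic_id} 0 {doc_id} {rank} {similarity} {run_id}"

# Top-ranked results for topic 001, matching the example above.
lines = [
    format_result_line("001", "cts_cec_19991118596", 1, 9999, "LIPS-C-CJE-T-01"),
    format_result_line("001", "cts_cec_19991118596", 2, 9998, "LIPS-C-CJE-T-01"),
]
print("\n".join(lines))
```

One such line is emitted for each of the up to 1000 retrieved documents per topic, with ranks ascending and similarity values descending.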
6.2 File name and format
Please store the document list for each run in a single plain-text file with the RunID as its file name, e.g., LIPS-C-CJE-T-01 (with no file extension).
7. How to Submit Files
Please send your search results to us according to the following procedure by the deadline.
Please attach your group's ID to the head of the list file name (e.g., NII.list.txt). Example of files to submit:
NII-J-C-T-01
NII-J-C-D-02
....
NII-C-J-DN-03
NII.txt
Deadline: June 01, 2005, 23:59 Japan time
(except runs searching English document sets)
We would like to remind you that you must return the document data if you do NOT submit any results.
9. Contact Information
If you have any questions, please contact the task organizers: ****@nii.ac.jp