July 7, 2005


Xinhua 2000-2001


The Xinhua data set that we distribute consists of the 47 files listed below. For the search runs that you submit to the organizers of the NTCIR-5 Workshop, use only the subset of 23 files from "/xie2000/xie200001" to "/xie2001/xie200111". Note that the xie200112 file is missing.

List of 47 files
*The files for evaluation are those listed under <for evaluation>.

<for training>
ntcir_eval/xie1998/xie199801
ntcir_eval/xie1998/xie199802
ntcir_eval/xie1998/xie199803
ntcir_eval/xie1998/xie199804
ntcir_eval/xie1998/xie199805
ntcir_eval/xie1998/xie199806
ntcir_eval/xie1998/xie199807
ntcir_eval/xie1998/xie199808
ntcir_eval/xie1998/xie199809
ntcir_eval/xie1998/xie199810
ntcir_eval/xie1998/xie199811
ntcir_eval/xie1998/xie199812
ntcir_eval/xie1999/xie199901
ntcir_eval/xie1999/xie199902
ntcir_eval/xie1999/xie199903
ntcir_eval/xie1999/xie199904
ntcir_eval/xie1999/xie199905
ntcir_eval/xie1999/xie199906
ntcir_eval/xie1999/xie199907
ntcir_eval/xie1999/xie199908
ntcir_eval/xie1999/xie199909
ntcir_eval/xie1999/xie199910
ntcir_eval/xie1999/xie199911
ntcir_eval/xie1999/xie199912

<for evaluation>
ntcir_eval/xie2000/xie200001
ntcir_eval/xie2000/xie200002
ntcir_eval/xie2000/xie200003
ntcir_eval/xie2000/xie200004
ntcir_eval/xie2000/xie200005
ntcir_eval/xie2000/xie200006
ntcir_eval/xie2000/xie200007
ntcir_eval/xie2000/xie200008
ntcir_eval/xie2000/xie200009
ntcir_eval/xie2000/xie200010
ntcir_eval/xie2000/xie200011
ntcir_eval/xie2000/xie200012
ntcir_eval/xie2001/xie200101
ntcir_eval/xie2001/xie200102
ntcir_eval/xie2001/xie200103
ntcir_eval/xie2001/xie200104
ntcir_eval/xie2001/xie200105
ntcir_eval/xie2001/xie200106
ntcir_eval/xie2001/xie200107
ntcir_eval/xie2001/xie200108
ntcir_eval/xie2001/xie200109
ntcir_eval/xie2001/xie200110
ntcir_eval/xie2001/xie200111
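
The evaluation subset above follows a regular naming pattern, so it can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the distribution; the function name evaluation_files and the default root "ntcir_eval" are assumptions made for illustration, and the paths are taken to be relative to wherever the data set was unpacked.

```python
def evaluation_files(root="ntcir_eval"):
    """Build the monthly Xinhua file paths for 2000-01 through 2001-11.

    The root directory name is an assumption; adjust it to the local layout.
    """
    paths = []
    for year in (2000, 2001):
        for month in range(1, 13):
            if (year, month) == (2001, 12):
                # xie200112 is missing from the distribution, so skip it.
                continue
            paths.append(f"{root}/xie{year}/xie{year}{month:02d}")
    return paths


files = evaluation_files()
print(len(files))   # number of evaluation files
print(files[0])     # first path in the subset
print(files[-1])    # last path in the subset
```

Checking that the list has 23 entries before indexing is a cheap guard against accidentally including training files or the missing December 2001 file.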

The numbers of records by year in the Xinhua data set are as follows.
1998: 103,470
1999: 104,698
2000: 107,956
2001: 90,668
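
As a quick sanity check after indexing, the yearly counts above can be summed per subset. This is a small Python sketch under the assumption that the training/evaluation split follows the year boundaries of the file list (1998-1999 for training, 2000-2001 for evaluation); the variable names are illustrative only.

```python
# Record counts by year, copied from the table above.
records_by_year = {1998: 103_470, 1999: 104_698, 2000: 107_956, 2001: 90_668}

# Assumed split: 1998-1999 files are for training, 2000-2001 for evaluation.
training_total = sum(n for year, n in records_by_year.items() if year < 2000)
evaluation_total = sum(n for year, n in records_by_year.items() if year >= 2000)
grand_total = training_total + evaluation_total

print(training_total, evaluation_total, grand_total)
```

Comparing evaluation_total against the number of documents actually indexed from the 23 evaluation files is one way to detect a truncated or partially loaded collection.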