View DMOZ Web Directory Topics (public)























- Summary
Contains parsed webpages along with their topics, extracted from the DMOZ web directory.
- License
- unknown
- Dependencies
- Tags
- bag-of-words Classification DMOZ libsvm multi-class text web-pages
- Attribute Types
- Download
-
# Instances: 2658 / # Attributes: 10630
HDF5 (4.1 MB), XML, CSV, ARFF, LibSVM, Matlab, Octave. Files are converted on demand, and the process can take up to a minute.
- Original Data Format
- libsvm
- Name
- dmoz-web-directory-topics
- Version mldata
- 0
- Comment
LibSVM
- Names
- Data (first 10 data points)
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... ... ... ... ... ... ... ... ... ... ... ...
- Description
Extracted from the DMOZ (Open Directory Project) web directory. "The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors." This data set contains parsed webpages along with their topics. Each line is the bag-of-words representation of a web page, and its label is the page's first-level topic in the DMOZ directory hierarchy. The topic ids correspond to the following semantic topics:
1 Arts
2 Games
3 Kids and Teens
4 Shopping
5 Society
The data is in libsvm format, with zero-valued entries omitted.
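The libsvm layout mentioned in the description can be illustrated with a minimal Python sketch. It assumes the usual libsvm convention of one `label index:value index:value ...` record per line with 1-based feature indices; the helper names `parse_libsvm_line` and `to_dense` and the example line are hypothetical and not taken from the dataset. Because the data preview on this page shows labels outside the 1 to 5 range, the topic lookup falls back to "unknown" rather than assuming every label is listed.

```python
from typing import Dict, List, Tuple

# Topic ids as listed in the description; other ids may occur in the data.
TOPIC_NAMES = {1: "Arts", 2: "Games", 3: "Kids and Teens", 4: "Shopping", 5: "Society"}


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one 'label index:value ...' record into (label, sparse features)."""
    tokens = line.split()
    label = int(float(tokens[0]))  # tolerate labels written as '5' or '5.0'
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features: Dict[int, float], n_attributes: int) -> List[float]:
    """Expand sparse features into a dense vector, restoring the omitted zeros.

    Assumes 1-based feature indices (the common libsvm convention).
    """
    dense = [0.0] * n_attributes
    for index, value in features.items():
        dense[index - 1] = value
    return dense


# Hypothetical record for illustration only (not a real line from the dataset).
example = "5 3:1 17:2 10629:1"
label, features = parse_libsvm_line(example)
print(label, TOPIC_NAMES.get(label, "unknown"), features)
```

Keeping the features in a dictionary preserves the sparsity of the bag-of-words vectors; `to_dense` is only worth calling when a downstream tool requires dense input.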
- URLs
- http://www.dmoz.org/
- Publications
- Data Source
- Measurement Details
- Usage Scenario
- revision 1
- by jeanbaptiste on 2012-03-13 15:10
- revision 2
- by jeanbaptiste on 2012-03-13 15:11
- revision 3
- by jeanbaptiste on 2012-03-13 15:16
- revision 4
- by jeanbaptiste on 2012-03-13 15:17
- revision 5
- by jeanbaptiste on 2012-03-29 16:45
- revision 6
- by jeanbaptiste on 2012-03-29 16:47
This item was downloaded 11205 times and viewed 6192 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.
