View Yahoo! Web Directory Topics (public)

- Summary
Contains parsed web pages along with their topics, extracted from the Yahoo! web directory.
- License
- unknown
- Dependencies
- Tags
- bag-of-words Classification multi-class text web-pages Yahoo!
- Attribute Types
- Download
-
# Instances: 2212 / # Attributes: 10630
Available formats: HDF5 (3.6 MB), XML, CSV, ARFF, LibSVM, Matlab, Octave. Files are converted on demand, and the process can take up to a minute.
- Original Data Format
- libsvm
- Name
- yahoo-web-directory-topics
- Version mldata
- 0
- Comment
LibSVM
- Names
- Data (first 10 data points)
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... ... ... ... ... ... ... ... ... ... ... ...
- Description
Extracted from the Yahoo! web directory, this data set contains parsed web pages along with their topics. Each line is the bag-of-words representation of a web page, and its label is the page's first-level topic in the Yahoo! directory hierarchy. The topic ids correspond to the following topics: 1 Arts, 2 Business and Economy, 3 Education, 4 Entertainment.
The data is in libsvm format, with zero-valued entries omitted. For more information, please don't hesitate to contact me at: jean . faddoul (at) gmail . com , http://www.grappa.univ-lille3.fr/~faddoul/
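As a rough illustration of the format described above, the following Python sketch parses one libsvm-style line into a label and a sparse feature map, then expands it to a dense row. The function names, the sample line, and the row width of 20 are made up for this example and do not come from the data set. The sketch also assumes 1-based feature indices, which is the usual libsvm convention but is not stated on this page.

```python
# Topic ids as listed in the description.
TOPICS = {1: "Arts", 2: "Business and Economy", 3: "Education", 4: "Entertainment"}


def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = int(float(parts[0]))
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features, n_attributes):
    """Expand a sparse row to a dense list; omitted entries become 0.0.

    Assumes 1-based feature indices (an assumption, not confirmed by the source).
    """
    row = [0.0] * n_attributes
    for index, value in features.items():
        row[index - 1] = value
    return row


# Hypothetical sample line for illustration only, not taken from the data set.
sample = "4 3:1.0 17:2.0"
label, features = parse_libsvm_line(sample)
dense = to_dense(features, 20)  # 20 is an arbitrary width for the example
print(TOPICS[label], features, dense[:5])
```

For the full file, an established loader such as scikit-learn's `load_svmlight_file` is likely a better choice than a hand-written parser, since it returns a sparse matrix directly.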
- URLs
- http://dir.yahoo.com/
- Publications
- Data Source
- Yahoo! web Directory
- Measurement Details
- Usage Scenario
- revision 1
- by jeanbaptiste on 2012-03-05 14:30
- revision 2
- by jeanbaptiste on 2012-03-13 15:16
- revision 3
- by jeanbaptiste on 2012-03-13 15:16
This item was downloaded 12977 times and viewed 2232 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.
