HipHop Genealogy
Latest revision as of 11:13, 10 November 2013
In /usr/ccrma/media/databases/hiphop-gene/ are the following files:
Data Files
- artists.json
A list of each artist in the dataset. Rather than extracted from tags in the mp3 file, they are hand-entered via categorize.py to ensure correct normalization.
- compressed
Loosely organized directory of mp3/m4a/etc. files for the base data set. New data set examples go in here, to be sorted out by categorize.py.
- genres.json
List of each possible genre in the dataset. Handwritten and used by categorize.py for manual genre entry.
- meta.json
The main catalogue of metadata associated with each WAV file. Currently includes genre and artist(s) info, in addition to file paths of compressed/WAV versions of the audio data.
- wav
Directory of uncompressed audio data files. Automatically populated by decompress.py.
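To make the relationship between these files concrete, here is a minimal sketch of what one meta.json entry might look like and how the artist list could be derived from it. The field names (compressed_path, wav_path, genre, artists), the example values, and the helper artists_in_meta are all assumptions made for illustration; the real schema is not documented here and may differ.

```python
import json

# Hypothetical meta.json layout. The field names and values below are
# assumptions for illustration, not copied from the real file.
example_meta = [
    {
        "compressed_path": "compressed/example_track.mp3",
        "wav_path": "wav/example_track.wav",
        "genre": "example-genre",
        "artists": ["Example Artist A", "Example Artist B"],
    }
]


def artists_in_meta(meta):
    """Return the sorted set of artist names mentioned across all entries."""
    names = set()
    for entry in meta:
        names.update(entry["artists"])
    return sorted(names)


# Round-trip through JSON text, as the scripts would when reading the file.
text = json.dumps(example_meta, indent=2)
round_tripped = json.loads(text)
all_artists = artists_in_meta(round_tripped)
```

Collecting artists this way is roughly what build_artists.py is described as doing when it adds new artists from meta.json to artists.json, although the actual script may work differently.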
Utility Tools
- decompress.py
Convert files in compressed/ to WAV format, and place them in wav/.
- build_artists.py
Add any new artists in meta.json to artists.json (normally not necessary as categorize.py should do this automatically).
- categorize.py
Search for new files in compressed/ and request genre and artist information. Stores this all in meta.json.
- export_matlab.py
Export meta.json data into a format convenient for use in Matlab. Write filepaths to files.dat and genre + artist info to meta.dat. Each row in these files is one training example. Column 1 of meta.dat is the genre (an index into the list of genres in genres.json) and the subsequent columns indicate the presence or absence of a particular artist on that song (where column N is the N+1-th artist in artists.json).
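The row layout described for meta.dat can be sketched as follows. This is not the code of export_matlab.py. It assumes a hypothetical entry layout with genre, artists, and wav_path fields, a 0-based genre index, space-separated values, and that the WAV path is what goes into files.dat. The real script may differ on any of these points.

```python
def build_meta_row(entry, genres, artists):
    """Build one row: genre index, then one 0/1 flag per known artist.

    Assumption: the genre index is 0-based, following Python list indexing.
    """
    row = [genres.index(entry["genre"])]
    present = set(entry["artists"])
    row.extend(1 if name in present else 0 for name in artists)
    return row


def export_rows(meta, genres, artists):
    """Return (files_lines, meta_lines), one line per training example."""
    # Assumption: files.dat lists the WAV path of each example.
    files_lines = [entry["wav_path"] for entry in meta]
    meta_lines = [
        " ".join(str(value) for value in build_meta_row(entry, genres, artists))
        for entry in meta
    ]
    return files_lines, meta_lines


# Hypothetical sample data standing in for genres.json, artists.json, meta.json.
sample_genres = ["genre-a", "genre-b"]
sample_artists = ["Artist X", "Artist Y", "Artist Z"]
sample_meta = [
    {"wav_path": "wav/one.wav", "genre": "genre-b", "artists": ["Artist Z", "Artist X"]},
    {"wav_path": "wav/two.wav", "genre": "genre-a", "artists": ["Artist Y"]},
]

files_lines, meta_lines = export_rows(sample_meta, sample_genres, sample_artists)
```

Each line of meta_lines then has one genre column followed by as many 0/1 columns as there are artists, so every row has the same width and the two files stay aligned row by row.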