Hi all
I'm trying to rip the MIDI from Arirang MIDI Karaoke. So far I haven't been able to extract the MIDI. I'm working with the newest Arirang volume, 38. You can easily find it by googling with the keyword "mediafire".

The storage file, as we know, is MULTAK.DAT, with an additional storage file MULTAK.DA1.

My analysis of MULTAK.DAT:

The first 336 bytes are the header. You can find the number of pointers in the TOC at offset 334, as a two-byte integer. In my file that is 22422 pointers, so 22422 songs. You can also find these numbers in the header:

0 1000 2000 3000 4000 5000 6000 7000 7939 8939 9939 10939 11939 12080 13080 14080 15080 15081 16081 17081 18081 19081 19459 20459 21421 22421

At some of these positions in the TOC we get FF 00 FF FF, which I guess is the NULL pointer.

The TOC begins at offset 3360. As I said, at position 0 we have the NULL pointer FF 00 FF FF; you can also see it at positions 7939, 12080... in the TOC.

Apart from the NULL pointers, I found a way to decode the other pointers. Each pointer is 4 bytes. I'll call the 1st byte b0, the 2nd byte b1, and so on. In my example we have b0 = 0x49, b1 = 0x31, b2 = 0x2b and b3 = 0x09.

b0, b1 and b2 are used to calculate the pointer value: N = (b0*60+b1)*75+b2.

b3 holds some flags. Its 4 higher bits show which storage file (MULTAK.DAT or MULTAK.DA1) the song data is located in. The value can be 0 or 1.
If it's 0, the song data is stored in MULTAK.DAT, at offset N * 2048 + 65536.
If it's 1, the song data is in MULTAK.DA1, at offset N * 2048.

Examples:

49 31 2b 09
N = (0x49 * 60 + 0x31) * 75 + 0x2b = 332218
The 4 higher bits of b3 are 0, so the song data is at offset 332218 * 2048 + 65536 = 680448000 of MULTAK.DAT.

00 03 17 10
N = (0x00 * 60 + 0x03) * 75 + 0x17 = 248
The 4 higher bits of b3 are 1, so the song data is at offset 248 * 2048 = 507904 of MULTAK.DA1.

That's all I know about the header and the pointers in the TOC. The pointers point to the song data (of course). Let's talk about the song data.
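For anyone who wants to try the pointer decoding above, here is a minimal Python sketch. The function names are my own, and the little-endian byte order for the two-byte count at offset 334 is an assumption (the post does not say which order it is). The TOC offset is passed in as a parameter rather than hard-coded.

```python
import struct

SECTOR_SIZE = 2048                 # multiplier used in the offset formula
DAT_BASE = 65536                   # extra offset that applies to MULTAK.DAT only
NULL_POINTER = b"\xff\x00\xff\xff"


def pointer_count(header):
    """Read the two-byte pointer count at offset 334 of the header.

    Assumption: the integer is little-endian.
    """
    return struct.unpack_from("<H", header, 334)[0]


def decode_pointer(raw):
    """Turn one 4-byte TOC entry into (file_name, byte_offset).

    Returns None for the NULL pointer FF 00 FF FF.
    """
    if len(raw) != 4:
        raise ValueError("a TOC pointer is exactly 4 bytes")
    if raw == NULL_POINTER:
        return None
    b0, b1, b2, b3 = raw
    n = (b0 * 60 + b1) * 75 + b2
    file_flag = b3 >> 4            # the 4 higher bits of b3 select the file
    if file_flag == 0:
        return ("MULTAK.DAT", n * SECTOR_SIZE + DAT_BASE)
    if file_flag == 1:
        return ("MULTAK.DA1", n * SECTOR_SIZE)
    raise ValueError("unexpected file flag %d" % file_flag)


def read_toc(data, toc_offset, count):
    """Decode `count` consecutive 4-byte pointers starting at `toc_offset`."""
    return [
        decode_pointer(data[toc_offset + 4 * i: toc_offset + 4 * i + 4])
        for i in range(count)
    ]


# The two worked examples from the post:
# decode_pointer(bytes.fromhex("49312b09")) -> ("MULTAK.DAT", 680448000)
# decode_pointer(bytes.fromhex("00031710")) -> ("MULTAK.DA1", 507904)
```

The lower 4 bits of b3 are ignored here because their meaning is not known yet.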
I found that there are 2 ways a karaoke song is stored:

Simple case: simple.raw. The song is just lyrics and MIDI data.
Complex case: complex.raw. Just like the simple case, but with some MP3 content (voices) attached.

It's easy to see that the first 2 bytes tell simple songs and complex songs apart.

The first 2 bytes of a simple song are 00 00, followed by "OK".

The first 2 bytes of a complex song are FF FF, followed by a two-byte integer: the number of MP3 contents attached to that song (I'll call that number "M"). The next 4 bytes may be some flags. Next come M blocks of 12 bytes each, which specify the location of the MP3 contents. The rest is just like the simple case.

Structure of the 12-byte block:
2 bytes: small integer, unknown, start of something.
2 bytes: small integer, unknown, end of something; may be related to timing, i.e. when the MP3 content starts and stops while the song is playing.
4 bytes: integer, begin position.
4 bytes: integer, end position.

As you can see in the complex.raw I uploaded, here are the begin and end positions of the 8 records; by subtraction we also get the sizes:

begin     end       size
0         96128     96128
96128     240256    144128
240256    300352    60096
300352    396480    96128
396480    540608    144128
540608    600704    60096
600704    744832    144128
744832    804928    60096

The begin position of the first record is 0, so it must be a relative position. So far I haven't been able to get the size of the "OK" part, so I have to find the begin position of the first MP3 content manually: it sits right after the "OK" part and begins with FF FB.

Then I extract from this position and get all the MP3 content: voice1.mp3 voice2.mp3
|
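The song header layout described above could be parsed roughly like this. It is only a sketch under assumptions: the integers are taken to be little-endian, the function and field names are made up for illustration, and the first MP3 is located by searching for the first FF FB after the block table, which is the same manual trick as above and can give a false hit if those two bytes happen to occur in the lyric or MIDI data.

```python
import struct

SIMPLE_MARKER = b"\x00\x00"
COMPLEX_MARKER = b"\xff\xff"
MP3_SYNC = b"\xff\xfb"             # first two bytes of the MP3 content


def parse_song_header(song):
    """Classify a song and read its M 12-byte MP3 location blocks.

    Assumption: all integers are little-endian.
    """
    marker = song[:2]
    if marker == SIMPLE_MARKER:
        return {"kind": "simple", "blocks": [], "header_size": 2}
    if marker != COMPLEX_MARKER:
        raise ValueError("unknown song marker %r" % marker)
    (count,) = struct.unpack_from("<H", song, 2)   # M, the number of MP3 contents
    pos = 8                        # 2 marker bytes + 2 count bytes + 4 flag bytes
    blocks = []
    for _ in range(count):
        t_start, t_end, begin, end = struct.unpack_from("<HHII", song, pos)
        blocks.append(
            {"t_start": t_start, "t_end": t_end, "begin": begin, "end": end}
        )
        pos += 12
    return {"kind": "complex", "blocks": blocks, "header_size": pos}


def extract_mp3s(song):
    """Return the MP3 contents of a complex song as a list of byte strings.

    Begin/end positions are relative to the first MP3 content, which is
    guessed to start at the first FF FB after the block table.
    """
    info = parse_song_header(song)
    if not info["blocks"]:
        return []
    base = song.find(MP3_SYNC, info["header_size"])
    if base < 0:
        raise ValueError("no FF FB found after the header")
    return [song[base + b["begin"]: base + b["end"]] for b in info["blocks"]]
```

Writing each returned chunk to its own file should give the same result as the manual extraction (voice1.mp3, voice2.mp3, ...), provided the byte-order assumption holds.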
Administrator
|
Just want to point to some related posts:
Pointer table: http://old.nabble.com/forum/ViewPost.jtp?post=18769103&framed=y
Get the shortest song: http://old.nabble.com/forum/ViewPost.jtp?post=18980234&framed=y
Song list: http://old.nabble.com/forum/ViewPost.jtp?post=23020940&framed=y
|
Administrator
|
In reply to this post by thanhth
This might be useful:
http://old.nabble.com/forum/ViewPost.jtp?post=18676077&framed=y
Somehow I didn't get too far :-(
|
Administrator
|
In reply to this post by thanhth
Some masked samples retrieved by:
http://old.nabble.com/forum/ViewPost.jtp?post=18980234&framed=y
masked_samples.zip
|
Administrator
|
In reply to this post by thanhth
Without data patching I don't see a chance to progress with the MIDI part.
If someone gets a chance and wants to do some experiments with MIDI data, here are some links to old stuff:
http://old.nabble.com/forum/ViewPost.jtp?post=11602200&framed=y
http://old.nabble.com/forum/ViewPost.jtp?post=11615656&framed=y
http://old.nabble.com/forum/ViewPost.jtp?post=11659601&framed=y
Hopefully this will give you an idea of how to "attack" the problem. Certainly, patching lyrics was easier than patching MIDI. You have to listen very carefully to the music in order to identify any changes :-(
|
In reply to this post by thanhth
If you can extract lyrics, can you figure out the voice's gender (i.e. whether the song is intended for male, female, or duet)?
I am interested in building a list of songs from the Arirang disc with title, lyrics, and voice gender (male/female/duet). I want to build a search application for iPhone/Android similar to kList for iPhone, but one that also shows which voice gender each song is intended for. Also, when you click on a song, you can view the lyrics and more information, etc. |
Administrator
|
Interesting project :-)
This shouldn't be too difficult with the current findings. I guess this information is at the beginning of each data section, somewhere close to the title information. To get this right, we need some specific known examples, e.g. 5 songs of each type: male, female and duet. Then we can speculate about where that information is stored. After that we need to patch the disk and put it into the player to confirm the findings.
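One way to do the speculation step is to compare the first bytes of the known samples and keep only the offsets where every song of one type has the same byte and each type has a different byte. A rough Python sketch follows; the function name and the 256-byte search window are arbitrary choices of mine, and it assumes the gender flag sits at a fixed offset, which may not be true if the title before it has a variable length.

```python
def candidate_offsets(groups, limit=256):
    """Find offsets whose byte is constant within each group of samples
    and different between groups.

    `groups` maps a label (e.g. "male") to a list of raw song data blocks.
    Returns a list of (offset, {label: byte_value}) pairs.
    """
    shortest = min(len(s) for samples in groups.values() for s in samples)
    limit = min(limit, shortest)
    hits = []
    for off in range(limit):
        values = []
        for samples in groups.values():
            seen = {s[off] for s in samples}
            if len(seen) != 1:     # songs of the same type disagree here
                break
            values.append(seen.pop())
        else:
            # every group is internally consistent; keep the offset only
            # if each group has its own distinct value
            if len(set(values)) == len(values):
                hits.append((off, dict(zip(groups, values))))
    return hits
```

Any offset this reports is only a candidate. It still has to be confirmed by patching the disk and checking in the player.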
|
Administrator
|
In reply to this post by qtpie
How's the progress?
Can you build something more generic, e.g. reading from a CSV file? My idea is to support any song list and allow controlling a software player running on a PC. Say I import a song list to an iPhone. The app then provides some management features. I can click on a song and add it to the player running on the PC. The app basically sends a string via WiFi to the PC. Either we make the player understand the message, or the PC simply converts the message to keystrokes :-) Phuoc
|
Is anyone still working on this?
Has anyone successfully decoded the MIDI part? |
I also found that the position in the TOC pointer section of MULTAK.DAT is not always equal to the song ID in the song book. For example, the maximum number of pointers is 39364, but there are song IDs over 40000. The first few dozen match. That means there must be another mapping table, maybe in the .IDX file.
|
Most of my findings concur with what others have already found in this forum/thread, although I worked them out independently.
I am now stuck on the MIDI data part (and maybe on mapping the song book IDs to the actual IDs). The other things, like the song data block, song title, lyrics and MP3, I think you have already discovered. Unfortunately those are the things I don't need. I only need the MIDIs. |
Oh, a few things I have not figured out yet:
Where is the byte that represents the language?
Where is the byte that represents the length of the title and the offset of the start of the lyrics?
Where is the byte that represents the offset of the end of the lyrics and the beginning of the MIDI data?
I can extract the title/composer/lyricist/singer info. I am still working on the actual lyrics part, which is quite a mess. I am able to get some but not all. But nowadays you can find lyrics everywhere on the internet, so it's probably not worth the time to work on it. I really want the MIDI part though. If anyone knows how to extract it, please let me know. |