Arirang MIDI Karaoke DVD storage file struct and MP3 Extraction

thanhth
Hi all

I'm trying to rip the MIDI data from Arirang MIDI Karaoke discs. So far I haven't been able to extract the MIDI, but I was able to extract another important part: the MP3 files.

I'm working with the newest Arirang volume, 38. You can easily find it by googling with the keyword "mediafire". The ISO image is about 3.01 GB.

As we know, the storage file is MULTAK.DAT, with an additional storage file MULTAK.DA1.
My analysis of MULTAK.DAT:

The first 336 bytes are the header.

The number of pointers in the TOC is stored at offset 334 as a two-byte integer. In the picture it is 22422: 22422 pointers, 22422 songs.
You can also find these numbers in the header:
0 1000 2000 3000 4000 5000 6000 7000
7939 8939 9939 10939 11939
12080 13080 14080 15080
15081 16081 17081 18081 19081
19459 20459
21421 22421
At the positions of the bold numbers (the first number of each row: 0, 7939, 12080, ...), the TOC holds FF 00 FF FF, which I guess is the NULL pointer.

The TOC begins at offset 3360. As I said before, position 0 holds the NULL pointer FF 00 FF FF; you can also see it at positions 7939, 12080, ... in the TOC.
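A minimal Python sketch of reading the pointer count and the raw TOC entries as described above. The byte order of the two-byte count is not stated in this post, so little-endian is an assumption here, and the TOC offset is passed in by the caller rather than hard-coded. The names `read_toc`, `NULL_POINTER` and `COUNT_OFFSET` are made up for this illustration.

```python
import struct

NULL_POINTER = bytes([0xFF, 0x00, 0xFF, 0xFF])  # "NULL pointer" pattern seen in the TOC
COUNT_OFFSET = 334                               # two-byte pointer count in the header


def read_toc(data: bytes, toc_offset: int, little_endian: bool = True):
    """Return the raw 4-byte TOC entries; NULL pointers become None.

    little_endian is an ASSUMPTION: the post does not state the byte order.
    """
    fmt = "<H" if little_endian else ">H"
    (count,) = struct.unpack_from(fmt, data, COUNT_OFFSET)
    entries = []
    for i in range(count):
        start = toc_offset + 4 * i
        raw = data[start:start + 4]
        if len(raw) < 4:
            raise ValueError(f"TOC entry {i} runs past the end of the data")
        entries.append(None if raw == NULL_POINTER else raw)
    return entries
```

Only the beginning of MULTAK.DAT is needed for this, so it is enough to read the header plus `4 * count` bytes of TOC instead of loading the whole file.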


Apart from the NULL pointers, I found a way to decode the other pointers. Each pointer is 4 bytes:

I'll call the 1st byte b0, the 2nd byte b1, and so on. In the picture, we have b0 = 0x49, b1 = 0x31, b2 = 0x2b and b3 = 0x09.
b0, b1 and b2 are used to calculate the pointer value: N = (b0*60+b1)*75+b2. b3 holds some flags; its upper 4 bits show which storage file (MULTAK.DAT or MULTAK.DA1) the song data is located in. The value can be 0 or 1. If it's 0, the song data is stored in MULTAK.DAT, at offset N * 2048 + 65536. If it's 1, the song data is in MULTAK.DA1, at offset N * 2048.

Examples:
49 31 2b 09
N = (0x49 * 60 + 0x31) * 75 + 0x2b = 332218
The upper 4 bits of b3 are 0, so the song data is at offset 332218 * 2048 + 65536 = 680448000 of MULTAK.DAT.

00 03 17 10
N = (0x00 * 60 + 0x03) * 75 + 0x17 = 248
The upper 4 bits of b3 are 1, so the song data is at offset 248 * 2048 = 507904 of MULTAK.DA1.
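The decoding rule and the two examples above can be checked with a short Python sketch. The function name `decode_pointer` and the constants are made up for this illustration. The formula has the same shape as a CD minute/second/frame address (60 seconds per minute, 75 frames per second, 2048-byte sectors), but that is only a reading of the numbers, not something confirmed here. NULL pointers (FF 00 FF FF) should be filtered out before calling it.

```python
SECTOR_SIZE = 2048
DAT_BASE = 65536  # extra base offset that applies to MULTAK.DAT only


def decode_pointer(raw: bytes):
    """Decode one 4-byte TOC pointer into (file name, byte offset)."""
    if len(raw) != 4:
        raise ValueError("a TOC pointer is exactly 4 bytes")
    b0, b1, b2, b3 = raw
    n = (b0 * 60 + b1) * 75 + b2
    file_index = b3 >> 4  # upper 4 bits select the storage file
    if file_index == 0:
        return "MULTAK.DAT", n * SECTOR_SIZE + DAT_BASE
    if file_index == 1:
        return "MULTAK.DA1", n * SECTOR_SIZE
    raise ValueError(f"unexpected file index {file_index}")


print(decode_pointer(bytes.fromhex("49312b09")))  # ('MULTAK.DAT', 680448000)
print(decode_pointer(bytes.fromhex("00031710")))  # ('MULTAK.DA1', 507904)
```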

That's all about the header and the pointers in TOC as far as I know.

The pointers point to the song data (of course). Now let's talk about the song data. I found that a karaoke song is stored in one of two ways:

Simple case: simple.raw. The song is just lyrics and MIDI data.


Complex case: complex.raw. Just like the simple case, but with some MP3 content (voices) attached.


It's easy to see that the first 2 bytes tell simple songs and complex songs apart.

The first 2 bytes of a simple song are 00 00, followed by "OK". So far, the lyrics are the only thing we have been able to extract from this part.

The first 2 bytes of a complex song are FF FF, followed by a two-byte integer: the number of MP3 contents attached to that song (I'll call this number "M"). The next 4 bytes may be some flags. Then come M blocks of 12 bytes each, which specify the locations of the MP3 contents. The rest is just like the simple case.

Structure of a 12-byte block:
2 bytes: small integer, unknown, the start of something.
2 bytes: small integer, unknown, the end of something. These two may be related to timing, i.e. when the MP3 content starts and stops while the song is playing.
4 bytes: integer, begin position.
4 bytes: integer, end position.
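The layout described above can be sketched in Python as follows. The byte order of the integers is not given in this post, so little-endian is an assumption, and `parse_song_header` is a name made up for this illustration. The four possible flag bytes are skipped, and the two unknown small integers are returned as-is.

```python
import struct


def parse_song_header(song: bytes, little_endian: bool = True):
    """Return ("simple", []) or ("complex", [(start, stop, begin, end), ...]).

    little_endian is an ASSUMPTION: the post does not give the byte order.
    """
    order = "<" if little_endian else ">"
    marker = song[:2]
    if marker == b"\x00\x00":
        return "simple", []
    if marker != b"\xff\xff":
        raise ValueError(f"unknown song marker {marker.hex()}")
    (m,) = struct.unpack_from(order + "H", song, 2)  # M, the number of MP3 contents
    # Bytes 4..7 are the four bytes that may be flags; they are skipped here.
    blocks = [
        struct.unpack_from(order + "HHII", song, 8 + 12 * i)  # one 12-byte block
        for i in range(m)
    ]
    return "complex", blocks
```

For a complex song, the part that looks like a simple song would then start at offset `8 + 12 * M`.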

As you can see in the complex.raw I uploaded, here are the begin and end positions of the 8 records; subtracting them gives the size as well:
begin end size
0 96128 96128
96128 240256 144128
240256 300352 60096
300352 396480 96128
396480 540608 144128
540608 600704 60096
600704 744832 144128
744832 804928 60096

The begin position of the first record is 0, so the positions must be relative. At this time I haven't been able to get the size of the "OK" part, so I have to find the begin position of the first MP3 content manually: it sits right after the "OK" part and begins with FF FB.


Then I extract from this position and get all the MP3 content:
voice1.mp3
voice2.mp3
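A rough Python sketch of that manual step: look for the first FF FB in the song data and cut out each record relative to it. `extract_mp3_segments` and `search_from` are names made up for this illustration. Treating the first FF FB as the start of the MP3 data is a heuristic, because the same two bytes could also appear by chance in the lyrics or MIDI part, so the result should be checked by listening.

```python
def extract_mp3_segments(song: bytes, ranges, search_from: int = 0):
    """Slice MP3 contents out of one song's data.

    ranges: (begin, end) pairs relative to the first MP3 content.
    The base is GUESSED as the first FF FB at or after search_from.
    """
    base = song.find(b"\xff\xfb", search_from)
    if base < 0:
        raise ValueError("no FF FB bytes found")
    segments = []
    for begin, end in ranges:
        if end < begin or base + end > len(song):
            raise ValueError(f"range {begin}..{end} does not fit in the data")
        segments.append(song[base + begin:base + end])
    return base, segments
```

Passing `search_from = 8 + 12 * M` skips the FF FF marker and the 12-byte blocks, so the search starts inside the "OK" part. Each returned segment can be written to its own .mp3 file.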


Re: Lyrics length?

bigboss97
Administrator
In reply to this post by thanhth
This might be useful:
http://old.nabble.com/forum/ViewPost.jtp?post=18676077&framed=y

Somehow I didn't get too far  :-(

thanhth wrote
At this time I haven't been able to get the size of the "OK" part, so I have to find the begin position of the first MP3 content manually: it sits right after the "OK" part and begins with FF FB.


Re: Some Ideas for data Patching (links)

bigboss97
Administrator
In reply to this post by thanhth
Without data patching I don't see a chance to progress with the MIDI part.

If someone gets a chance and wants to do some experiments with the MIDI data, here are some links to old stuff:
http://old.nabble.com/forum/ViewPost.jtp?post=11602200&framed=y
http://old.nabble.com/forum/ViewPost.jtp?post=11615656&framed=y
http://old.nabble.com/forum/ViewPost.jtp?post=11659601&framed=y

Hopefully this will give you an idea how to "attack" the problem. Certainly, patching lyrics was easier than MIDI. You have to listen very carefully to the music in order to identify any changes  :-(

Re: Arirang MIDI Karaoke DVD storage file struct and MP3 Extraction

qtpie
In reply to this post by thanhth
If you can extract the lyrics, can you figure out the voice's gender (i.e. whether the song is intended for male, female or duet)?

I am interested in building a list of songs from the Arirang disc with title, lyrics, and voice's gender (male/female/duet). I want to build a search application for iPhone/Android similar to kList for iPhone, but one that also shows which voice's gender the song is intended for (male/female/duet). Also, when you click on a song, you can view the lyrics and more information, etc.

Re: Song Information

bigboss97
Administrator
Interesting project  :-)

This shouldn't be too difficult with the current findings. I guess this information should be at the beginning of each data section, somewhere close to the title information. To get this right, we need some specific known information, e.g. 5 songs of each type: male, female and duet.
Then we can speculate where that information is stored. After that we need to patch the disc and put it into the player to confirm the findings.

qtpie wrote
If you can extract the lyrics, can you figure out the voice's gender (i.e. whether the song is intended for male, female or duet)?

I am interested in building a list of songs from the Arirang disc with title, lyrics, and voice's gender (male/female/duet). I want to build a search application for iPhone/Android similar to kList for iPhone, but one that also shows which voice's gender the song is intended for (male/female/duet). Also, when you click on a song, you can view the lyrics and more information, etc.

Re: Arirang MIDI Karaoke DVD (iPhone App)

bigboss97
Administrator
In reply to this post by qtpie
How's the progress?

Can you build something more generic, e.g. reading from a CSV file? My idea is to support any song list and allow controlling a software player running on a PC.

Say I import a song list to an iPhone. The app then provides some management features. I can click on a song and add it to the player running on the PC. The app basically sends a string via WiFi to the PC. Either we make the player understand the message, or the PC simply converts the message to keystrokes  :-)

Phuoc

qtpie wrote
I am interested in building a list of songs from the Arirang disc with title, lyrics, and voice's gender (male/female/duet). I want to build a search application for iPhone/Android similar to kList for iPhone, but one that also shows which voice's gender the song is intended for (male/female/duet). Also, when you click on a song, you can view the lyrics and more information, etc.

Re: Arirang MIDI Karaoke DVD (iPhone App)

Bho1668
Is anyone still working on this?
Has anyone successfully decoded the MIDI part?

Re: Arirang MIDI Karaoke DVD storage file struct and MP3 Extraction

Bho1668
I also found that the position in the TOC pointer section of MULTAK.DAT is not always equal to the song ID in the song book. For example, the maximum number of pointers is 39364, but there are song IDs over 40000. Only the first few dozen matched. That means there must be another mapping table, maybe in the .IDX file.

Re: Arirang MIDI Karaoke DVD storage file struct and MP3 Extraction

Bho1668
Most of my findings concur with what others have already found in this forum/thread, although I worked them out independently.

I am now stuck on the MIDI data part (and maybe on the mapping from the song's ID to the actual IDs).

Other things, like the song data block, song title, lyrics and MP3, I think you have already discovered. Unfortunately those are the things I don't need. I only need the MIDIs.


Re: Arirang MIDI Karaoke DVD storage file struct and MP3 Extraction

Bho1668
Oh, a few things I have not figured out yet:
Where is the byte that represents the language?
Where is the byte that represents the length of the title and the offset of the start of the lyrics?
Where is the byte that represents the offset of the end of the lyrics and the beginning of the MIDI data?

I can extract the title/composer/lyricist/singer info.

I'm still working on the actual lyrics part, which is quite a mess. I am able to get some of it but not all. But nowadays you can find lyrics everywhere on the internet, so it's probably not worth the time.

I really want the MIDI part though. If anyone knows how to extract it, please let me know.