Understanding the HOTDOG files on DVD of California electronics

Re: Understanding the HOTDOG files on DVD of California electronics

woid
I have made some scripts for analysing the HOTDOG00.DAT file. Here is a summary of my findings:

I made the assumptions that 1) the first part of each song entry contains the lyrics and 2) the lyrics of English songs are encoded in ASCII.

First I tried filtering the beginning of each song entry, removing all bytes that do not represent a letter in ASCII, and then looking for some common English words (both visually and by searching). Unfortunately I was not able to find anything useful using this method.

Then I tried taking the first couple of words from some songs to see if I could find all the letters in the correct order at the beginning of the song entries. This method failed too.
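For anyone who wants to repeat these two searches, here is a minimal sketch in C. It is not woid's actual script; the function names `keep_letters` and `letters_in_order` are made up for illustration, and the comparison is case-insensitive by assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy only the ASCII letters of buf into out (lower-cased, NUL-terminated).
   out must have room for len + 1 bytes. Returns the number of letters kept. */
size_t keep_letters(const unsigned char *buf, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x80 && isalpha(buf[i]))
            out[n++] = (char)tolower(buf[i]);
    }
    out[n] = '\0';
    return n;
}

/* Return 1 if every letter of words (non-letters are skipped) occurs in buf
   in the same order, possibly with other bytes in between; otherwise 0. */
int letters_in_order(const unsigned char *buf, size_t len, const char *words)
{
    size_t i = 0;
    for (const char *w = words; *w; w++) {
        if (!isalpha((unsigned char)*w))
            continue;
        int want = tolower((unsigned char)*w);
        while (i < len && tolower(buf[i]) != want)
            i++;
        if (i == len)
            return 0;   /* ran out of data before finding this letter */
        i++;
    }
    return 1;
}
```

The output of `keep_letters` can be searched with `strstr()` for common words, and `letters_in_order` can be called with the first words of a known song on the first few thousand bytes of its entry.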

So my conclusion is that one (or both) of my assumptions is wrong, i.e. either:
+ The first part does not contain the lyrics. (The lyrics may be mixed into the MIDI stream. But what does the first part contain then? There are about 2000-4000 bytes at the beginning of each song entry...)
+ The lyrics are not encoded in ASCII. They may use a custom font where the English letters have a different encoding than ASCII.

I also noticed that I could only find about 20000 song entries in HOTDOG00.DAT. HOTDOG20.DAT suggested that there would be about 30000 songs, so I worried that my scripts were not able to find all the song entries in HOTDOG00.DAT and that this was why I could not find any English lyrics. So I made another script to see if there were any gaps where the missing 10000 songs could be located. I found that there were no gaps, and in fact there are only 20000 song entries in HOTDOG00.DAT.

The layout of HOTDOG00.DAT looks like this:

0x0-0x200: ???
0x200-0x400: Table of tables of song entries.
0x400-0x800: ???
0x800-0x1000: First song entry table. The first 0x400 bytes (0x800-0xC00) contain addresses of song entries. The second 0x400 bytes (0xC00-0x1000) contain low numbers (4-byte words). I have not figured out their meaning other than that 0 means the entry is invalid.
0x1000-0x5420F0: Song entries (256 song entries packed one after the other).
Then there is padding up to the next 0x*000 or 0x*800 address, where the next table starts, e.g.:
0x5420F0-0x542800: Padding (1808 bytes)
0x542800-0x543000: Next table of song entries.
This pattern repeats until the end of the file. There are no gaps except the padding before each table.
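The padding rule and the "0 means invalid" rule above can be sketched in C as follows. This is only an illustration under assumptions: the names are hypothetical, and the sketch assumes that no padding is added when the song entries already end on a 0x800 boundary, which the layout does not confirm.

```c
#include <stdint.h>

#define TABLE_ALIGN   0x800u  /* tables start at 0x*000 or 0x*800 addresses */
#define TABLE_ENTRIES 256u    /* 0x400 bytes of 4-byte words per half table */

/* Round the end of the song entries up to the next 0x800 boundary,
   which is where the next song entry table starts. */
uint32_t next_table_offset(uint32_t end_of_entries)
{
    return (end_of_entries + TABLE_ALIGN - 1) & ~(TABLE_ALIGN - 1);
}

/* Count the valid entries of one 0x800-byte table held in memory.
   The second half (offset 0x400) holds one 4-byte word per entry;
   an all-zero word marks the entry as invalid. */
unsigned count_valid_entries(const unsigned char *table)
{
    unsigned valid = 0;
    for (unsigned i = 0; i < TABLE_ENTRIES; i++) {
        const unsigned char *w = table + 0x400 + 4 * i;
        if (w[0] | w[1] | w[2] | w[3])
            valid++;
    }
    return valid;
}
```

With the numbers from the layout, `next_table_offset(0x5420F0)` gives 0x542800, i.e. 1808 bytes of padding. The validity test does not depend on byte order because it only checks for an all-zero word.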
 
Here is the text-file with the complete layout of my HOTDOG00.DAT file:

content.txt

Notice that some tables contain no song entries (e.g. tables 16-35)...

I am also a bit confused about why the number of songs in HOTDOG20.DAT and HOTDOG00.DAT don't match...

Re: Understanding the HOTDOG files on DVD of California electronics

woid
In reply to this post by bigboss97
Here is a C program that analyses the HOTDOG00.DAT file and finds all the song entries.
It is not pretty, but it does the job.

I think the function "findMidi()" would be a good starting point. It is called for each song entry.

I would suggest trying to write a converter that converts the MIDI streams in HOTDOG00.DAT to the MF2T/T2MF text format. It seems to be a nice format for debugging. Do you think it is feasible?

h00Analyse9.c

Re: Understanding the HOTDOG files on DVD of California electronics

bigboss97
Administrator
In reply to this post by bigboss97
dvd2midi.zip
Here is my script for generating a MIDI file. I know that what you hear is still not a song, but it is something music-like.
The notes seem to be cut off too early.

There's still a bit to go, but I just want to post my results in the meantime.

The program reads the offsets (based on your findings) from a file. I entered them manually (including the 15-16 byte offset), but you can modify your program to create that file.


Phuoc Can HUA

bigboss97 wrote
I'll take over the MIDI part because I've worked intensively with MIDI before (I have to refresh my memory). But it won't happen today.

Re: Understanding the HOTDOG files on DVD of California electronics

bigboss97
Administrator
In reply to this post by woid
That's also what I thought initially. But I found that it's easier to generate the MIDI (binary) directly. My converter does create a log file for the last generated MIDI file.

There are some very useful tools under:
http://www.gnmidi.com/gnfreeen.htm

If you get an invalid MIDI file, midi2txt can give you a hint about what is corrupt, while other players simply say "invalid file format".

Currently the problem is not generating the MIDI file but interpreting the bytes correctly. I took all the information from hdtr6639, but I think I need more than that.

I'll continue with some tuning of the parameters. If you can recognize any generated MIDI song, let me know. Btw, the script doesn't fetch the full length.


woid wrote
I would suggest trying to write a converter that converts the MIDI streams in HOTDOG00.DAT to the MF2T/T2MF text format. It seems to be a nice format for debugging. Do you think it is feasible?

Lyrics encrypted?

bigboss97
Administrator
In reply to this post by woid
Btw, we have to start updating the subject because the discussion is getting too complicated.

The lyrics might be "encrypted" using a mask. I'm thinking of locating a familiar song (once we can generate the MIDI correctly); then we may get a chance to decode the lyrics.

woid wrote
+ The lyrics are not encoded in ASCII. They may use a custom font where the English letters have a different encoding than ASCII.

DVD2MIDI (script updated)

bigboss97
Administrator
In reply to this post by bigboss97
dvd2midi.zip
It sounds better now, but still far from perfect.

Btw, I simply put a fixed tempo in the MIDI file because I couldn't read the tempo information.

Re: DVD2MIDI (script updated)

woid
I have tried to merge your converter with my song entry reader.
I removed the main() functions and put a new main() in a separate file. I didn't put much effort into the new main(). It accepts no command line arguments and requires a recompile for different output. I tried to keep the function definitions as intact as possible (I changed your ReadMidiDat() a little).

The first version converted all songs, but I changed it to convert only one song (I ended up with loads of files :-). Change TABLE_NO and SONG_NO to get different songs. TABLE_NO corresponds to the index in the table at 0x200. SONG_NO is the song index within the table selected by TABLE_NO.

I discovered that TABLE_NO and SONG_NO (as I suspected earlier) correspond to 2 unknown bytes in HOTDOG20.DAT.

E.g: "ELEANOR RIGBY"
#define TABLE_NO 0x39
#define SONG_NO 0x2C

Entry in HOTDOG20.DAT:
     0: 11 ; 12 39 2C :  ELEANOR RIGBY
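
The two-level lookup described above (TABLE_NO into the table at 0x200, then SONG_NO into the selected table) might look like this in C. This is a sketch, not the code in Dvd2Midi.zip. It assumes that both tables hold 4-byte little-endian absolute file offsets and that the file image is already in memory; neither the byte order nor the pointer format is stated in the thread. The name `song_entry_offset` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 32-bit values are stored little-endian. */
static uint32_t u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the file offset of a song entry, or 0 if the indices are out of
   range or the table slot is empty.
   img/size: image of HOTDOG00.DAT in memory.
   table_no: index into the table of tables at 0x200 (0x200 bytes = 128 slots).
   song_no:  index into the selected song entry table (256 slots). */
uint32_t song_entry_offset(const unsigned char *img, size_t size,
                           unsigned table_no, unsigned song_no)
{
    if (table_no >= 128 || song_no >= 256)
        return 0;
    size_t slot = 0x200 + 4u * table_no;
    if (slot + 4 > size)
        return 0;
    uint32_t table = u32le(img + slot);
    size_t entry = (size_t)table + 4u * song_no;
    if (table == 0 || entry + 4 > size)
        return 0;
    return u32le(img + entry);
}
```

For "ELEANOR RIGBY" the call would be `song_entry_offset(img, size, 0x39, 0x2C)`. For the real file, which is far too large to handle casually in a buffer on every system, the same arithmetic can be done with fseek() and fread().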

Dvd2Midi.zip

Re: DVD2MIDI (script updated)

bigboss97
Administrator
As far as I understand, we now have the TOC structure, i.e. we can look up any song in HD20 and read its data from HD00, right?
I assume that you checked the example "ELEANOR RIGBY" by ear and were able to identify the melody, right?
If that's the case, can you please also put the facts (regarding the table indices) in the summary:
http://www.nabble.com/forum/ViewPost.jtp?post=11381366&framed=y

That way, if a third party joins our discussion, (s)he doesn't need to read the whole conversation.

Since the indices are also the song ID (the numbers we punch in):
http://www.nabble.com/Generate-Song-List-for-California-electronics-tf4012363.html
we are not far from creating our own disk (replacing an existing song with a new hit)  :-)

Outstanding:
1) I'm going to target a familiar song using the above method and use a sequencer to identify what is misinterpreted in the MIDI.
2) Lyrics: You might want to target a song which starts with repeating words, e.g. lalala, hey hey etc.
or at least repeating letters, e.g. haPPy birthday. Then we could get a chance to read the text.


woid wrote
I discovered that TABLE_NO and SONG_NO (as I suspected earlier) correspond to 2 unknown bytes in HOTDOG20.DAT.

E.g: "ELEANOR RIGBY"
#define TABLE_NO 0x39
#define SONG_NO 0x2C

Entry in HOTDOG20.DAT:
     0: 11 ; 12 39 2C :  ELEANOR RIGBY

Re: DVD2MIDI (script updated)

openkaraoke
This is the layout of the California disk.

0) 00.DAT: songs file

Organized into 2 levels:
a) Super blocks (from 0x200):
each contains a block of pointers to song blocks
b) Song blocks:
each is a block of pointers to 256 songs
Vietnamese songs run from the 109th block to the 120th block. New songs are written from the 120th block onwards.
The first song is LEDA (super block 109, block 100).

1)02.DAT

SONG LIST
This is the index to the 00.DAT file
The song is indexed as follows:
8xxxxx (6 digits)
remove 8-> xxxxx
then (xxxxx-1)/256 -> the super block
the rest is the index in the corresponding song block.
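
As a small C sketch of this rule (the function name `split_song_number` is made up for illustration), using the song number 829656 that is worked through later in the thread as a check:

```c
/* Split a 6-digit song-book number of the form 8xxxxx into the block
   number and the index inside that song block:
   n = xxxxx - 1, block = n / 256, index = n % 256.
   Returns 1 on success, 0 if the number does not have the form 8xxxxx. */
int split_song_number(long number, int *block, int *index)
{
    if (number < 800001L || number > 899999L)
        return 0;
    long n = (number - 800000L) - 1;
    *block = (int)(n / 256);
    *index = (int)(n % 256);
    return 1;
}
```

For 829656 this gives block 115 and index 215, matching the figures quoted for that song.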

Each letter of the song title is encoded as a one-byte code as described by hdtr6639, i.e. 0-79: ASCII, 79-256: proprietary code. It can be converted to Unicode. For Vietnamese it's easy to guess. For Chinese, Japanese, Korean and Thai you must know the language to translate the codes to Unicode.


2) MIDI and LYRICS in the song block

4 bytes: SONG LENGTH
4 bytes: OFFSET TO MIDI
LYRIC PART: seems to be encrypted
3 bytes: MIDI LENGTH (yy xx xx ---> invert the bytes to find the length)
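
One possible reading of "invert the bytes" is that the 3-byte length is stored least significant byte first. The sketch below assumes exactly that; the thread does not spell out the byte order, and the function name is hypothetical.

```c
#include <stdint.h>

/* Read the 3-byte MIDI length.
   ASSUMPTION: "invert the bytes" means the bytes are stored in reverse
   (little-endian) order, so b[2] is the most significant byte. */
uint32_t midi_length_from_bytes(const unsigned char b[3])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16;
}
```

A quick sanity check on real data would be to compare the result with the SONG LENGTH and OFFSET TO MIDI fields: the decoded MIDI length should not run past the end of the song entry.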

The MIDI parts were completely converted to standard MIDI. The MIDI part is encoded as described by hdtr6639. The difference is that the starting MIDI event is now not only 00ffc but can also be other events. But they all have the same format (5 bytes).

The lyric part is encrypted or zipped, as you discovered. But I think that it's encrypted. The reason I didn't publish my results: I am afraid that the producers will encrypt the MIDI part as well when they discover that we can crack it. An example is www.sonca.midi, where they encrypt both lyrics and MIDI!!

I am trying to guess how the lyric part is encrypted. I discovered that the lyric parts of different songs contain similar blocks. Until the lyrics are decrypted, I feed lyrics found on Vietnamese forums into the MIDI parts, but it's hard. Not only do I have to search for the lyrics, I also have to synchronize them with the songs. But with a small script it's OK. The most important thing is getting the right lyrics.

Please visit karaoke.vietbel.org for an example. The Java programs seem to be the only players that support Unicode correctly (not even VanBasco does).

Hope this can help you.

Unicode Player

bigboss97
Administrator
I don't use many (soft) players. But as far as I'm aware, all the recent players support Unicode. How well it's supported depends not only on the player but also on the songs. If the song is not in Unicode, the player is lost.
For instance, most of the Chinese (Taiwan) songs found on the net are in BIG5 because that's what their Windows uses. So I had to write a BIG5-to-Unicode converter in my player in order to play those songs on English XP.
openkaraoke wrote
Please visit karaoke.vietbel.org for an example. The Java programs seem to be the only players that support Unicode correctly (not even VanBasco does).

Midi decoding

openkaraoke
Here is the code (subroutine) I used to decode the MIDI blocks that I made available on karaoke.vietbel.org.

It's in Perl and converts a MIDI block to a MIDI track in T2MF format. My solution is a hybrid one: it requires several programs, including the lyric insertion. So I think the most important code is the part that decodes the MIDI block. You can easily rewrite it in C, writing standard MIDI directly.


For each midi block

$delay =0;

for ( $j=0; $j < $midi_size -5 ; $j+=5) {
      $midi_block=unpack("H*",substr($midi_part,$j,5));
      $midi_delay=unpack("C",substr($midi_part,$j,1));
      $midi_delay_next=unpack("C",substr($midi_part,$j+5,1));
      $delay+= $midi_delay;
      $midi_delay2=substr($midi_part,$j+1,1);
      $midi_event=substr($midi_part,$j+2,1);
      $midi_channel= ord($midi_event) & 0x0f ;
      #$midi_channel= ord($midi_event);
      $midi_channel+= 1;
      $midi_evnt   = ord($midi_event) & 0xf0;

      $midi_note=ord(substr($midi_part,$j+3,1));
      $midi_velocity=ord(substr($midi_part,$j+4,1));

      # 192 0xC0 176 0xB0

      if (  $midi_evnt == "192" ) { # 0xC0  PrCh
         printf FILEOUT ("%.6d PrCh ch=$midi_channel p=$midi_note\n",$delay);
         }
      elsif (   $midi_evnt == "176" ) { # 0xb0 Par
         printf FILEOUT ("%.6d Par ch=$midi_channel c=$midi_note v=$midi_velocity\n",$delay);
         }
      elsif (   $midi_evnt == "208" ) { # 0xd0 ChPr Channel Pressure
         $bend= ($midi_velocity*128)+ $midi_note;
         printf FILEOUT ("%.6d ChPr ch=$midi_channel v=$bend\n",$delay);
         }
      elsif (   $midi_evnt == "224" ) {  # 0xe0 Pitch Bend
          $bend= ($midi_velocity*128)+ $midi_note;
          printf FILEOUT ("%.6d Pb ch=$midi_channel v=$bend\n",$delay);
         }
      elsif ( $midi_evnt == "160" ) {    # 0xA0 PoPr Poly Pressure
         printf FILEOUT ("%.6d PoPr ch=$midi_channel n=$midi_note v=$midi_velocity\n",$delay);
         if ( ord($midi_delay2) > 0x00) {
             #delay1=$delay + ord($midi_delay2);
              $delay1 =$delay + $midi_delay_next + ord($midi_delay2);
             printf FILEOUT ("%.6d PoPr ch=$midi_channel n=$midi_note v=0\n",$delay1);
            }
        }
      elsif (   $midi_evnt == "0" ) {  # 0x00 Note On
         printf FILEOUT ("%.6d On ch=$midi_channel n=$midi_note v=$midi_velocity\n",$delay);
         if ( ord($midi_delay2) > 0x00) {
             #delay1=$delay + ord($midi_delay2);
              $delay1 =$delay + $midi_delay_next + ord($midi_delay2);
             printf FILEOUT ("%.6d On ch=$midi_channel n=$midi_note v=0\n",$delay1);
            }
         }
      else {
         printf FILE_EVT ("MIDI EVENT=%s ",$midi_block);
         printf FILE_EVT ("%.6d UNKNOWN $midi_evnt\n",$delay);
         }
                               
      }
   printf FILE_DEBUG ("MIDI EVENT=%s\n",$midi_block);
   close FILEOUT;
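
Since the Perl above can be rewritten in C, here is a rough C sketch of just the record-decoding step. It follows the byte positions used in the Perl (byte 0 delay, byte 1 second delay, byte 2 event and channel, bytes 3 and 4 data); the type and function names are invented for illustration, and treating byte 1 as a note duration is an interpretation, not something the script states.

```c
#include <stdint.h>

enum hd_kind {
    HD_NOTE_ON,        /* high nibble 0x0 */
    HD_POLY_PRESSURE,  /* 0xA0 PoPr */
    HD_CONTROL,        /* 0xB0 Par  */
    HD_PROGRAM,        /* 0xC0 PrCh */
    HD_CHAN_PRESSURE,  /* 0xD0 ChPr */
    HD_PITCH_BEND,     /* 0xE0 Pb   */
    HD_UNKNOWN
};

struct hd_event {
    uint8_t delay;     /* byte 0: ticks added to the running tick count */
    uint8_t duration;  /* byte 1: if non-zero, a matching v=0 event follows later */
    enum hd_kind kind; /* from the high nibble of byte 2 */
    uint8_t channel;   /* low nibble of byte 2, plus 1 (1..16) */
    uint8_t data1;     /* byte 3: note, controller or program */
    uint8_t data2;     /* byte 4: velocity or value */
};

/* Decode one 5-byte record of the MIDI block. */
struct hd_event hd_decode(const unsigned char rec[5])
{
    struct hd_event ev;
    ev.delay = rec[0];
    ev.duration = rec[1];
    ev.channel = (uint8_t)((rec[2] & 0x0F) + 1);
    ev.data1 = rec[3];
    ev.data2 = rec[4];
    switch (rec[2] & 0xF0) {
    case 0x00: ev.kind = HD_NOTE_ON;       break;
    case 0xA0: ev.kind = HD_POLY_PRESSURE; break;
    case 0xB0: ev.kind = HD_CONTROL;       break;
    case 0xC0: ev.kind = HD_PROGRAM;       break;
    case 0xD0: ev.kind = HD_CHAN_PRESSURE; break;
    case 0xE0: ev.kind = HD_PITCH_BEND;    break;
    default:   ev.kind = HD_UNKNOWN;       break;
    }
    return ev;
}

/* 14-bit value as computed in the Perl for Pb and ChPr: data2 * 128 + data1. */
unsigned hd_value14(const struct hd_event *ev)
{
    return ev->data2 * 128u + ev->data1;
}
```

The record `90 08 01 1f 64` from the comments further down decodes to a Note On with delay 144 on channel 2, note 31, velocity 100, which agrees with the T2MF line given there.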

BE CAREFUL :

You must sort the lines again so that the On and Off events are in chronological tick order!

   # Sort delay

   open(FILEIN,"<$file_name") || die "cannot open file";
   @DATA=<FILEIN>;
   close(FILEIN);


   foreach (@DATA) {
      ($delay,$rest) = split; # get score
      $delay{$_} = $delay; # record it
      }

   @DATA = sort {
      $delay{$a} <=> $delay{$b};
      } @DATA;

   # Write @DATA out again and you obtain the MIDI track in T2MF/MF2T format
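
The same sort step could be done in C with qsort(). Because qsort() is not guaranteed to be stable, this sketch adds the original position as a tie-breaker so that events on the same tick keep their order. The struct and function names are hypothetical.

```c
#include <stdlib.h>

struct timed_line {
    unsigned long tick;  /* absolute tick parsed from the start of the line */
    unsigned seq;        /* original position, filled in by sort_by_tick() */
    const char *text;    /* rest of the T2MF line */
};

/* Order by tick first, then by original position. */
static int by_tick(const void *a, const void *b)
{
    const struct timed_line *x = a;
    const struct timed_line *y = b;
    if (x->tick != y->tick)
        return x->tick < y->tick ? -1 : 1;
    if (x->seq != y->seq)
        return x->seq < y->seq ? -1 : 1;
    return 0;
}

/* Sort the decoded lines into chronological order, keeping the original
   order of lines that share a tick. */
void sort_by_tick(struct timed_line *lines, size_t n)
{
    for (size_t i = 0; i < n; i++)
        lines[i].seq = (unsigned)i;
    qsort(lines, n, sizeof lines[0], by_tick);
}
```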


Hope it will complete your MIDI function.

Some comments :

   #0 PrCh ch=1 p=17   00 ff c0 11 ff  if 3rd byte = c0
   #                   00 2e c0 18 ff  2e :??? (NGAN CACH = separator)
   # Program change:     PrCh[ProgCh] <ch> <prog>
   # TODO why ff 2e ??? ff is the identifier of this event!!
   #0 Par ch=1 c=7 v=90  00 01 b0 07 5a  if 3rd byte = b0
   #Delay 144 On ch=2 n=31 v=100  90 08 01 1f 64  --- 08 is another evt when
   # velocity is 0
   # Because the 3rd byte is used  0x0f for channel number
   # To identify the event 3rd byte 0xf0
   # c = PrCh
   # b = Par
   # e = Pitch Bend     xx yy  yy*128+xx
   # 0 else is channel (16 channels ???)
   # 1 ???????? TODO
   # Pitch Bend Pb ch=4
   #    "ChPr": -- 0x0D = ChannelPressure
   #      do(msg[3]) -- chan
   #      do(msg[4]) -- val
   #      return chr(208+ch-1)&chr(v)--0xD0
   #    "PoPr": -- 0x0A = PolyPressure
   #      do(msg[3]) -- chan
   #      do(msg[4]) -- note
   #      do(msg[5]) -- val
   #      return chr(160+ch-1)&chr(n)&chr(v)--0xA0


Re: DVD2MIDI (script updated)

bigboss97
Administrator
In reply to this post by woid
With parameters now: dvd2Midi.zip

woid wrote
I removed the main() functions and put a new main() in a separate file. I didn't put much effort into the new main(). It accepts no command line arguments and requires a recompile for different output. I tried to keep the function definitions as intact as possible (I changed your ReadMidiDat() a little).

Re: Midi decoding (different tempo)

bigboss97
Administrator
In reply to this post by openkaraoke
I'll have a closer look tomorrow, it's midnight now.

I found that the channels are running at different tempos. Do you have that problem? Is it solved in your script?
I guess there is tempo information in the first 15 bytes.

Re: Midi decoding (different tempo)

openkaraoke
I don't quite understand the problem you have with the tempo. Normally the tempo is specified once at the beginning of the MIDI track. The basic problem is that these MIDI players don't use standard MIDI and don't put all the MIDI information in the MIDI block. All other MIDI data are defined once for these devices. Fortunately, I got this information from the Viet Thanh Phuong karaoke software (when I decoded its MIDI files) and discovered that it is also valid for the CAVS player and the California player. Probably not ALL of it is needed, but it is sufficient for software karaoke players.

In summary as explained in the initial post in MIDI buddy forum:

   printf FILEOUT ("MTrk\n");
   printf FILEOUT ("0 TimeSig 4/4 24 8\n");
   printf FILEOUT ("0 Tempo 500000\n");
   printf FILEOUT ("0 Meta 0x21 00\n");
   print FILEOUT @DATA;
   printf FILEOUT ("TrkEnd\n");
   close FILEOUT;
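
For a C converter, the same track wrapper could be produced as in the sketch below. The four header lines are copied from the Perl above; the function name `format_track` is hypothetical. For reference, a Tempo of 500000 microseconds per quarter note corresponds to 120 beats per minute.

```c
#include <stdio.h>

/* Format one T2MF track into buf: the fixed header events, the already
   sorted event lines (each ending in '\n'), and the track end marker.
   Returns what snprintf() returns: the length the full text needs,
   which is >= cap if the buffer was too small. */
int format_track(char *buf, size_t cap, const char *sorted_events)
{
    return snprintf(buf, cap,
                    "MTrk\n"
                    "0 TimeSig 4/4 24 8\n"
                    "0 Tempo 500000\n"
                    "0 Meta 0x21 00\n"
                    "%s"
                    "TrkEnd\n",
                    sorted_events);
}
```

Note that a complete T2MF file also needs an MFile header line in front of the tracks; the values to use there (in particular the division) are not given in this post.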

where @DATA is the sorted MIDI decoded from the MIDI block. Without the above information the MIDI track is not valid. If you choose an arbitrary Tempo and TimeSig, the MIDI may be playable but it will not be interpreted correctly. The above values work for every MIDI that I extracted (from CAVS, California, VietThanhPhuong...).

When you mention that the channels are played at different tempos, maybe you decode only the note parts (I had a quick look at your code) and not the other events at the beginning of the MIDI block. You could also check the tick order with the sort function. If the ticks are not sorted correctly, you will have the impression that the channels are played at different tempos, because the note Off is written too late for that channel.

By the way, I include a .kar song here so you can listen and compare: DANH_MAT_829656.kar

This song is numbered 829656 in the California song book.
As you can see, this song is in super block 115 and is entry 215 in the song block:
(29656-1)/256 = 115, and the remainder is 215.

Re: Midi decoding (different tempo)

openkaraoke
Here is the .kar file with lyrics: DANH_MAT.kar
The previous one contains only the ticks for the lyrics; once I have the lyrics, I replace the placeholders with the words.

Re: Lyrics

bigboss97
Administrator
In reply to this post by openkaraoke
I'm a bit confused. Have you been able to "read" the lyrics?
Btw, www.sonca.midi was not found.

Honestly, I don't know what the producers are thinking. Certainly, if they keep all the "secrets" of the song disk they may be able to make a big business out of it, just like selling an OS and wanting to keep the right to provide all the applications. But experience shows that the more applications are available, the more people will use the platform.

openkaraoke wrote
The lyric part is encrypted or zipped, as you discovered. But I think that it's encrypted. The reason I didn't publish my results: I am afraid that the producers will encrypt the MIDI part as well when they discover that we can crack it. An example is www.sonca.midi, where they encrypt both lyrics and MIDI!!

I am trying to guess how the lyric part is encrypted. I discovered that the lyric parts of different songs contain similar blocks. Until the lyrics are decrypted, I feed lyrics found on Vietnamese forums into the MIDI parts, but it's hard. Not only do I have to search for the lyrics, I also have to synchronize them with the songs. But with a small script it's OK. The most important thing is getting the right lyrics.

Unicode Karaoke Editor?

bigboss97
Administrator
In reply to this post by openkaraoke
Do you know of any free Unicode karaoke editor? I use karakan, which doesn't support Unicode.

openkaraoke wrote
I feed lyrics found on Vietnamese forums into the MIDI parts, but it's hard. Not only do I have to search for the lyrics, I also have to synchronize them with the songs. But with a small script it's OK. The most important thing is getting the right lyrics.

Lyrics (A LA LA LA LA LONG)

bigboss97
Administrator
In reply to this post by bigboss97
alala.zip
This is the lyric information I extracted from the song SWEAT (A LA LA LA LA LONG), 0x39 0x1F.
It starts at 0xE83A558. I think we can see a very obvious pattern. The numbers between the repeating patterns must be the time codes. Now we have to convert them to readable letters.




bigboss97 wrote
2) Lyrics: You might want to target a song which starts with repeating words, e.g. lalala, hey hey etc.
or at least repeating letters, e.g. haPPy birthday. Then we could get a chance to read the text.

Re: Lyrics (A LA LA LA LA LONG)

woid
I have also been trying to decode the lyrics. They definitely seem to be encoded in some way. All my attempts to decode them so far have failed. It doesn't seem to be a simple "add-a-number" encoding.

The only thing I have been able to find out is that the lyrics part starts with the first 2 letters of the title in plain text (Unicode encoded, 2 bytes per letter: 00 xx for English songs, where xx is the same as the ASCII code).

Do you know any encoding algorithm with those characteristics? I.e. the first 2 letters (or first block of 4 bytes) in plain text...
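
The observation about the first 4 bytes can be turned into a quick check, for example to confirm that an offset really points at the start of a lyrics part. This is a sketch with a made-up function name; it only covers the "00 xx" case described for English titles and assumes the title is given in the same letter case as it is stored.

```c
#include <stddef.h>

/* Return 1 if the lyrics part starts with the first two letters of the
   title as two big-endian 16-bit codes (00 xx 00 yy), otherwise 0. */
int lyrics_start_with_title(const unsigned char *lyrics, size_t len,
                            const char *title)
{
    if (len < 4 || title[0] == '\0' || title[1] == '\0')
        return 0;
    return lyrics[0] == 0x00 && lyrics[1] == (unsigned char)title[0] &&
           lyrics[2] == 0x00 && lyrics[3] == (unsigned char)title[1];
}
```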

Re: Lyrics

openkaraoke
In reply to this post by bigboss97
This is the website for encrypted midi karaoke:
http://www.soncamedia.com/

In Vietnam there are several brands of MIDI karaoke (Arirang, TienDat, California Electronics), but Sonca has launched new products with USB support, and they encrypt the MIDI karaoke songs. They call them "SUPERMIDI". I dumped a file and found that it is more complicated than on the classic MIDI karaoke players. They make the updated MIDI karaoke songs available online!!! Obviously it would be nice if we could manage to decrypt them.

This is why I didn't want to publish my results in the next post last year. On my website I only make public the standard MIDI karaoke songs that I extracted. The reason is that, like you, I like MIDI karaoke very much and am afraid that the producers will follow soncamedia and encrypt the MIDI part as well. Obviously they would have to upgrade the firmware, and our old players would be outdated again. (However, there is an ultimate solution: I found that in China they sell an integrated board with MPEG, DivX and MIDI karaoke codecs, and these boards are widely used in karaoke players. Consequently, with some electronics knowledge we could capture the MIDI events through the MIDI interface on the board!! But we are not there yet.)

As far as the Unicode lyric editor is concerned: although a lot of players claim that they support Unicode, most of them don't work. It is the same for Unicode lyric editors. At the beginning I looked for Unicode lyric editors but didn't find any, so I developed the tool myself. It was simpler than I imagined. In fact, with this script I can insert the lyrics (in Unicode) into 300 MIDI songs in 1 minute. In order to synchronize, just add +1 or +n before the words. The idea is that when you decode the MIDI file, each time you find a Note On on channel 1 (the melody channel) you create an entry in an MTrk for lyrics (the melody track) with the corresponding tick (time code); the lyric is still a dummy at this stage. When you have a text file with lyrics (even in Unicode), the script replaces (and adjusts) the words in the melody track.
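
The first half of that idea, collecting one lyric time code per melody note, could be sketched in C like this. It is not openkaraoke's script: the struct and function names are invented, and skipping velocity-0 events (which act as note offs in the decoder posted earlier) is an assumption.

```c
#include <stddef.h>

struct note_event {
    unsigned long tick;  /* absolute tick of the event */
    unsigned channel;    /* 1..16 */
    unsigned velocity;   /* 0 marks the end of a note */
};

/* Collect the tick of every Note On on channel 1 (the melody channel).
   Each collected tick is a slot for one dummy lyric syllable that can be
   replaced by a real word later. Returns the number of ticks stored. */
size_t lyric_ticks(const struct note_event *ev, size_t n,
                   unsigned long *out, size_t cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++) {
        if (ev[i].channel == 1 && ev[i].velocity > 0)
            out[k++] = ev[i].tick;
    }
    return k;
}
```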

The most difficult problem is getting the lyrics. As I said before, feeding the lyrics into 300 songs takes 1 minute, but finding the lyrics for 300 songs takes... a lot of time! Fortunately the number of new songs is limited (about 300 new songs in each volume), and most of their lyrics can be cut and pasted (in Unicode) from different forums. So it's not a bad idea to insert the lyrics ourselves while the lyric decryption is still unknown.

As for the lyrics available on my website: the old ones come from the CAVS DVD and the new ones come from the California DVD. The ones from CAVS were extracted exactly as they are on the DVD. The new ones were fed in by the above script (one pass only, because I didn't have time to listen and adjust the words).
But there is one remark:

In 02.DAT, the Vietnamese song titles are encoded with the same font as on the CAVS DVD. If one day we manage to decrypt the lyric parts, the same font is probably used there. The difference is that CAVS uses both upper and lower case for the lyric display, while California uses only upper case.

By the way, did you find the same tempo problem with my extracted MIDI?