Verilog string headaches
I'm fumbling through learning Verilog while working on a testbench for a system at work. The testbench verifies two devices by talking to them from a phantom microcontroller over an SPI bus. The microcontroller connects directly to device1 via SPI, and can also talk to device2 over this same SPI connection, as there is a "bypass" service built into device1. The uC tells device1 via SPI commands to enter bypass mode for the duration of the next SPI communication (i.e. bypass mode lasts while ssn is enabled for the device2 communication, and ends when ssn is disabled at the end of that communication).
The phantom microcontroller reads its commands from a text file, and I'm having trouble with this text file: my Verilog books do not go into much detail on file input and parsing, and I have not found a detailed discussion of this sort of thing through Google either. I assume my books are at an introductory level for Verilog newbies, and file input may be a more advanced topic. It's still somewhat frustrating now that I need to learn about this, my books ignore file input, and the web pages Google turns up are pretty vague in what they do have to say. I'm finding that the methods I'd use for this in C or Perl do not work in Verilog. So I am posting here what I find to work as I go, so that it is out here next time I need it and have forgotten, and perhaps someone else will find it useful as well.
Here's an example of the microcontroller command text file, where NOP is not a defined command and lines beginning with # are comments:
# CMD ADDR DATA
#
#E
#F 1010101
#G 0101010 11001100
#WR 1111111 11111111
NOP 0101010 10101010
#put device1 into bypass mode and talk to device2
WR 0000111 00000001
#run device2 with opcode 00000110
FOP 00000110
#again put device1 into bypass mode and talk to device2
WR 0000111 00000001
# write to device2 at address 0 with page size of 256 using somefile.bin
FPR
00000010 000000000000000000000000
00000110 256
../../../src/somefile.bin
# tell device1 to do read new program from device2, device1 will talk directly to device2 for this
WR 0000111 00000000
Lines starting with # character are comments, all other lines begin with a command code that is up to 3 characters long. I have a while loop running until end of file is reached, and use fscanf to read in each line. Some commands have parameters on the same line, some are "multi-line" commands with parameters on following lines. For example, WR is a register write command with parameters on the same line, while FPR is a multi-line command with 3 lines of parameters.
Here's the definition of the command string register:
`define UCACCESS_FILE_CMD_CHARS 3
reg [8*`UCACCESS_FILE_CMD_CHARS:1] ucfile_command ;
As you can see, ucfile_command can hold a string up to 3 characters long. That's 3 bytes, or 24 bits, in the order of [24:1]
The "trick" here is that the fscanf I used for the commands, which was working for defined command lines in the input file, seems to break on comment lines for a couple of reasons.
Here's the fscanf line responsible for this blog entry:
status_file = $fscanf(command_file_handle, "%s %b %b\n",
ucfile_command, ucfile_param1, ucfile_param2 ) ;
This was written when only a couple of commands were defined and they were very simple, the read and write commands each having two binary parameters. Other commands were defined after this fscanf code was written; they became more complex and began to cause problems. Some such problems may still be hidden from view: the new multi-line commands have no parameters on the first line, which is not what this fscanf expects, and although they appear to work OK, they may be causing some problem that isn't obvious but is making my test sim fail.
I intended for this fscanf to be done at the top of each iteration of a while loop. The while loop would exit when end of file was detected. I thought about using a case statement, but ended up going with if/else if/... as I wanted to check whether the first character was a #, and if so ignore the rest of the line and do nothing until the next line was read. Checking the "first character" of ucfile_command turned out to be weird. String registers like the one I defined here fill from the LSB end and are zero-filled on the MSB end if there is extra space. So I cannot simply check the first character, because which byte of ucfile_command holds the "first" character depends on how long the string is. Here are a few examples of how things get stored into ucfile_command:
# blah (has a space between # and blah)
ucfile_command holds 24'b000000000000000000100011 with the # character in bits [8:1]
#E (no space between # and E characters)
ucfile_command holds 24'b000000000010001101000101
#WR 1111111 11111111 (no space between # and W characters)
ucfile_command holds 24'b001000110101011101010010
As you can see, the # char, represented by 8'b00100011, does not appear in the same position of ucfile_command register in these three examples. So... What to do?
How about I check for the comment char in all three positions, making sure that anything to the left is zero-padded? Kind of a hassle, but it seems to have worked for this purpose.
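Here is a minimal sketch of that three-position check, assuming the 3-character ucfile_command register defined above. The function name is_comment and the wrapper module are made up for illustration, and the hard-coded bit ranges only fit a 3-character register.

```verilog
`define UCACCESS_FILE_CMD_CHARS 3

module comment_check_sketch;
  reg [8*`UCACCESS_FILE_CMD_CHARS:1] ucfile_command;

  // Hypothetical helper: returns 1 when the first stored character is '#'.
  // The bit ranges below assume a 3-character (24-bit) register.
  function is_comment;
    input [8*`UCACCESS_FILE_CMD_CHARS:1] cmd;
    begin
      is_comment =
        (cmd[24:17] == "#") ||                            // 3 chars stored, e.g. "#WR"
        ((cmd[24:17] == 8'h00) && (cmd[16:9] == "#")) ||  // 2 chars stored, e.g. "#E"
        ((cmd[24:9] == 16'h0000) && (cmd[8:1] == "#"));   // 1 char stored, "#"
    end
  endfunction

  initial begin
    // Assigning a short string literal zero-pads on the MSB side,
    // the same layout shown in the examples above.
    ucfile_command = "#";
    $display("%b comment=%b", ucfile_command, is_comment(ucfile_command));
    ucfile_command = "#E";
    $display("%b comment=%b", ucfile_command, is_comment(ucfile_command));
    ucfile_command = "#WR";
    $display("%b comment=%b", ucfile_command, is_comment(ucfile_command));
    ucfile_command = "WR";
    $display("%b comment=%b", ucfile_command, is_comment(ucfile_command));
  end
endmodule
```

One caveat with this kind of check: a comment word longer than 3 characters (such as #put) loses its leading characters when it is stored into the 3-character register, so the # is no longer there to be found.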
Now, onto trick #2. Even though I'm now detecting comments, something is still goofy. The first line, # CMD ADDR DATA, showed up in 4 successive iterations of fscanf. I effectively got 4 commands out of this: the # comment "command", a CMD "command", a DDR "command" (ADDR without the A due to the 3-char limitation of the string register it goes into), and an ATA "command" (DATA without the D, again due to the 3-char limitation of the string register). I had thought that the first fscanf would read the entire line, even though its parameters were not satisfied, and the next iteration would get line 2. This is apparently not the case. As the other "words" of this comment line failed to fit the fscanf %b parameters, they were apparently left in the input for future read attempts, which was not what I intended. And because of this situation, if I commented out a proper command line, it might still get run against my will.
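One way to see this happening is to look at the return value, since $fscanf and $sscanf return the number of items they matched successfully. The sketch below uses $sscanf on string literals so it needs no input file; the parameter widths and the module name are assumptions for illustration. The count for the comment line should come back lower than 3, because the %b conversion stops at CMD.

```verilog
module scanf_count_sketch;
  reg [8*3:1] ucfile_command;
  reg [7:0]   ucfile_param1;   // width is an assumption
  reg [7:0]   ucfile_param2;   // width is an assumption
  integer     matched;

  initial begin
    // A well-formed command line: all three conversions can match.
    matched = $sscanf("WR 0000111 00000001", "%s %b %b",
                      ucfile_command, ucfile_param1, ucfile_param2);
    $display("command line: %0d of 3 fields matched", matched);

    // A comment line: %s takes the "#", then %b fails on "CMD".
    matched = $sscanf("# CMD ADDR DATA", "%s %b %b",
                      ucfile_command, ucfile_param1, ucfile_param2);
    $display("comment line: %0d of 3 fields matched", matched);
  end
endmodule
```

With $fscanf on a file, the text that failed to match stays in the input stream, which is why the leftover words turn up as "commands" on later iterations.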
The first two web sites I looked at from the Google search results suggest fgets might be interesting, as it reads until end of line, which might help me clear out the remainder of my comment line so I can move on to the next line of the input file. Unfortunately, they both give a 3-parameter example and my Verilog simulator says there should only be two parameters:
r = $fgets(string, n, file); //example from internet
status_file = $fgets(comment_string, `COMMENT_STR_LEN, command_file_handle); //my code
error message from simulator:
ERROR: SYSTF WRNMARG
Wrong number of argument passed to $fgets, should have 2 arguments.
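It seems that my simulator does not support the middle parameter in those examples, which is a maximum string length to read in. (The two-argument form, with the string register first and the file handle second, is the one the Verilog-2001 standard defines.) So I changed my code to: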
status_file = $fgets(comment_string, command_file_handle); //my newer code
This seems to work better, but still has a problem. Most lines seem to come out OK now, except for these ones which do not seem to exist in the simulation:
#E
#G 0101010 11001100
NOP 0101010 10101010
I believe these input file lines fail to exist in the simulation because there's nothing left on the preceding comment line for fgets: the whitespace in the fscanf format has already skipped past the end of that line, so fgets gets the next line instead. Not very desirable for the goal at hand. So... Maybe fscanf with an fgets inside a comment completer is not the right way to do this in Verilog.
What is the best way to parse input from a text file like this?
Perhaps one could do an fgets at the top of the while loop instead of fscanf, and then do sscanf on the string we got from this new fgets to see what command it is. Check for a comment in the sscanf results, and if it's a comment then go on to the next iteration of this new fgets. If it's not a comment, then handle the sscanf results the same way we now handle the current fscanf results for a defined command.
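A rough sketch of that fgets-then-sscanf loop follows. This is not what I ended up running, and several things in it are assumptions: the file name uc_commands.txt, the 128-character line limit, the 8-bit parameter widths, and the use of %c to pick out the first non-blank character. How sscanf treats the zero padding at the MSB end of the line register is also worth confirming on your own simulator.

```verilog
`define UCACCESS_FILE_CMD_CHARS 3
`define LINE_CHARS 128   // assumed maximum line length

module line_parser_sketch;
  integer command_file_handle;
  integer status_file;
  integer n_fields;
  reg [8*`LINE_CHARS:1]              line_string;
  reg [8*`UCACCESS_FILE_CMD_CHARS:1] ucfile_command;
  reg [7:0] first_char;
  reg [7:0] ucfile_param1;   // width is an assumption
  reg [7:0] ucfile_param2;   // width is an assumption

  initial begin
    command_file_handle = $fopen("uc_commands.txt", "r");  // hypothetical file name
    if (command_file_handle == 0) begin
      $display("could not open command file");
      $finish;
    end

    // $fgets consumes exactly one line per call and returns 0 at end of file.
    status_file = $fgets(line_string, command_file_handle);
    while (status_file != 0) begin
      // The leading space in the format skips whitespace before the first character.
      n_fields = $sscanf(line_string, " %c", first_char);
      if (n_fields != 1) begin
        // Blank line: nothing to do.
      end else if (first_char == "#") begin
        // Comment: the whole line is already consumed, so nothing leaks
        // into the next iteration.
      end else begin
        n_fields = $sscanf(line_string, "%s %b %b",
                           ucfile_command, ucfile_param1, ucfile_param2);
        $display("command %s with %0d parameter(s)", ucfile_command, n_fields - 1);
        // Dispatch on ucfile_command here; a multi-line command would call
        // $fgets again for each of its parameter lines.
      end
      status_file = $fgets(line_string, command_file_handle);
    end
    $fclose(command_file_handle);
  end
endmodule
```

Because the comment test looks at the first character of the whole line rather than at ucfile_command, it does not depend on how many characters the command register can hold.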
A second possible solution is to make all commands multi-line. This way the attempt to read binary parameters for write and read commands does not confuse things; the code that processes these commands has to do its own file input to get the command parameters, the same way the newer multi-line commands do. Ultimately this is the solution I went with, and it seems to be working pretty well.