Timing and Control Module Specification

© 2007-2009, Kevan Hashemi, Brandeis University.

Contents

Description
Specification
Architecture
TCPIP
SOAR
SIAP
LVDS
Serial Protocol
TCM Registers
Prototype Setup
Harvard Protocol
Conclusion

Description

NOTE: [12-OCT-23] This is a historical document, dating back to 2009, when we were working with Brandeis University on the LSST, now called the Vera Rubin Survey Telescope.

[25-JAN-2009] The Timing and Control Module (TCM) is a component of the proposed Large Synoptic Survey Telescope (LSST). The hardware itself will reside in the camera. The TCM will act as an interface between the Camera Control System (CCS) and the Camera Electronics. Brandeis University's High Energy Physics laboratory has designed and built a prototype TCM along with auxiliary test circuits and diagnostic software.


Figure: The Large Synoptic Survey Telescope.

The camera is the black cylinder just to the left of the gimbal axis in the figure above.


Figure: The LSST Camera.

The Cryostat Outer Cylinder (green) encloses the camera's image sensor and the sensor readout. The readout electronics are divided into several parts. The part that is of concern to the design of the TCM is the Raft Controller Module (RCM). There are twenty-five rafts in the camera. Each raft holds nine image sensors. Each raft has its own controller board, its RCM, and each RCM connects to the TCM with its own cable. The image sensors and the electronics immediately beneath them are cooled by a cryostat. They reside in the cold part of the camera, and operate at around −100°C. The RCMs lie beyond an insulating plate in the warm region, operating at −40°C. The RCMs deliver timing signals to the image sensors, receive analog pixel voltages from the image sensors, digitize these voltages (see our work on operating ADCs in the cold and warm regions here), and transmit the image data over an optical fiber. The TCM resides in the camera outside the cryostat. It connects to all twenty-five RCMs within the cryostat with wires that penetrate the cryostat wall. The TCM is master of bi-directional communication with each RCM, and performs the functions listed below.

  1. Generate and distribute a synchronized clock signal to the RCMs so that pixel readout from all image sensors can be synchronous.
  2. Read out and set configuration registers on the RCMs.
  3. Provide a TCPIP interface with the Camera Control System (CCS).
  4. Slow read-out of image pixels from the RCMs for laboratory tests.

The TCM is similar to the LWDAQ Driver with Ethernet Interface (A2037E) we developed for our work on the ATLAS End-Cap Muon Spectrometer. Our TCM prototype, the A2101, uses newer parts, supports a new TCPIP messaging protocol we call SIAP, and communicates with its multiple slave circuits through a new serial protocol. We enhanced our existing LWDAQ Software to communicate with the TCM using the new messaging protocol.

This document describes the messaging protocol we use to provide communication over TCPIP between a client and the TCM. It describes in detail the serial protocol we use to communicate between the TCM and the RCMs, and the manner in which registers on the TCM must be manipulated to carry out such communication.

Specification

Here is a summary of specifications for the TCM. We intend to modify these numbers in response to comments from collaborators.

  1. Size: 9" x 2" x 6"
  2. Enclosure: Anodized Aluminum
  3. Heat Dissipation: 10 W
  4. TCPIP Connection: RJ-45, PC104 Linux Module with Digital IO ports
  5. TCPIP Data Rate: >500 kBytes/s
  6. RCM Connection: RJ-45, CAT-5 Unshielded.
  7. RCM Communication: custom bi-directional serial interface.
  8. Pixel Clock: 50 MHz to each RCM, ±5 ns total jitter and offset.
  9. Timestamp: counts cycles of a 32.768 kHz clock.
  10. Impossible to damage or disable RCM hardware through TCM.

The Camera Control System (CCS) will instruct the TCM to perform the following functions.

  1. Synchronize TCM real-time clock with CCS real-time clock.
  2. Write to a single configuration byte on a single RCM.
  3. Write simultaneously to a shared configuration byte on all RCMs.
  4. Read a single status byte on a single RCM.
  5. Program firmware on RCMs.
  6. Upload image sensor control programs to RCMs.
  7. Assemble RCM status reports with timestamps and return to CCS.
  8. Reboot RCMs.

We hope our readers will feel free to extend and correct the above list.

Architecture


Figure: TCM Block Diagram. Marked in red are choices for the prototype, TCM1, for use in laboratories in summer 2008.

TCPIP

One way to control the nodes in the Camera Control System (CCS) is with a TCPIP interface, and one possible implementation of this TCPIP interface is a PC104-format, PC-compatible computer running Linux. The Linux computer would publish its data and functions on the local network and subscribe to the sources of instructions and data that it needs. It could use DDS or some other publish-subscribe protocol.

We bought and tested just such a Linux machine, as we describe in our report Embedded Linux System. Our conclusions were not favorable, as we described at the Camera Control Workshop in Tucson in January, 2008, in our talk Camera Control Nodes.


Figure: A Camera Control Node.

Instead of placing a Linux machine next to the control input-output boards, we propose to put the Linux machine outside the camera, and connect it to its control input-output boards with a long cable. Each control input-output board communicates serially with its Linux computer. The Linux computers, being together outside the camera, can be combined into one Linux computer. The serial communication can join at a multiplexer and proceed out of the camera on one shared serial communication cable.


Figure: TCPIP Between Linux Machine and IO Boards

Now suppose we use TCPIP to transfer simple messages to and from the control input-output boards. Each board needs a translator that runs a TCPIP stack and passes incoming TCPIP messages to the control input-output board as register reads and writes. We use just such a TCPIP translator in our LWDAQ Driver (A2037E) and our TCPIP-VME Interface (A2064). The translators are the RCM2200 and the RCM3200 respectively, both manufactured by Rabbit Semiconductor, and both costing below $100 (the RCM prefix in these part numbers is Rabbit's, and is unrelated to the Raft Controller Module). Both these translators are eight-bit embedded microprocessor boards with a 10-Base-T Ethernet socket and several byte-wide IO ports.

The TCM will use an RCM4200 to provide its TCPIP interface and to implement the SIAP messaging protocol, as described in the following sections.

The TCM (Timing and Control Module) acts as an intermediary between the Camera Control System (CCS) Master and the camera's twenty-five Raft Controller Modules (RCMs). Communication between the CCS Master and the TCM follows the SOAR Protocol developed by German Schumacher for the SOAR Telescope. We implement the Simple Instruction-Answer Protocol (SIAP) on top of the SOAR Protocol. We describe the SOAR Protocol in the next section, and SIAP in the section after that.

SOAR

The SOAR Protocol, also known as the SOAR Communication Library (SCL), is a client-server protocol that provides security based upon the IP address of the client, and parsing of the two-way data flow with four-byte length fields passed at the beginning of any SOAR Protocol message. The SOAR Protocol implemented on the TCM will run on top of TCPIP.

Note: German is planning another version of SOAR Protocol that runs on top of DDS (data distribution service). We have no plans to implement DDS on the TCM, but there is no reason why the TCM cannot implement DDS if there is some advantage in doing so.

Here are the steps required by a SOAR Protocol exchange. Either the client or the server can terminate the communication asynchronously at any time by aborting the socket.

  1. Client opens a TCPIP socket to the server using the server's IP (internet protocol) address and the SOAR server's port number.
  2. Server accepts connection.
  3. Server checks IP address of client against its list of permitted clients. If the client's IP address is on the list, the server sends back a SOAR message containing the ASCII string "DONE". The message contains eight bytes in total: four bytes to give the length of the content and four for the content itself. If the client's IP address is not on the list, the server sends back an error message and aborts the socket.
  4. Client checks that socket opens within a timeout. Otherwise, aborts the socket with an error.
  5. Once socket is open, client tries to read the "DONE" message within a timeout. Otherwise, aborts the socket with an error.
  6. Client sends a SOAR message with an instruction for the server as its content.
  7. Server receives the SOAR Protocol message from the client. Server interprets and executes the instruction. The interpretation and execution of the instruction has nothing to do with the SOAR Protocol.
  8. In this example, the instruction requires an answer. The server sends the client a SOAR message with the answer as its content.
  9. Client receives SOAR message containing the answer from the server.
  10. If the client needs to send more instructions, go back to Step 6. Otherwise, close the socket.
  11. Server sees the client has closed the socket. Server ends the SOAR session.

The length field of a SOAR message is four bytes long. The most significant byte comes first. The length field gives the number of bytes in the SOAR Protocol message that follow the length field. Thus the length field for content "DONE" is 4, because "DONE" has 4 bytes, even though the entire message contains 8 bytes: 4 for the length field and 4 for the content field.
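The framing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the SOAR Communication Library itself; the function names soar_pack and soar_unpack are our own inventions for the example.

```python
import struct

def soar_pack(content: bytes) -> bytes:
    """Prefix the content with a four-byte length field, most significant byte first."""
    return struct.pack(">I", len(content)) + content

def soar_unpack(message: bytes) -> bytes:
    """Return the content of one complete SOAR message."""
    (length,) = struct.unpack(">I", message[:4])
    content = message[4:4 + length]
    if len(content) != length:
        raise ValueError("incomplete SOAR message")
    return content

done = soar_pack(b"DONE")
print(len(done))   # 8: four bytes of length field plus four of content
print(done.hex())  # 00000004444f4e45
```

The length field counts only the content, so the "DONE" message carries the value 4 in a message of 8 bytes.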

The original SOAR Communication Library, as implemented for the SOAR telescope, assumed that each message would arrive in a single packet. When the first byte of a message was available, the library assumed that all bytes of the message would be available. The protocol did not support long messages broken into multiple packets, or small messages broken into small packets. In order to support the transmission of large blocks of data, as is required by the TCM's diagnostic image readout, we resolved to enhance the original SOAR protocol so that it waits for bytes to arrive, irrespective of how many packets they arrive in.
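One way to wait for bytes irrespective of packet boundaries is to loop on the socket until the expected count has arrived. The sketch below shows the idea with Python's standard socket module; recv_exact and recv_soar_message are hypothetical names, and the local socket pair merely stands in for a TCPIP connection to a server.

```python
import socket
import struct

def recv_exact(sock, count):
    """Read exactly count bytes, however many packets they arrive in."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_soar_message(sock):
    """Read the four-byte length field, then the content it announces."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Demonstration: one message delivered in three fragments.
sender, receiver = socket.socketpair()
sender.sendall(b"\x00\x00")
sender.sendall(b"\x00\x04DO")
sender.sendall(b"NE")
print(recv_soar_message(receiver))  # b'DONE'
sender.close()
receiver.close()
```

A production client would also apply a timeout to each read, as the exchange steps require.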

SIAP

The TCM will exist on its local network as a Simple Instruction-Answer Protocol (SIAP) server. The SIAP protocol is a TCPIP messaging protocol that runs on top of the SOAR messaging protocol we describe above. Each SIAP message begins with the SCL length field of four bytes, and is itself an SCL message. Following the length field is a four-byte message identifier field. The message identifier tells us the function of the message and the format of its contents. All SIAP fields are big-endian, as is the length field in a SOAR message. The most significant byte comes first.

Note: We base the SIAP message identifiers upon the ones we use in our LWDAQ messaging protocol. This set of identifiers has grown over the years in response to our needs. We describe both the LWDAQ and SIAP protocols in our LWDAQ Specification. Starting with version 7.1, the LWDAQ Software supports both LWDAQ and SIAP connections for all data acquisition. By default, the software uses SIAP for server port numbers 30000 to 40000, and LWDAQ otherwise.

The TCM, and any other SIAP server, presents the client with a 32-bit address space called the server address space. No memory locations are reserved by the SIAP protocol. The client can read from and write to this address space using SIAP messages. Reading and writing is byte-by-byte to avoid any confusion resulting from byte ordering, and to give the greatest flexibility and efficiency when dealing with a variety of server platforms. The layout of this address space is a function of the individual server.

In addition to the server address space, the server may also provide one or more memory blocks. These memory blocks might be RAM, hard disk, or flash drive. The TCM provides one RAM memory block. On the A2100, this memory block is 4 MBytes of static RAM. The server memory blocks may be larger than 4 GBytes, as would be the case with a hard drive. For this reason, they cannot appear in the server address space, which is limited to 4 GBytes. Access to the memory blocks is through memory portals with the help of a memory address. The memory address is a register in server address space that points to a byte in the memory block. We read this byte by reading from the memory portal, which is another address in server address space. Each time we read from the memory portal, the memory address increments by one. We read sequential bytes from a memory block with consecutive reads from the memory portal. The SIAP protocol supports repeated reads from the same address with the stream_read message. Writing to a memory block follows the same principles: we write repeatedly to the memory portal. The stream_write message supports repeated writes to the same address.
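The portal mechanism can be pictured with a toy model. The sketch below is an illustration only: the class MemoryPortal and the helpers stream_write and stream_read are hypothetical, and we assume that a write to the portal increments the memory address in the same way a read does.

```python
class MemoryPortal:
    """Toy model of one memory block reached through a single portal location."""

    def __init__(self, size):
        self.block = bytearray(size)  # the memory block behind the portal
        self.address = 0              # the memory address register

    def read(self):
        """Read the byte the memory address points to, then increment the address."""
        value = self.block[self.address]
        self.address += 1
        return value

    def write(self, value):
        """Write a byte at the memory address, then increment (assumed behavior)."""
        self.block[self.address] = value & 0xFF
        self.address += 1

def stream_write(portal, data):
    """Repeated writes to the same portal fill consecutive bytes of the block."""
    for byte in data:
        portal.write(byte)

def stream_read(portal, count):
    """Repeated reads from the same portal return consecutive bytes of the block."""
    return bytes(portal.read() for _ in range(count))

portal = MemoryPortal(16)
portal.address = 4
stream_write(portal, b"abc")
portal.address = 4            # point back at the start of the data
print(stream_read(portal, 3))  # b'abc'
```

The client never addresses the block directly: it sets the memory address once and then reads or writes the same portal location repeatedly.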

Message Name (Value)  Function                                        Fields: Name (Size)
--------------------  ----------------------------------------------  ----------------------------------------
version_read (0)      read server software version number             L (4) ID (4)
byte_write (1)        write to a byte location                        L (4) ID (4) Address (4) Value (1)
byte_read (2)         read from byte location                         L (4) ID (4) Address (4)
stream_read (3)       read repeatedly from a byte location            L (4) ID (4) Address (4) N (4)
                      and return successive bytes in a block
data_return (4)       message contains requested data                 L (4) ID (4) Data (L−8)
byte_poll (5)         wait until byte location has particular value   L (4) ID (4) Address (4) Value (1)
login (6)             log into server to obtain supervisory access    L (4) ID (4) Password (L−8)
config_read (7)       read server configuration file                  L (4) ID (4)
config_write (8)      re-write server configuration file              L (4) ID (4) Configuration (L−8)
mac_read (9)          read server MAC address                         L (4) ID (4)
stream_delete (10)    write value repeatedly to byte location         L (4) ID (4) Address (4) N (4) Value (1)
echo (11)             re-transmit contents as data                    L (4) ID (4) String (L−8)
stream_write (12)     write repeatedly to same byte location          L (4) ID (4) Address (4) Block (L−12)
                      with successive bytes from a block
reboot (13)           re-start server and re-load configuration       L (4) ID (4)

Table: SIAP Message Identifiers. We give field sizes in bytes. The L field is the SCL length. The ID field is the message identifier number. The N field, when present, gives the number of bytes to be read or written. When the final field has a variable size, we deduce the size from L.
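As an illustration of the fixed-size messages in the table, here is a minimal Python sketch that assembles byte_write, byte_read, stream_read, and version_read messages. The function names are hypothetical. The sketch assumes that L counts only the bytes that follow the length field, as the SOAR section defines it, and it deliberately avoids the variable-size messages, whose lengths should be checked against the server implementation.

```python
import struct

# Message identifiers taken from the table above.
VERSION_READ, BYTE_WRITE, BYTE_READ, STREAM_READ = 0, 1, 2, 3

def siap_message(message_id, body=b""):
    """Build a SIAP message: length field, identifier, then body, all big-endian.
    Assumption: the length field counts the bytes that follow it."""
    content = struct.pack(">I", message_id) + body
    return struct.pack(">I", len(content)) + content

def byte_write(address, value):
    return siap_message(BYTE_WRITE, struct.pack(">IB", address, value))

def byte_read(address):
    return siap_message(BYTE_READ, struct.pack(">I", address))

def stream_read(address, count):
    return siap_message(STREAM_READ, struct.pack(">II", address, count))

def version_read():
    return siap_message(VERSION_READ)

print(byte_write(0x10, 0xAB).hex())  # 000000090000000100000010ab
print(len(stream_read(0x20, 1024)))  # 16
```

Every multi-byte field is packed most significant byte first, which matches the big-endian rule stated for all SIAP fields.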

The SIAP messages provide two ways to read bytes from the server address space. The byte_read reads a single byte at address A. The server returns this byte in a data_return message. The stream_read asks the server to read repeatedly from the same byte address and send back all the bytes it read as a single block. The server reads address A a total of N times and returns the N bytes in a data_return message. When used with a memory portal, the stream_read retrieves a block of data from one of the server's memory blocks.

There are two ways to write bytes to the server address space. The byte_write writes a single byte at address A. The stream_write writes repeatedly to location A. Each write stores the next byte in the block of bytes supplied with the stream_write message. When used with a memory portal and memory address register, the stream_write transfers a block of data to one of the server's memory blocks.

Because the SIAP protocol reserves no locations in its 32-bit address space, there are SIAP messages that permit any SIAP client to interact with any SIAP server and query its status and configuration. The version_read message requests the server software version. The server should implement the version_read processing as efficiently as possible, so that SIAP clients can use the version_read as a way of punctuating message transmission. If the client sends too many messages to the server, none of which require a response from the server, the server message buffer will over-flow. The client uses version_read to introduce a pause in such sequences. When the client receives a data_return message with the server version number, it proceeds.

The byte_poll instruction puts the SIAP server into a loop, waiting for a particular byte to take on a particular value.

Example: The client wants to direct the TCM to accumulate in its address space twenty-five 1-KByte blocks of ADC samples from the RCM temperature sensors. To improve the efficiency of this exchange, the client sends the TCM a long sequence of commands with byte_write messages punctuated by byte_poll messages. These commands cause the samples to be transferred into the TCM memory and assembled into a 25-KByte block. During the sequence of commands, the client uses version_read to avoid over-flowing the TCM message buffer. When the data block is ready, the client reads the block out of TCM memory with a single stream_read.

The TCM must abort whatever it is doing whenever its client closes its TCPIP socket. The TCM must check the status of the socket frequently. In particular, the client must be able to abort a byte_poll loop at any time by closing the socket. No harm must come to the TCM, nor to any RCMs, as a result of sudden abandonment of any data acquisition activity when the client closes the socket.

Example: If, during the readout of an image from an RCM, we want to abort the data transfer, we abort the TCPIP socket to the TCM. Closure of the socket causes an immediate halt to the TCM's participation in the data transfer from the RCM. We open a new socket and send a serial protocol reset instruction to the RCM. This instruction will stop the RCM transmitting any further data. The RCM returns to its rest state after the reset.

No command register on the TCM or on the RCMs will contain more than one parameter. We have plenty of address space, so each independent parameter will have its own address. Thus we disallow the use of separate bits in the same register byte for independent parameters. This restriction frees us from the need to read a register before we write to it. When two independent parameters share a byte, we cannot set one of them without re-writing the other. By assumption, the other parameter is independent, so we have to obtain its current value from somewhere before we set it again. Either we read the register itself, in which case the register must support reading, or we keep a copy of the register in memory (the shadow byte).

Every SIAP server has a configuration file that lists the permitted client IP addresses and specifies the server's own IP address, its timeout behavior, security level, name, serial number, and various other such parameters, as well as any server-specific configuration information. This configuration will reside in EEPROM so that it is available upon start-up. We can re-write the configuration file dynamically over TCPIP with a config_write message. We apply the new configuration either by pressing the reset button on the server, or by turning the power off and on again, or by sending the server a reboot message. The reboot tells the server to re-boot its operating system and re-load its configuration file. Because the new configuration file can specify a new IP address for the server, it is possible to disable a SIAP server by giving it an IP address that makes any further communication with the server impossible.

Example: The TCM IP address is 120.1.1.2 in a local area network in the camera utility crate. The utility crate has its own local area network, and communicates with the outside world through a router. The router accepts TCPIP communication with any IP address in the range 120.1.0.0 to 120.1.255.255. We re-write the TCM configuration file and give it IP address 10.0.0.2. We send a reboot message. The TCM re-boots with this new IP address. Now the router makes it impossible for us to contact the TCM from outside the camera. We have to unplug the TCM and re-configure it with a separate cable.

A SIAP server can protect itself against such a disaster by requiring that the client log in with a password for supervisory access. Such a requirement makes it very unlikely that normal SIAP client procedures will re-write the configuration file by accident.
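A client could add a second line of defense by checking a proposed address before it writes the configuration file. The sketch below uses Python's standard ipaddress module and the address range from the example above; the function name reachable is hypothetical, and no such check is part of the SIAP specification.

```python
import ipaddress

# The router in the example accepts 120.1.0.0 to 120.1.255.255, which is 120.1.0.0/16.
ROUTER_RANGE = ipaddress.ip_network("120.1.0.0/16")

def reachable(address):
    """Return True if the router would still pass traffic to this address."""
    return ipaddress.ip_address(address) in ROUTER_RANGE

print(reachable("120.1.1.2"))  # True
print(reachable("10.0.0.2"))   # False
```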

A SIAP server need not implement all SIAP messages, but it must at the very least implement the version_read so that the client can determine which messages the server implements. The echo message instructs a SIAP server to return its string field in a data_return message. When the server wants to indicate an error condition, it does so by closing the socket. The client can open another socket and read registers in server address space to determine the nature of the error.

LVDS

The TCM (Timing and Control Module) provides a separate hardware socket for each RCM (Raft Controller Module). The current camera design includes 25 RCMs. The TCM will provide 32 sockets to allow for additional RCM-like circuits around the periphery of the camera.

Each RCM socket on the TCM provides eight electrical connections. These we arrange as four low-voltage differential signal pairs. The shield of each socket is connected directly to the zero-volt (0-V) potential on the TCM circuit. The cable shield connects the 0-V potentials of the TCM and RCMs.

NOTE: Be sure to use a shielded ethernet cable to connect the A2101A to an RCM or A2101X. Alternatively, connect the 0-V potential of the two boards with a separate conductor. Without this 0-V connection, serial communication between the two circuits becomes erratic. Without the ground connection, we found the communication would be reliable one day and then fail intermittently the next.

The Timing and Control Module (A2101) is for use in laboratories during development and testing of the camera. The A2101 provides eight shielded RJ-45 sockets. Sockets 1 through 7 are for use with the TCM Prototype (A2101A) and socket 8 is for use on the RCM Emulator (A2101X). Sockets 1 to 7 are master sockets and socket 8 is a slave socket.

Three of the four signals on each master socket are outputs. The fourth is an input. The pin assignments on the cable are as follows. In the slave sockets, outputs become inputs and inputs become outputs. We simply replace the chips with their inverse partners. (In doing so, with the particular chips we use on the A2101, the signals in the slave socket end up being inverted, but we remove the inversion in our programmable logic.)

Pin  Name    Function
---  ------  ----------------------------
1    SCK+    Serial Clock Out, Positive
2    SCK−    Serial Clock Out, Negative
3    SDO+    Serial Data Out, Positive
6    SDO−    Serial Data Out, Negative
4    SAO+    Serial Program Out, Positive
5    SAO−    Serial Program Out, Negative
7    SDI+    Serial Data In, Positive
8    SDI−    Serial Data In, Negative
S    SHIELD  TCM 0-V Potential

Table: RCM Socket Pinout. We use the Ethernet wiring convention, with pins 3 and 6 paired together.

The SCK signal carries a 50-MHz clock to all RCMs in the camera, and so guarantees synchronous pixel clocking across the image plane. The distribution of this clock is the primary purpose of the TCM. The SDO signal carries serial data to the main logic chip on the RCM. The SAO signal carries serial data to the auxiliary logic chip on the RCM. The auxiliary logic chip is responsible for re-programming the main logic chip. The SDI signal carries serial data back from the RCMs. In the following section, we define the serial protocol proposed by Brandeis University for communication between the TCM and RCMs. In a later section, we discuss the protocol proposed by Harvard University.

Serial Protocol

Here we describe the protocol for communication between the TCM and RCM as designed by Brandeis University. We discuss the protocol designed by Harvard University below.

The serial protocol determines the way the TCM and RCM will use the four low-voltage differential logic signals that run between them. Three of these signals pass from the TCM to the RCM, and the fourth passes back from the RCM to the TCM. This protocol must provide a sustained, low-noise clock signal of 50 MHz. We reserve one of the outgoing signals for a continuous 50-MHz clock: the SCK signal. At least one incoming signal must be used to carry information back to the TCM: the SDI signal. Two outgoing signals are left to control the RCM. There are two components of the RCM that must be controlled. One is the main functional logic chip on the circuit, which performs the image readout and all other operations. The other is an auxiliary programming logic chip, which re-programs the main logic chip. This auxiliary logic chip cannot itself be re-programmed from the TCM, but it does run on firmware downloaded through a connector on the RCM board.

The TCPIP and LWDAQ protocols use only one outgoing and one incoming signal for bi-directional communication. They do not have even the benefit of a separate clock signal. The TCM-RCM interface does have a separate clock signal, so serial control of the RCM's main logic chip through SDO is straightforward. It may be possible to communicate simultaneously with the main and auxiliary logic chips on the RCM through the SDO signal alone, but there are many potential pitfalls in such an arrangement. One of the TCM specifications is that it cannot damage or disable the RCM. Any combination of main and auxiliary communication must make it impossible for the auxiliary chip to disable communication between itself and the TCM. If the auxiliary chip can re-program the main chip so that the main chip drives the SDO lines, further communication with the auxiliary chip may be impossible. When communication with the auxiliary chip breaks down, we cannot re-program the main chip on the RCM, and the RCM is disabled. We will dedicate one outgoing signal, SAO, to communication with the auxiliary chip. The security of the RCM is now assured.

Every serial transmission to and from the RCM will begin with a start bit and end with a stop bit. Because most LVDS transceivers default to a HI state when they are open circuit, the default state of the signals will be HI (1). The start bit is a LO (0) that lasts for 20 ns (one 50-MHz clock period). The stop bit will be a HI (1) for continuity with the default or high-impedance state of the lines.

Because data transfer from the TCM to the CCS (Camera Control System) is byte-wise over SIAP, we make the data transfer between the TCM and the RCMs byte-wise as well. A smaller or larger serial word size would require combining or splitting of serial words to support generic data access over SIAP, which is byte-wise. Each serial word passing between the TCM and RCM will contain eight data bits, so we refer to these exchanges as serial bytes.

In order to force an interruption of data transfer from the TCM to the RCM, we must have a way of transmitting an exceptional serial word that the RCM can identify as a non-data word, even though it is in the act of receiving data words. The absolute minimum overhead we can dedicate to a distinction between data and instruction words is one bit. So that's what we'll do: the first bit after the start bit is the TYPE bit. If it is 1, the serial word contains data. If it is zero, the word contains an instruction.

Each serial word now consists of 11 bits: start, type, eight content bits, and a stop bit. We will use the same word structure for communication to and from the TCM and the auxiliary and main logic chips on the RCM. At 20 ns per bit, each word takes 220 ns, so the maximum data rate between the RCM and TCM is 4.5 MBytes/s. This 4.5 MByte/s is far higher than the 600 kByte/s limit imposed by the TCPIP Interface.
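The word structure and the data rate can be checked with a short Python sketch. The names encode_word and decode_word are hypothetical, and the order of the eight content bits within the word is our assumption (most significant bit first), since the text specifies only the order of bytes within a multi-byte parameter.

```python
BIT_NS = 20  # one period of the 50-MHz clock, in nanoseconds

def encode_word(content, is_data):
    """Return the 11 line levels of one serial word: start, TYPE, content, stop."""
    if not 0 <= content <= 0xFF:
        raise ValueError("content must fit in one byte")
    bits = [0]                              # start bit is LO
    bits.append(1 if is_data else 0)        # TYPE: 1 for data, 0 for instruction
    bits += [(content >> i) & 1 for i in range(7, -1, -1)]  # assumed MSB first
    bits.append(1)                          # stop bit is HI
    return bits

def decode_word(bits):
    """Return (content, is_data) for one 11-bit serial word."""
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        raise ValueError("framing error")
    content = 0
    for bit in bits[2:10]:
        content = (content << 1) | bit
    return content, bool(bits[1])

word = encode_word(0xA5, is_data=True)
print(word)               # [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(decode_word(word))  # (165, True)

word_ns = len(word) * BIT_NS            # 220 ns per serial byte
rate_mbytes = 1e9 / word_ns / 1e6       # bytes per second, scaled to MBytes/s
print(round(rate_mbytes, 2))            # 4.55
```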

The RCM receives a 50-MHz clock, SCK, from the TCM. The RCM can use this clock to synchronize data from the TCM. To make use of SCK, we must specify the phase of SDO and SAO with respect to the SCK edges. Changes in SDO or SAO will take place on or close to the falling edges of SCK, as seen by the RCM. The RCM can feel free to clock SDO and SAO on the rising edge of SCK. The set-up time for SDO and SAO before the rising edge of SCK must be at least 5 ns at the RCM.

The TCM receives no clock from the RCM with which it can synchronize the data it receives. The round-trip propagation delay of the cable between the TCM and the RCM may be as great as 50 ns for a five-meter cable, plus an undefined delay within the RCM itself. The TCM is equipped with a 200-MHz clock generated by quadrupling its 50-MHz clock. With the help of the 200-MHz clock, the TCM detects a start bit from the RCM and synchronizes the remaining serial bits using a 50-MHz clock of the correct phase.
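The following Python sketch models the start-bit detection in software. It is an illustration of the principle, not the TCM firmware: receive_word and to_samples are hypothetical names, the input is a list of SDI levels sampled at 200 MHz, and the content bits are assumed to arrive most significant bit first.

```python
SAMPLES_PER_BIT = 4   # 200-MHz samples per 20-ns serial bit
WORD_BITS = 11        # start, TYPE, eight content bits, stop

def receive_word(samples):
    """Recover one serial word from 200-MHz samples of SDI."""
    start = samples.index(0)   # first LO sample marks the start bit
    last = start + (WORD_BITS - 1) * SAMPLES_PER_BIT + 2
    if last >= len(samples):
        raise ValueError("word is truncated")
    # Sample each bit two 200-MHz periods after its leading edge, near mid-bit.
    bits = [samples[start + k * SAMPLES_PER_BIT + 2] for k in range(WORD_BITS)]
    if bits[-1] != 1:
        raise ValueError("missing stop bit")
    content = 0
    for bit in bits[2:10]:
        content = (content << 1) | bit   # assumed most significant bit first
    return content, bool(bits[1])

def to_samples(bits, delay):
    """Test waveform: idle HI for `delay` samples, then the word, then idle."""
    wave = [1] * delay
    for bit in bits:
        wave += [bit] * SAMPLES_PER_BIT
    return wave + [1] * SAMPLES_PER_BIT

word = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # data byte 0xA5
for delay in (3, 10, 17):                  # arbitrary, unknown cable delays
    print(receive_word(to_samples(word, delay)))   # (165, True) each time
```

Choosing the sample point relative to the detected start bit is equivalent to selecting one of the four possible phases of a 50-MHz clock.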

For the sake of generality, we say that the TCM is the master of serial communication and the RCMs are slaves. We note that the RCM does not have to use SCK to receive serial data from the TCM. The RCM could use a 200-MHz clock in the same way as the TCM. We can imagine a serial communication between master and slave that uses only SDO and SDI. We could omit SAO if we did not require the slave to be reprogrammable. We could omit SCK if we did not require all slaves to share the same clock. The following table presents several possible variations of our Serial Protocol.

Protocol Variant                Description
------------------------------  ---------------------------------
Synchronous with Auxiliary      Master transmits SCK and SAO
Asynchronous with Auxiliary     Master transmits SAO but not SCK
Synchronous without Auxiliary   Master transmits SCK but not SAO
Asynchronous without Auxiliary  Master transmits only SDO and SDI

Table: Variants of the Serial Protocol.

The TCM-RCM communication is "Synchronous with Auxiliary". The TCM is the master and the RCMs are slaves. From here on we will talk in the generic terms master and slave, but you can assume that the master is the TCM and the slave is an RCM.

We combine instruction and data bytes to control communication between master and slave. Each message in either direction, between the master and the auxiliary or main logic chips on the slave, begins with an instruction byte.

Example: When the master wants to reset the slave's main logic chip, it sends a reset instruction on SDO. No data words follow the reset. The slave logic should reset itself entirely after receipt of the reset instruction, and be ready to receive further instructions as soon as it recovers from the reset.

Some instructions are followed by data bytes that qualify the instruction. Multiple-byte parameters are transmitted most-significant byte first (big-endian). The following table lists the available instructions, their parameters, and which party to the communication can transmit them.

Instruction Name (Value)  Function                        Parameters: Name (Size)                  Source
------------------------  ------------------------------  ---------------------------------------  ---------------
error (0)                 notify master of slave error    none                                     Slave Only
write (1)                 write block to slave            Address (4) Length (4) Block (Length)    Master Only
read (2)                  read block from slave           Address (4) Length (4)                   Master Only
abort (3)                 abort current transfer          none                                     Master Only
reset (4)                 reset slave logic               none                                     Master Only
execute (5)               start slave task                none                                     Master Only
data (6)                  return block of data to master  Data (variable)                          Slave Only
null (255)                synchronise serial receivers    none                                     Master or Slave

Table: Serial Protocol Instructions.
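To make the table concrete, the sketch below lists the serial words a master would transmit for a write and for a read. Each word is shown as a pair (content, is_data), where is_data corresponds to the TYPE bit. The function names write_words and read_words are hypothetical.

```python
import struct

# Instruction values taken from the table above.
WRITE, READ = 1, 2

def write_words(address, block):
    """Words for a write: instruction, four address bytes, four length bytes, block."""
    params = struct.pack(">II", address, len(block))   # most significant byte first
    words = [(WRITE, False)]                           # instruction word, TYPE = 0
    words += [(byte, True) for byte in params + bytes(block)]  # data words, TYPE = 1
    return words

def read_words(address, length):
    """Words for a read: instruction, four address bytes, four length bytes."""
    params = struct.pack(">II", address, length)
    return [(READ, False)] + [(byte, True) for byte in params]

words = write_words(0x20, b"abc")
print(len(words))   # 12: one instruction, eight parameter bytes, three block bytes
print(words[0])     # (1, False)
print(words[4])     # (32, True): least significant byte of the address
print(len(read_words(0x20, 1)))  # 9
```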

Each slave appears to its master as two address spaces: one for the main logic chip and one for the auxiliary logic chip. The main logic chip provides the main space and the auxiliary logic chip provides the auxiliary space.

To write a block of bytes to main space, the master transmits a write on SDO. The slave expects four data bytes that specify an address in main space and four bytes that specify the number of data bytes to be written. After that, the slave waits for the specified number of bytes. It stores each byte in a consecutive location in its main space, starting at the specified address. If any of the words received by the slave is an instruction, the slave must abort the write and begin execution of the new instruction. In particular, the master can abort with an abort instruction, after which the slave enters its rest state, waiting for further instructions from the master. Conversely, the slave can request that the master stop the write by sending an error instruction. The master is under no obligation to honor this request.
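The slave's handling of a write can be sketched as a small state machine. This Python model is hypothetical and covers only write, abort, and null; the class name SlaveWriteModel and its state names are our own, and a real slave would implement the equivalent logic in its main logic chip.

```python
import struct

WRITE, ABORT, NULL = 1, 3, 255   # instruction values from the table above

class SlaveWriteModel:
    """Toy model of a slave receiving serial words as (content, is_data)."""

    def __init__(self, size):
        self.space = bytearray(size)   # the slave's main space
        self.state = "rest"
        self.params = bytearray()
        self.address = 0
        self.remaining = 0

    def receive(self, content, is_data):
        if not is_data:
            if content == NULL:
                return                 # null is always ignored
            # Any other instruction interrupts the transfer in progress.
            self.params.clear()
            self.state = "params" if content == WRITE else "rest"
            return
        if self.state == "params":
            self.params.append(content)
            if len(self.params) == 8:  # four address bytes, four length bytes
                self.address, self.remaining = struct.unpack(">II", self.params)
                self.state = "data" if self.remaining else "rest"
        elif self.state == "data":
            self.space[self.address] = content
            self.address += 1
            self.remaining -= 1
            if self.remaining == 0:
                self.state = "rest"

slave = SlaveWriteModel(16)
slave.receive(WRITE, False)
for byte in struct.pack(">II", 4, 3):   # address 4, length 3
    slave.receive(byte, True)
slave.receive(0x11, True)
slave.receive(NULL, False)              # has no effect on the write
slave.receive(0x22, True)
slave.receive(ABORT, False)             # master aborts before the third byte
slave.receive(0x33, True)               # ignored: slave is in its rest state
print(slave.space[4:7].hex(), slave.state)  # 112200 rest
```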

Example: The TCM does not abort serial communication when it receives an error from a slave. The TCM sets its Received Instruction Register to the error code (all zeros), but otherwise continues execution of its current serial communication task.

To read a block of bytes from main space, the master sends a read on SDO, followed by four data words giving an address and four data words giving the number of bytes to be read. The slave responds with a data instruction and a block of bytes on SDI. The slave does not send any bytes giving the length of the block it is transmitting. This length is already known to the master, who issued the read instruction. If the slave encounters an error, it can transmit an error instruction on SDI. If the master wants to abort the read, it can transmit abort on SDO.

To write a single byte, we use a write of length one. To read a single byte, we use a read of length one.

Example: An RCM is transmitting a large block of data to the TCM. The TCM is storing the data for subsequent readout by a SIAP stream_read. The slave receives an urgent error from its monitoring circuits. It transmits an error on SDI. The TCM detects the error and aborts its write operation. The SIAP client checks the TCM's Received Instruction Register, sees an error instruction, and begins to interrogate the RCM to determine the source of the error.

The null instruction must always be ignored. Either the master or the slave can insert a null into any data stream without any effect upon the communication other than to delay the transfer of the next serial byte. The null instruction can be used to correct timing errors that might arise between the serial transmitter on the master and serial receiver on the slave. In an asynchronous system, null instructions can be used to transmit a clock signal to the slave, or from the slave to the master.

Example: An asynchronous slave receives only SDO from its master. The slave is so rudimentary and low-power that it does not have a clock oscillator. It receives serial words with the help of a ring oscillator that fires up whenever a stop bit arrives. For one function, the slave is supposed to provide a square clock signal of exactly 1 MHz to a sensor. The master provides this clock by sending a sequence of null instructions, each with 14 extra stop bits on the end. The signal on SDO is a 40-ns LO pulse every 500 ns, which gives the slave a 2-MHz clock. The slave divides the clock by two to get a square 1-MHz.
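
The figures in this example can be checked with a few lines of arithmetic. The Python sketch below assumes a 20-ns bit period, a start bit that lasts two bit periods (40 ns), eight data bits, and one stop bit. These values are our inference from the 40-ns and 200-ns pulses quoted in this section and the 220-ns job time given with the Serial Job Codes table. They are assumptions made for the calculation, not a restatement of the serial word format.

```python
BIT_NS = 20      # assumed bit period
START_NS = 40    # assumed duration of the LO start bit
DATA_BITS = 8    # one byte per serial word
STOP_BITS = 1    # one HI stop bit


def word_ns(extra_stop_bits: int = 0) -> int:
    """Duration of one serial word, including any extra stop bits."""
    return START_NS + (DATA_BITS + STOP_BITS + extra_stop_bits) * BIT_NS


def null_clock_mhz(extra_stop_bits: int) -> float:
    """A null is all ones, so each word contributes one LO start pulse."""
    return 1000.0 / word_ns(extra_stop_bits)


def error_pulse_ns() -> int:
    """An error is all zeros: the start bit and eight data bits are all LO."""
    return START_NS + DATA_BITS * BIT_NS


if __name__ == "__main__":
    print("word:", word_ns(), "ns")
    print("null with 14 extra stop bits:", word_ns(14), "ns")
    print("clock from nulls:", null_clock_mhz(14), "MHz")
    print("after divide by two:", null_clock_mhz(14) / 2, "MHz")
    print("error pulse:", error_pulse_ns(), "ns")
```

Under these assumptions a null with 14 extra stop bits occupies 500 ns, which is the 2-MHz pulse train the example describes.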

The error instruction is designed to notify the master that one of its slaves is having trouble. The error instruction is a single 200-ns LO pulse on SDI. The master can detect an error on any one of its slaves simply by looking for a 200-ns LO pulse on SDI returning from each slave, or by combining all its slave SDI logic levels together with a logical AND, and looking for a 200-ns LO pulse on the combined signal. Any one of the slaves can, in either circumstance, convey its error instruction to the master, even if the master is currently receiving bytes from another slave.

Example: The TCM is receiving multiple 1-MByte blocks of data from RCM Number 4. During this extended transfer, RCM Number 2 detects an urgent over-temperature problem on one of its image sensors. It drives its SDI line LO for 200 ns in the error instruction. The AND combination of all slave SDI lines goes LO for at least this 200 ns. When the AND combination of SDI rises again, which it will do eventually, the TCM sees an error instruction. The TCM does not abort its read, but it does set its Received Instruction Register to the error instruction code (all zeros). By checking the Received Instruction Register, the SIAP client can detect the error, stop the transfer of data from RCM Number 4, and interrogate the RCMs to find out where the error came from. Within a fraction of a second, the source and nature of the error will be known to the Camera Control System, even though the SIAP client was mid-way through a thirty-second data transfer.

The state machines that receive serial words in both masters and slaves must use the start bit (LO) to begin reception of a serial word. They may not rely upon the particular behavior of a particular master to simply assume that the start bit will arrive at a particular time. Furthermore, the correct operation of the error instruction requires that all serial receivers wait for the stop bit (HI) before being available for the next serial word. A continuous LO value must not cause repeated error instructions in any serial receiver.
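
The two requirements above, combining SDI lines with an AND and waiting for HI before accepting another word, can be sketched in Python as follows. This is a simplified illustration, not the TCM firmware. It assumes the SDI levels are sampled once per 20-ns bit period, so that a 200-ns error pulse is ten LO samples, and it does not decode data words arriving on other lines. The function names are ours.

```python
from typing import Iterable, List, Sequence


def combine_sdi(lines: Sequence[Sequence[int]]) -> List[int]:
    """AND together the sampled SDI levels of all slaves. Idle level is HI (1)."""
    return [int(all(levels)) for levels in zip(*lines)]


def count_error_pulses(samples: Iterable[int], min_lo_samples: int = 10) -> int:
    """Count LO pulses of at least min_lo_samples, reported when the line rises."""
    errors = 0
    lo_run = 0
    for level in samples:
        if level == 0:
            lo_run += 1          # line still LO: keep counting, report nothing
        else:
            if lo_run >= min_lo_samples:
                errors += 1      # line rose after a long LO pulse: one error
            lo_run = 0           # HI re-arms the detector for the next word
    return errors


if __name__ == "__main__":
    idle = [1] * 30
    rcm2 = [1] * 5 + [0] * 10 + [1] * 15   # one 200-ns error pulse
    print(count_error_pulses(combine_sdi([idle, rcm2])))
    print(count_error_pulses([1] + [0] * 100 + [1]))   # a long LO is one error
```

Because an error is reported only when the line returns to HI, a line held LO indefinitely produces no stream of repeated errors.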

TCM Registers

The TCM presents a thirty-two bit address space through its SIAP interface, but uses only the first sixty-four locations in that space.

Address (Hex)   Address (Decimal)   Contents                            Read-Write
00              0                   hardware identifier                 R
02              2                   received instruction register       R
03              3                   serial job register                 RW
04              4                   transmit data register              W
12              18                  hardware version                    R
13              19                  firmware version                    R
18..1B          24..27              data address (bytes 3..0)           W
28              40                  configuration switch                R
29              41                  software reset                      W
2A..2D          42..45              transmit select mask (bits 31..0)   W
30..33          48..51              receive select mask (bits 31..0)    W
3F              63                  ram portal                          RW
Table: TCM Address Map. In the last column, an R means the location may be read out, and a W means the location may be written to. All four-byte registers are big-endian.

The TCM provides one memory block, a 4-MByte static RAM. This block is accessible through the RAM Portal. A read or write from the RAM Portal will read or write a byte from the memory block. The byte read or written is the one pointed to by the Data Address. Each use of the RAM Portal increments the Data Address by one. The Data Address itself can be set to an initial value by writing to the Data Address locations. The RAM Portal is for use with the SIAP stream_read and stream_write messages.
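
The following Python class is a toy model of the RAM Portal and Data Address, written to show the auto-increment behavior described above. It is a sketch for illustration, not the TCM firmware. The class and method names are ours, and the wrap-around of the Data Address at the end of the RAM is an assumption that the text does not specify.

```python
class TcmRamModel:
    """Toy model of the RAM Portal (0x3F) and Data Address (0x18..0x1B)."""

    DATA_ADDRESS = 0x18            # four bytes, most significant byte first
    RAM_PORTAL = 0x3F
    RAM_SIZE = 4 * 1024 * 1024     # 4-MByte static RAM

    def __init__(self) -> None:
        self.ram = bytearray(self.RAM_SIZE)
        self.data_address = 0

    def _advance(self) -> None:
        self.data_address = (self.data_address + 1) & 0xFFFFFFFF

    def write(self, location: int, value: int) -> None:
        if self.DATA_ADDRESS <= location < self.DATA_ADDRESS + 4:
            # Location 0x18 holds byte 3 (bits 31..24), 0x1B holds byte 0.
            shift = 8 * (3 - (location - self.DATA_ADDRESS))
            self.data_address &= ~(0xFF << shift) & 0xFFFFFFFF
            self.data_address |= (value & 0xFF) << shift
        elif location == self.RAM_PORTAL:
            # Assumption: addresses beyond the RAM wrap around.
            self.ram[self.data_address % self.RAM_SIZE] = value & 0xFF
            self._advance()

    def read(self, location: int) -> int:
        if location == self.RAM_PORTAL:
            value = self.ram[self.data_address % self.RAM_SIZE]
            self._advance()
            return value
        return 0   # other registers are not modeled


if __name__ == "__main__":
    tcm = TcmRamModel()
    for byte in b"abc":
        tcm.write(TcmRamModel.RAM_PORTAL, byte)    # three portal writes
    tcm.write(0x1B, 1)                             # point back at the second byte
    print(chr(tcm.read(TcmRamModel.RAM_PORTAL)), tcm.data_address)
```

A stream_write or stream_read to the portal corresponds to calling write or read on location 0x3F repeatedly, which is why the Data Address must be reset before reading back a block that has just been written.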

Note: The TCM address map is compatible with our existing LWDAQ Driver (A2037) address map. We can use our existing LWDAQ Software to configure and communicate with the TCM. The TCM RAM portal and data address are at the same locations, so we can use existing routines that write to and read from the RAM. You will find such scripts in our library of test scripts, A2101.tcl. There's one that creates a 4-MPixel gray-scale image in the TCM memory, reads it back, and displays it on the screen. Another writes a Rasnik image to the TCM memory and reads it back out again.

The Transmit Select Mask selects which RCMs will receive transmissions via SDO and SAO. After a hardware or software reset, the Transmit Select Mask is all zeros. Communication with all RCMs is disabled. The least significant bit in the thirty-two bit register enables writing to RCM number 1. The most significant bit enables writing to RCM number 32. Any combination of bits may be set to enable simultaneous writing to any combination of RCMs on either the SDO or SAO signals.
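
As a small worked example of the bit assignment just described, the Python sketch below builds a select mask from a list of RCM numbers and splits it into the four bytes that would be written to locations 2A..2D, most significant byte first. The function names are ours.

```python
from typing import Iterable


def select_mask(rcm_numbers: Iterable[int]) -> int:
    """Bit 0 enables RCM number 1, bit 31 enables RCM number 32."""
    mask = 0
    for number in rcm_numbers:
        if not 1 <= number <= 32:
            raise ValueError(f"RCM number {number} is outside 1..32")
        mask |= 1 << (number - 1)
    return mask


def mask_register_bytes(mask: int) -> bytes:
    """Four bytes in the order of locations 2A, 2B, 2C, 2D (bits 31..0)."""
    return mask.to_bytes(4, "big")


if __name__ == "__main__":
    print(hex(select_mask([1, 32])))                        # first and last RCM
    print(mask_register_bytes(select_mask([4])).hex(" "))   # RCM number 4 only
    print(hex(select_mask(range(1, 26))))                   # all 25 camera RCMs
```

The same calculation applies to the Receive Select Mask at locations 30..33, provided its bits are assigned to RCMs in the same way.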

The Receive Select Mask selects which SDI signals received from RCMs will be combined together to obtain the SDI signal used by the TCM's Serial Communication Controller. Provided that only one RCM drives its SDI lines at a time, the TCM will receive the messages correctly.

The Serial Job Register tells the TCM to perform a serial communication job. The SIAP client writes a serial job code to the Serial Job Register and the TCM begins executing the job. We list the serial job codes below.

Name          Code   Action                                           Client Procedure
idle          0      serial interface idle                            None.
write         1      transmit write via SDO                           Write data to RAM, set DA, write code to SJR, write address and length to TDR, poll SJR, check RIR.
read          2      transmit read via SDO, receive data via SDI      Set DA, write code to SJR, write address and length to TDR, poll SJR, check RIR, read data from RAM.
abort         3      transmit abort via SDO                           Write code to SJR (Note 1)
reset         4      transmit reset via SDO                           Write code to SJR (Note 1)
execute       5      transmit execute via SDO                         Write code to SJR (Note 1)
null          6      transmit null via SDO                            Write code to SJR (Note 1)
aux_write     9      transmit write via SAO                           Write data to RAM, set DA, write code to SJR, write address and length to TDR, poll SJR, check RIR.
aux_read      10     transmit read via SAO, receive data via SDI      Set DA, write code to SJR, write address and length to TDR, poll SJR, check RIR, read data from RAM.
aux_abort     11     transmit abort via SAO                           Write code to SJR (Note 1)
aux_reset     12     transmit reset via SAO                           Write code to SJR (Note 1)
aux_execute   13     transmit execute via SAO                         Write code to SJR (Note 1)
aux_null      14     transmit null via SAO                            Write code to SJR (Note 1)
Table: Serial Job Codes. We use SJR for Serial Job Register, TDR for Transmit Data Register, RIR for Received Instruction Register, and DA for Data Address. Note 1: This job completes in 220 ns, which is fast enough that the SIAP client can assume the job completes immediately.

The TCM begins executing a job as soon as the SIAP client writes to the Serial Job Register. Some jobs complete without any further action on the part of the SIAP client. Other jobs require the SIAP client to write bytes to the Transmit Data Register before they proceed. Some jobs require that data be placed in the TCM's RAM, with the Data Address pointing to the first data byte, before the SIAP client writes to the Serial Job Register.

When the serial communication requires address and length bytes, the SIAP client specifies these by writing them to the Transmit Data Register in the order they will be transmitted to the RCMs. The TCM knows which bytes to use for its internal data counters.

Any write to the Serial Job Register stops the current serial communication job. The TCM begins executing the new job code immediately. The jobs with the aux_ prefix use SAO instead of SDO, and so communicate with the slave auxiliary logic chips instead of their main logic chips. Note that the TCM, as master of the serial communication, has no error or data jobs. Only slaves can send error or data messages.

Any extended communication between a master and slave of our Serial Protocol may be interrupted by the master with an abort instruction. A slave can request a stop with an error instruction. The TCM allows a SIAP client to abort communication with the abort and aux_abort jobs. The TCM responds quickly to error instructions it receives from slaves by setting its Received Instruction Register to the error instruction code, which is zero (0). After any extended communication job, we recommend that the SIAP client check the Received Instruction Register. The TCM does not abort communication with a slave when it receives an error.

The Received Instruction Register resets to the null instruction code (all ones) whenever the SIAP client writes to the Serial Job Register. Each time the TCM receives an instruction from any one of the RCMs selected by the Transmit Select Mask, it saves the instruction code in the Received Instruction Register. When the SIAP client reads the register, it reads the most recent instruction received after the most recent write to the Serial Job Register. After a successful write, the Received Instruction Register should be null, because the slave has no need to transmit an instruction to the master during a write. After a successful read, the register should contain the data code. If, in either case, the register contains the error code, the SIAP client knows that the TCM received an error.
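
The rules in the preceding paragraph amount to a small decision table, which we sketch in Python below. The function name and the returned strings are ours. The instruction codes are those of the Serial Protocol Instructions table.

```python
NULL, ERROR, DATA = 0xFF, 0x00, 0x06   # instruction codes


def job_outcome(job: str, rir: int) -> str:
    """Interpret the Received Instruction Register after a write or read job."""
    if rir == ERROR:
        return "error"                           # a slave reported trouble
    expected = {"write": NULL, "read": DATA}[job]
    return "ok" if rir == expected else "unexpected"


if __name__ == "__main__":
    print(job_outcome("write", 0xFF))   # slave sent nothing during the write
    print(job_outcome("read", 0x06))    # slave answered with a data instruction
    print(job_outcome("read", 0x00))    # slave sent an error
```

The "unexpected" result covers cases the text does not discuss, such as a data code appearing after a write, and is included only so that the function returns something sensible for every input.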

Example: The SIAP client wants to send a 1-MB block to the auxiliary device on all its RCMs, but it does not want to ignore errors during the transmission, which will take roughly one second. It sets the Transmit and Receive Select Masks to all ones. It sets the TCM's Data Address to zero. It writes the 1-MB block to the TCM's RAM with a stream_write to the RAM Portal. It sets the Data Address to zero again so that the address points to the first byte of the 1-MB block. It writes aux_write to the Serial Job Register. The TCM sets the Received Instruction Register to null (all ones). It transmits a write on SAO. Now it waits for the SIAP client to write eight bytes to the Transmit Data Register. The SIAP client writes these eight bytes with a single stream_write, which is faster than eight consecutive byte_write operations. The first four bytes give an address in slave auxiliary space. The final four bytes give the length of the block (1,048,576). The TCM begins the block transfer. It reads the 1-MByte block out of RAM one byte at a time and transmits the bytes on SAO to all its RCMs. If, at any time during the transmission, the TCM sees an instruction on SDI, it stores the instruction code in the Received Instruction Register. Meanwhile, the SIAP client checks the Serial Job Register with a byte_read, looking for the register to change from aux_write to idle (zero). It checks the Received Instruction Register for an error instruction. If it sees an error, it knows that one of the RCMs has encountered a problem. It writes abort to the Serial Job Register. The TCM aborts its write operation and sends an abort instruction to all its RCMs. Now the SIAP client starts to interrogate the RCMs one at a time, checking error flags.

In order to remain attentive to errors issued by the RCMs, we see that the SIAP client must check the Received Instruction Register and the Serial Job Register using byte_read messages instead of byte_poll. The byte_poll is executed by the TCM itself: the TCM waits until a byte attains a particular value. The TCM cannot monitor two registers at once with the byte_poll. When the serial communications are short-lived, there is no problem waiting until the end of the communication before checking the Received Instruction Register. In such cases, the byte_poll is more efficient, because it does not require back-and-forth communication over TCPIP to detect the end of a short-lived task.
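
The client procedure for a long aux_write, with both registers watched by byte_read, might look like the Python sketch below. The client object and its byte_write, byte_read, and stream_write methods are hypothetical stand-ins for a SIAP connection, and we assume that stream_write sends every byte to a single location. FakeTcm exists only so that the sketch runs without hardware. Register locations and job codes are taken from the tables above.

```python
import struct
import time

RIR, SJR, TDR, DATA_ADDRESS, RAM_PORTAL = 0x02, 0x03, 0x04, 0x18, 0x3F
IDLE, ABORT, AUX_WRITE = 0, 3, 9     # serial job codes
ERROR, NULL = 0x00, 0xFF             # instruction codes


def set_data_address(client, address):
    for offset, byte in enumerate(struct.pack(">I", address)):
        client.byte_write(DATA_ADDRESS + offset, byte)


def run_aux_write(client, slave_address, block, poll_interval=0.0):
    """Send block to slave auxiliary space, watching for errors as we go."""
    set_data_address(client, 0)
    client.stream_write(RAM_PORTAL, block)
    set_data_address(client, 0)                  # point at first byte again
    client.byte_write(SJR, AUX_WRITE)
    client.stream_write(TDR, struct.pack(">II", slave_address, len(block)))
    while True:
        finished = client.byte_read(SJR) == IDLE
        if client.byte_read(RIR) == ERROR:       # checked after SJR on purpose
            client.byte_write(SJR, ABORT)
            return "error"
        if finished:
            return "done"
        time.sleep(poll_interval)


class FakeTcm:
    """Hypothetical stand-in: the job ends after a fixed number of SJR reads."""

    def __init__(self, polls_to_finish=3, error_on_poll=None):
        self.registers = {SJR: IDLE, RIR: NULL}
        self.polls_to_finish = polls_to_finish
        self.error_on_poll = error_on_poll
        self.polls = 0

    def byte_write(self, location, value):
        self.registers[location] = value
        if location == SJR:
            self.registers[RIR] = NULL           # RIR resets on any SJR write
            self.polls = 0

    def stream_write(self, location, data):
        pass                                     # RAM contents are not modeled

    def byte_read(self, location):
        if location == SJR and self.registers[SJR] != IDLE:
            self.polls += 1
            if self.polls == self.error_on_poll:
                self.registers[RIR] = ERROR      # a slave sends an error
            if self.polls >= self.polls_to_finish:
                self.registers[SJR] = IDLE       # job complete
        return self.registers[location]


if __name__ == "__main__":
    print(run_aux_write(FakeTcm(), 0x100, bytes(16)))
    print(run_aux_write(FakeTcm(error_on_poll=2), 0x100, bytes(16)))
```

Reading the Serial Job Register before the Received Instruction Register in each pass means an error that arrives just as the job finishes is still seen before the function reports success.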

Prototype Setup

We describe how to set up and use our TCM Prototype and our RCM Emulator in our A2101 Manual.

Harvard Protocol

The Harvard University group, which is designing the RCM, proposes the serial protocol defined by Document 4512 of the LSST Archive. You need a password to get into the archive, so we provide a PDF copy of the document here. Here are the top ten problems we foresee in an effort to implement and test this protocol.

False Pulses on Enable: The default value of the logic signals is zero, so that plugging in and unplugging a cable will cause random pulses that may be mis-interpreted as instructions.

Data and Instruction Words Indistinguishable: There is no way defined for a serial receiver to distinguish between an instruction word and a pair of data words. The protocol mandates the transmission of serial words of different lengths. Instruction words are 34 bits long: 1 start bit, 4 type bits, 16 address bits, 12 length bits, 1 stop bit. Data words are 18 bits long: 1 start bit, 16 data bits, 1 stop bit. A receiver of serial words must be in a state to receive an instruction or a data word. There is no way to restore the slave to either the instruction-ready or data-ready state. If the slave is data-ready, no instruction can persuade it to become instruction-ready. If it is instruction-ready, data bytes will be interpreted as instructions, resulting in undefined activity.

Uninterruptible Block Moves: Once the TCM starts to write a block of data to the RCM, there is no way defined in the protocol for the TCM to abort the block transfer and return the RCM to a known state. Conversely, once the TCM starts to read a block of data from the RCM, there is no way defined in the protocol for the TCM to abort the block transfer and return the RCM to a known state.

No Error Messages: The protocol does not allow the RCM to send an error message to the TCM with information describing the error.

Undefined Image Access: The protocol states that image data will be available "as 16k pages of 1k words per page", where one word is 16 bits. But the only read operation defined in the protocol is a block read of length 0 to 4095 words. The manner in which the block read length relates to the number of pages read is undefined. It is possible that we set the length to 1 to get 1,024 words of image data (1 Kword). Or perhaps we get only 1,000 words of image data (1 kword). Or perhaps we must specify a block length that is a multiple of 1,024 or 1,000. If so, does the TCM generate an error when it receives a non-conforming read to such an address?

Address-Dependent Behavior of Instructions: The protocol forbids word-wise access in the range of addresses assigned to image data. At other addresses, word-wise access is permitted. The behavior of the instructions is therefore address-dependent. Error-handling must be address-dependent. The RCM address map must be duplicated in the TCM code so that the TCM knows how to handle access to various addresses.

No Byte-Wise Access: The protocol forbids byte-wise access to registers and data. But data transfer through the TCM's TCPIP interface is always byte-wise. The TCPIP protocol is byte-wise. The SIAP protocol is byte-wise. If the TCM receives a SIAP read instruction for three bytes starting at an odd byte address, what does it do? Does it report an error? Does it read two sixteen-bit words and extract the three bytes it needs? If the TCM receives a nine-byte write instruction, what does it do?

No Endianness Defined: The protocol does not define the byte-ordering of multi-byte registers. Is the ordering big-endian or little-endian?

Redundant Constraints: The protocol requires the manipulation of the SYN signal and the insertion of start and stop bits in the serial words. The SYN signal is almost always redundant because the start and stop bits are sufficient to delimit the word transmissions. It is possible to implement a master and slave that appear to work perfectly and consistently with the protocol, and yet both ignore the SYN signal. A slave developed in another laboratory may ignore the start and stop bits and use the SYN signal instead. This slave will not work with the original master. Redundancy in the protocol's control signals invites incompatibility between devices built in different locations.

Mingled Data and Re-Program: The protocol assumes that an auxiliary device on the RCM will listen to communication on SDO and detect communication that is intended to re-program the main logic chip. In other words: the same communication channel is being used for data and re-programming. In theory, such a combination is possible, but in practice it is always difficult. Upgrading a Linux machine's operating system over the internet, for example, is possible, but only if nothing goes wrong. If the operating system fails to boot, we must sit down at the computer and work from the console. In other words: TCPIP is not sufficient for reliable re-programming of a Linux machine. In most cases where the logic on a circuit is re-programmable, there exists a separate communication channel through which it may be re-programmed. In ATLAS, for example, the front-end electronics logic is re-programmed through the slow controls system, which is separate from the front-end readout. If programming and normal data exchange are to be carried by the same signal, we must be certain that it is impossible to re-program the main logic chip in such a way that it corrupts communication between the TCM and the auxiliary chip. The protocol must provide two independent channels of communication on the same signal. No such provisions are made in the current version of the protocol.

These problems with the protocol, and other problems, lead us to the conclusion that its implementation is impractical. So the TCM prototypes will be equipped with the protocol we defined above.

Conclusion

[08-FEB-09] A TCM Prototype and RCM Emulator are on their way to Harvard University for use with their RCM Prototypes. Although the RCM Prototypes are not expected until April, we wanted to allow the Harvard group time to get used to our data acquisition software before they use the TCM with a real RCM. Until then, they can communicate with our RCM Emulator circuit.

Communication with the TCM over TCPIP is fully developed and tested. We have adapted our established LWDAQ Software to control and read out the TCM Prototype. The software is available for Linux, Windows, and MacOS. A library of diagnostic routines for use with our software is also available. The A2101 Manual describes two variants of the TCM Prototype. The A2101A acts like a TCM with seven master sockets and a TCPIP interface. The A2101X variant acts like an RCM with one slave socket and its own RAM.

Communication between the TCM and the RCM using our Serial Protocol is fully developed and tested. There are some bonus features we would like to implement in the long run, but everything needed for control and readout of the RCMs is present and working.

Please do not hesitate to contact us for support with the TCM and RCM Emulator. We will be glad to answer questions and help fix problems.