Overview
Features
- Convert Excel/CSV/XML/YAML to JSON/Text/Bin.
- Use Protobuf to define the structure of Excel/CSV/XML/YAML.
- Use Golang to develop the conversion engine.
- Support multiple programming languages, thanks to Protobuf (proto3).
Concepts
- Importer:
- imports a Excel/CSV file to a in-memory book of Table sheets.
- imports a XML/YAML file to a in-memory book of Document sheets.
- Parsers:
- protogen: converts Excel/CSV/XML/YAML files to Protoconf files.
- confgen: converts Excel/CSV/XML/YAML with Protoconf files to JSON/Text/Bin files.
- Exporter:
- protogen: exports a tableau.Workbook to a proto file.
- confgen: exports a protobuf message to a JSON/Text/Bin file.
- Protoconf: a dialect of Protocol Buffers (proto3) extended with tableau options, aimed to define the structure of Excel/CSV/XML/YAML.
Workflow
flowchart TD
subgraph Input
I1(Excel)
click I1 href "https://github.com/qax-os/excelize"
I2(CSV)
click I2 href "https://pkg.go.dev/encoding/csv"
I3(XML)
click I3 href "https://github.com/antchfx/xmlquery"
I4(YAML)
click I4 href "https://github.com/go-yaml/yaml"
end
Input --> I[importer]
click I href "https://github.com/tableauio/tableau/tree/master/internal/importer"
subgraph Protogen
PGP[protogen.Parser] --> E1[protogen.Exporter]
click PGP href "https://github.com/tableauio/tableau/tree/master/internal/protogen/parser.go"
click E1 href "https://github.com/tableauio/tableau/tree/master/internal/protogen/exporter.go"
end
I --> Protogen:::orangeclass
subgraph Confgen
CGP[confgen.Parser] --> CE[confgen.Exporter]
click CGP href "https://github.com/tableauio/tableau/tree/master/internal/confgen/parser.go"
click CE href "https://github.com/tableauio/tableau/tree/master/internal/confgen/mexporter"
end
I --> Confgen:::orangeclass
Protogen --> B
B{{Protoconf}}:::greenclass --> | protobuf descriptor | Confgen
subgraph Output
O1("JSON")
click O1 href "https://developers.google.com/protocol-buffers/docs/proto3#json"
O2("Text")
click O2 href "https://developers.google.com/protocol-buffers/docs/text-format-spec"
O3("Bin")
click O3 href "https://developers.google.com/protocol-buffers/docs/encoding"
end
Confgen --> Output
classDef orangeclass fill:#f96;
classDef greenclass fill:#40E0D0;
Types
- Scalar
- Message(struct)
- List
- Map(unordered)
- Timestamp
- Duration
TODO
protoc plugins
- Golang
- C++
- C#/.NET
- Python
- Lua
- Javascript/Typescript/Node
- Java
Metadata
- metatable: a message to describe the worksheet’s metadata.
- metafield: a message to describe the caption’s metadata.
- captrow: caption row, the exact row number of captions at worksheet. Newline in caption is allowed for more readability, and will be trimmed in conversion.
- descrow: description row, the exact row number of descriptions at worksheet.
- datarow: data row, the start row number of data.
Newline(line break) in major operating systems:
OS | Abbreviation | Escape sequence |
---|---|---|
Unix (linux, OS X) | LF | \n |
Microsoft Windows | CRLF | \r\n |
classic Mac OS/OS X | CR | \r |
LF: Line Feed, CR: Carriage Return.
Generator
- generate protoconf by Excel(header): Excel -> protoconf
- generate Excel(header) by protoconf: protoconf -> Excel
Conversion
- Excel -> JSON(default format and human readable)
- Excel -> protowire(small size)
- Excel -> prototext(human debugging)
- JSON -> Excel
- protowire -> Excel
- prototext -> Excel
Pretty Print
- Multiline: every textual element on a new line
- Indent: 4 space characters
- JSON support
- prototext support
EmitUnpopulated
- JSON:
EmitUnpopulated
specifies whether to emit unpopulated fields.
Scalar Types
- interger: int32, uint32, int64 and uint64
- float: float and double
- bool
- string
- bytes
- datetime, date, time, duration
Enumerations
- enum: The Parser accepts three enum value forms:
- enum value number
- enum value name
- enum value alias name (with EnumValueOptions specified)
- enum: validate the enum value.
Composite Types
- message: horizontal(row direction) layout, fields located in cells.
- message: simple in-cell message, each field must be scalar type. It is a comma-separated list of fields. E.g.:
1,test,3.0
. List’s size need not to be equal to fields' size, as fields will be filled in order. Fields not configured will be filled default values due to its scalar type. - list: horizontal(row direction) layout, which is list’s default layout, and each item can be message or scalar.
- list: vertical(column direction) layout. and each item should be message.
- list: simple in-cell list, element must be scalar type. It is a comma-separated list of elements. E.g.:
1,2,3
. - list: scalable or dynamic list size.
- list: smart recognition of empty element at any position.
- list
- In-cell struct list: no need to support
- Cross-cell horizontal scalar/enum list
- Cross-cell horizontal incell-struct list
- Cross-cell vertical scalar list: no need to support, use this:
[Item]int32
- Cross-cell vertical incell-struct list
- list size
- dynamic size: items should be present continuously, and report error if empty item is inserted.
- fixed size
- map: horizontal(row direction) layout.
- map: vertical(column direction) layout, and is map’s default layout.
- map: unordered-map or hash-map.
- map: ordered-map supported by tableauio/loader.
- C++
- Golang
- C#
- map: simple in-cell map, both key and value must be scalar type. It is a comma-separated list of
key:value
pairs. E.g.:1:10,2:20,3:30
. - map: scalable or dynamic map size.
- map: smart recognition of empty value at any position.
- map
- Cross-cell horizontal scalar map: no need to support, use this:
map<int32, Item>
- Cross-cell vertical scalar map: : no need to support, use this:
map<int32, Item>
- Cross-cell horizontal scalar map: no need to support, use this:
- map size
- dynamic size: items should be present continuously, and report error if empty item is inserted.
- fixed size
- nesting: unlimited nesting of message, list, and map.
- nesting: the composite type’s first element can be composite type.
Default Values
Each scalar type’s default value is same as protobuf.
- interger:
0
- float:
0.0
- bool:
false
- string:
""
- bytes:
""
- in-cell message: each field’s default value is same as protobuf
- in-cell list: element’s default value is same as protobuf
- in-cell map: both key and value’s default value are same as protobuf
- message: all fields have default values
Empty
- scalar: default value same as protobuf.
- message: empty message will not be spawned if all fields are empty.
- list: empty list will not be spawned if list’s size is 0.
- list: empty message will not be appended if list’s element(message type) is empty.
- map: empty map will not be spawned if map’s size is 0.
- map: empty message will not be inserted if map’s value(message type) is empty.
- nesting: recursively empty.
Merge
- merge multiple workbooks
- merge multiple worksheets
Workbook meta
workbook meta sheet @TABLEAU:
- specify which sheets to be parsed
- specify parser options for each sheet
Sheet | Alias | Nameline | Typeline |
---|---|---|---|
Sheet1 | ExchangeInfo | 2 | 2 |
Datetime
Understanding about RFC 3339 for Datetime and Timezone Formatting in Software Engineering
2019-10-12T07:20:50.52Z # This is acceptable in ISO 8601 and RFC 3339 (with T)
2019-10-12 07:20:50.52Z # This is only accepted in RFC 3339 (without T)
- “Z” stands for Zero timezone or Zulu timezone
UTC+0
, and equal to+08:00
in the RFC 3339.- RFC 3339 follows the ISO 8601 DateTime format. The only difference is RFC allows us to replace “T” with “space”.
Use RFC 3339 , which is following ISO 8601.
- Timestamp: based on
google.protobuf.Timestamp
, see JSON mapping - Timezone: see ParseInLocation
- DST: Daylight Savings Time. There is no plan to handle this boring stuff.
- Datetime: excel format:
yyyy-MM-dd HH:mm:ss
, e.g.:2020-01-01 05:10:00
- Date: excel format:
yyyy-MM-dd
oryyMMdd
, e.g.:2020-01-01
or20200101
- Time: excel format:
HH:mm:ss
orHHmmss
, e.g.:05:10:00
or051000
- Duration: based on
google.protobuf.Duration
, see JSON mapping - Duration: excel format:
form "72h3m0.5s"
, see golang duration string form
Transpose
- Interchange the rows and columns of a worksheet.
Validation
- unique: check map key uniqueness.
- range:
[left, right]
. - refer:
XXXConf.ID
. To be supported by tableauio/loader.
Error Message
- Report clear and precise error messages when converter failed, please refer to the programming language compiler
- Use golang template to define error message template
- Multiple languages support, focused on English and Simplified Chinese
Performace
- Stress test
- Each goroutine process one worksheet
- Mutiple process model