Parsing comma-separated values with embedded newlines in C

I have a CSV data file containing the following data:

    H1,H2,H3
    a,"b
    c
    d",e

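When I open the CSV file in Excel, it shows a table with the column headers H1, H2, H3 and these column values: a for H1, the multi-line value

    b
    c
    d

for H2, and e for H3.

I need to parse this file with a C program and pick up the values in the same way. However, my code snippet below will not work, because one of the columns has a multi-line value: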

    char buff[200];
    char tokens[10][30];
    fgets(buff, 200, stdin);
    char *ptok = buff; // for iterating
    char *pch;
    int i = 0;
    while ((pch = strchr(ptok, ',')) != NULL) {
        *pch = 0;
        strcpy(tokens[i++], ptok);
        ptok = pch+1;
    }
    strcpy(tokens[i++], ptok);

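How can I modify this snippet to accommodate multi-line column values? Please don't be bothered by the hard-coded sizes of the string buffers; this is test code for a proof of concept. I don't want to use any third-party library, I would rather work from first principles. Please help.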

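The main complication in parsing "well-formed" CSV in C is precisely the handling of variable-length strings and arrays, which you are avoiding by using fixed-length strings and arrays. (The other complication is handling CSV that is not well-formed.)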
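As a rough illustration of what the variable-length string handling might look like, here is a minimal sketch of a growable buffer. The layout of `struct String`, the doubling strategy, and the `stringFree` helper are assumptions made for this example; any growable buffer with an append and a clear operation would serve equally well.

```c
#include <stdlib.h>

/* Hypothetical growable string; the member names are assumptions. */
struct String {
    char  *data;   /* NUL-terminated contents, or NULL if nothing is allocated */
    size_t len;    /* number of characters, excluding the NUL */
    size_t cap;    /* allocated size in bytes */
};

/* Empties the string but keeps its allocation for reuse. */
void stringClear(struct String *s) {
    s->len = 0;
    if (s->data) s->data[0] = '\0';
}

/* Appends one character, doubling the buffer when it is full.
 * Returns 0 on success, -1 if memory could not be allocated. */
int stringAppend(struct String *s, int ch) {
    if (s->len + 2 > s->cap) {               /* need room for ch plus the NUL */
        size_t newcap = s->cap ? s->cap * 2 : 16;
        char *p = realloc(s->data, newcap);
        if (p == NULL) return -1;
        s->data = p;
        s->cap = newcap;
    }
    s->data[s->len++] = (char)ch;
    s->data[s->len] = '\0';
    return 0;
}

/* Releases the buffer and resets the string to the empty state. */
void stringFree(struct String *s) {
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

A `struct String` initialised with `{0}` is a valid empty string here, so no separate constructor is needed. Growable rows and tables can follow the same pattern, with an array of elements in place of the `char` buffer.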

Without those complications, the parsing is really quite simple:

(Untested.)

    #include <stdbool.h>
    #include <stdio.h>

    /* struct String, struct Row, struct Table, their Append/Clear helpers
     * and the ERROR constant are assumed to be defined elsewhere. The
     * Append helpers are assumed to copy what they are given.
     */

    /* Appends a non-quoted field to s and returns the delimiter */
    int readSimpleField(struct String* s) {
      for (;;) {
        int ch = getc(stdin);
        if (ch == ',' || ch == '\n' || ch == EOF) return ch;
        stringAppend(s, ch);
      }
    }

    /* Appends a quoted field to s and returns the delimiter.
     * Assumes the open quote has already been read.
     * If the field is not terminated, returns ERROR, which
     * should be a value different from any character or EOF.
     * The delimiter returned is the character after the closing quote
     * (or EOF), which may not be a valid delimiter. Caller should check.
     */
    int readQuotedField(struct String* s) {
      for (;;) {
        int ch = getc(stdin);
        if (ch == EOF) return ERROR;
        if (ch == '"') {
          /* Either the closing quote or the first half of a doubled quote */
          ch = getc(stdin);
          if (ch != '"') return ch;
        }
        stringAppend(s, ch);
      }
    }

    /* Reads a single field into s and returns the following delimiter,
     * which might be invalid.
     */
    int readField(struct String* s) {
      stringClear(s);
      int ch = getc(stdin);
      if (ch == '"') return readQuotedField(s);
      /* An immediate delimiter means the field is empty */
      if (ch == ',' || ch == '\n' || ch == EOF) return ch;
      stringAppend(s, ch);
      return readSimpleField(s);
    }

    /* Reads a single row into row and returns the following delimiter,
     * which might be invalid.
     */
    int readRow(struct Row* row) {
      struct String field = {0};
      rowClear(row);
      /* Make sure there is at least one field */
      int ch = getc(stdin);
      if (ch != '\n' && ch != EOF) {
        ungetc(ch, stdin);
        do {
          ch = readField(&field);
          rowAppend(row, &field);
        } while (ch == ',');
      }
      return ch;
    }

    /* Reads an entire CSV file into table.
     * Returns true if the parse was successful.
     * If an error is encountered, returns false. If the end-of-file
     * indicator is set, the error was an unterminated quoted field;
     * otherwise, a closing quote was followed by something other
     * than a delimiter.
     */
    bool readCSV(struct Table* table) {
      tableClear(table);
      struct Row row = {0};
      /* Make sure there is at least one row */
      int ch = getc(stdin);
      if (ch != EOF) {
        ungetc(ch, stdin);
        do {
          ch = readRow(&row);
          tableAppend(table, &row);
        } while (ch == '\n');
      }
      return ch == EOF;
    }

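The above is "from first principles": it does not even use the standard C library string functions. But it takes some effort to understand and verify. Personally, I would use (f)lex, and maybe even yacc/bison (although that is a bit of overkill), to simplify the code and make the expected syntax more obvious. But handling variable-length structures in C would still need to be the first step.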
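For a proof of concept that keeps the fixed-size buffers from the question, the same quote handling can be folded into a single function. The following is only a sketch under stated assumptions: the name `csvReadField` and its signature are made up for this example, over-long fields are silently truncated, an unterminated quoted field is reported as a plain EOF, characters after a closing quote are kept as part of the field rather than rejected, and a `\r` from a CRLF line ending stays in the data.

```c
#include <stdio.h>

/* Reads one CSV field from `in` into buf (at most size-1 characters;
 * size must be at least 1) and returns the character that ended the
 * field: ',', '\n' or EOF. Inside double quotes, commas and newlines
 * are ordinary data, and "" stands for a single quote character. */
int csvReadField(FILE *in, char *buf, size_t size) {
    size_t n = 0;
    int quoted = 0;
    int ch;
    while ((ch = getc(in)) != EOF) {
        if (quoted) {
            if (ch == '"') {
                int next = getc(in);
                if (next != '"') {           /* that was the closing quote */
                    quoted = 0;
                    if (next == EOF) { ch = EOF; break; }
                    ungetc(next, in);
                    continue;
                }
                /* doubled quote: fall through and store one '"' */
            }
        } else if (ch == '"' && n == 0) {    /* opening quote at field start */
            quoted = 1;
            continue;
        } else if (ch == ',' || ch == '\n') {
            break;                           /* unquoted delimiter ends the field */
        }
        if (n + 1 < size) buf[n++] = (char)ch;   /* drop what does not fit */
    }
    buf[n] = '\0';
    return ch;
}
```

Calling it in a loop with something like `tokens[i]` and `sizeof tokens[i]`, and starting a new row whenever it returns `'\n'`, yields the second field of the second row of the sample file as the three lines b, c, d in one string, which is what `fgets` followed by `strchr` cannot do.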