Regular expression to remove doubled double quotes from a CSV

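I have an Excel sheet in which one column holds a large amount of data in the form of Python dictionaries that came from an SQL database. I don't have access to the original database, and I can't import the CSV back into SQL with the local infile command because the keys/values on each row of the CSV are not in the same order. When I export the Excel sheet to CSV, I get:

    "{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
    "{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"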

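What is the best way to remove the `"` before and after the curly braces, as well as the extra `"` around the keys and values?

I also need to leave the integers, which have no quotes around them, untouched.

I then want to load the result in Python with the json module so that I can print specific keys, but the rows won't parse with the doubled double quotes. Ultimately I need the data saved in a file that looks like this: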

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Any help is much appreciated!

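If the input file looks just as shown, and is as small as you mention, you can load the whole file into memory, make the replacements, and save it. IMHO, you don't need a regex for this. The easiest code to read is:

    # filename is the path of the exported CSV
    with open(filename) as f:
        text = f.read()
    # Collapse the doubled quotes first, then drop the quotes wrapping each object.
    text = text.replace('""', '"')
    text = text.replace('"{', '{')
    text = text.replace('}"', '}')
    with open(filename, "w") as f:
        f.write(text)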

I tested it with the sample input, and it produces:

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Which is exactly what you want.

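If you prefer, you can also pack the code together and write:

    with open(inputFilename) as fin:
        with open(outputFilename, "w") as fout:
            fout.write(fin.read().replace('""', '"').replace('"{', '{').replace('}"', '}'))

But I think the first version is clearer, and both do exactly the same thing.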
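Since the goal is to load the rows with the json module, one quick way to check the result is to parse each cleaned line. The following is only a minimal sketch, assuming one object per line and no `"{` or `}"` sequences inside the values; `unescape_rows` and the `raw` sample string are names made up for this illustration.

```python
import json


def unescape_rows(text):
    """Apply the same three replacements as above (hypothetical helper)."""
    return text.replace('""', '"').replace('"{', '{').replace('}"', '}')


# Sample rows as they appear in the exported CSV.
raw = (
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)

cleaned = unescape_rows(raw)

# Each non-empty line should now be a valid JSON object.
people = [json.loads(line) for line in cleaned.splitlines() if line.strip()]

for person in people:
    print(person["first_name"], person["age"])
# John 30
# Tim 34
```

If `json.loads` raises on some line, that line probably contains a pattern the plain replacements do not cover.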

Easy:

`text = re.sub(r'"(?!")', '', text)`

Given the input file TEST.TXT:

"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"

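The script:

    import re

    with open("TEST.TXT", "r") as f:
        text_in = f.read()
    # Remove every double quote that is not followed by another double quote.
    text_out = re.sub(r'"(?!")', '', text_in)
    print(text_out)

Produces the following output: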

{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
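To see why the lookahead works, it may help to run the pattern on a single row. In the sketch below, `UNPAIRED_QUOTE`, `row`, and the `nickname` example are hypothetical names chosen for illustration. It also shows one case that seems worth checking on real data: an empty string value is exported as four quotes, and the pattern keeps three of them.

```python
import re

# Matches a double quote only when the next character is NOT another quote.
UNPAIRED_QUOTE = re.compile(r'"(?!")')

row = '"{""first_name"":""John"",""age"":30}"'
cleaned_row = UNPAIRED_QUOTE.sub('', row)
print(cleaned_row)
# {"first_name":"John","age":30}

# Caveat: an empty string value appears as "" "" (four quotes) in the CSV.
# Only the last of the four is followed by a non-quote, so three survive.
empty_value = '"{""nickname"":""""}"'
cleaned_empty = UNPAIRED_QUOTE.sub('', empty_value)
print(cleaned_empty)
# {"nickname":"""}
```

In each doubled pair, the first quote is followed by a quote and is kept, while the second is removed. The quotes wrapping the whole object are single, so they are removed as well.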

This should do it:

    import re

    with open('old.csv') as old, open('new.csv', 'w') as new:
        new.writelines(re.sub(r'"(?!")', '', line) for line in old)

I think you are overcomplicating the problem. Why not just replace the data?

    lines = []
    with open('foo.txt') as f:
        for line in f:
            lines.append(line.replace('""', '"').replace('"{', '{').replace('}"', '}'))
    s = ''.join(lines)
    print(s)  # or save it to a file

It produces:

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Storing the intermediate lines in a list and then calling .join is done for performance, as explained in "Good way to append to a string".
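The list-and-join idea can be written a little more compactly by passing a generator straight to `str.join`. This is just a sketch under the same assumptions as the code above; `clean_line`, `clean_all`, and the shortened `rows` sample are illustrative names, not part of the original answer.

```python
def clean_line(line):
    """Same three replacements as above, applied to one line."""
    return line.replace('""', '"').replace('"{', '{').replace('}"', '}')


def clean_all(lines):
    # join concatenates all the pieces at once,
    # rather than growing a string with += inside the loop.
    return ''.join(clean_line(line) for line in lines)


rows = [
    '"{""first_name"":""John"",""age"":30}"\n',
    '"{""first_name"":""Tim"",""age"":34}"\n',
]

print(clean_all(rows), end='')
# {"first_name":"John","age":30}
# {"first_name":"Tim","age":34}
```

Because an open file is itself an iterable of lines, `clean_all(f)` should work the same way on a file object.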

You can actually use the csv module and a regex to do this:

    st = '''\
    "{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
    "{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
    '''

    import csv, re

    data = []
    reader = csv.reader(st, dialect='excel')
    for line in reader:
        data.extend(line)

    # Wrap every run of word characters in quotes (numbers included),
    # then put each {...} object on its own line.
    s = re.sub(r'(\w+)', r'"\1"', ''.join(data))
    s = re.sub(r'({[^}]+})', r'\1\n', s).strip()
    print(s)

It prints the following (note that the integer values come out quoted):

    {"first_name":"John","last_name":"Smith","age":"30"}
    {"first_name":"Tim","last_name":"Johnson","age":"34"}
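Doubled quotes are the standard CSV escape for a quote inside a quoted field, so another option is to let `csv.reader` undo them and then hand each field to `json.loads`. The sketch below is a different technique from the snippet above, not a corrected version of it: it feeds the reader whole lines, uses no regex, and leaves the integers unquoted. `parse_rows` and `sample` are names invented for this example, and it assumes every non-empty field holds one JSON object.

```python
import csv
import io
import json

sample = (
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)


def parse_rows(lines):
    """Yield one dict per non-empty CSV field (hypothetical helper)."""
    for row in csv.reader(lines):
        for field in row:
            if field.strip():
                # csv.reader has already turned "" back into " here.
                yield json.loads(field)


records = list(parse_rows(io.StringIO(sample)))

for record in records:
    # Compact separators reproduce the layout asked for in the question.
    print(json.dumps(record, separators=(",", ":")))
# {"first_name":"John","last_name":"Smith","age":30}
# {"first_name":"Tim","last_name":"Johnson","age":34}
```

For a real file, the csv documentation recommends opening it with `newline=''` and passing the file object to `parse_rows` in place of the `io.StringIO` wrapper.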
