Regular expression to remove doubled double quotes from a CSV

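I have an Excel sheet with a large amount of data in a single column, stored in the form of Python dictionaries that came from a SQL database. I don't have access to the original database, and I can't import the CSV into SQL with the LOAD DATA LOCAL INFILE command because the keys/values are not in the same order on every row of the CSV. When I export the Excel sheet to CSV, I get:

    "{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
    "{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"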

What is the best way to remove the " before and after the curly braces, as well as the extra " around the keys and values?

I also need to leave the integers, which have no quotes around them, untouched.

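I then want to load the data into Python with the json module so that I can print specific keys, but I can't load it while the doubled double quotes are there. Ultimately I need the data saved in a file that looks like this: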

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Any help is much appreciated!

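If the input file looks like what you show, and is as small as you mention, you can load the whole file into memory, do the replacements, and save it again. IMHO you don't need a regex for this. The most readable code is: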

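    # Read the whole file into memory.
    with open(filename) as f:
        text = f.read()

    # Collapse doubled quotes, then drop the quotes wrapping each record.
    text = text.replace('""', '"')
    text = text.replace('"{', '{')
    text = text.replace('}"', '}')

    # Overwrite the file with the cleaned text.
    with open(filename, "w") as f:
        f.write(text)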

I tested it with the sample input and it produces:

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Which is exactly what you want.

If you like, you can also condense the code and write:

    # `if` is a reserved word in Python, so the file handles are named fin/fout.
    with open(inputFilename) as fin, open(outputFilename, "w") as fout:
        fout.write(fin.read().replace('""', '"').replace('"{', '{').replace('}"', '}'))

But I think the first version is clearer, and both do exactly the same replacements.
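Since the end goal is to load the records with the json module, one quick way to check the replacements is to run them on an in-memory string and parse each line. This is only a sketch: `unquote_json_lines` is a hypothetical helper name, it assumes one record per line, and it assumes no value itself contains `"{` or ends in `}`.

```python
import json

# Sample text in the shape of the exported CSV from the question.
sample = (
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)

def unquote_json_lines(text):
    # Hypothetical helper: the same three replacements, applied to a string.
    return text.replace('""', '"').replace('"{', '{').replace('}"', '}')

# Parse every cleaned line as JSON; the integers stay integers.
records = [json.loads(line) for line in unquote_json_lines(sample).splitlines()]
print(records[0]["first_name"], records[0]["age"])
```

If `json.loads` raises on some line, that line is probably one where the plain replacements are not enough.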

Simple:

text = re.sub(r'"(?!")', '', text)

Given the input file TEST.TXT:

"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"

The script:

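    import re

    # Read the whole file, closing it automatically afterwards.
    with open("TEST.TXT", "r") as f:
        text_in = f.read()

    # Remove every double quote that is not followed by another double quote.
    text_out = re.sub(r'"(?!")', '', text_in)
    print(text_out)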

Produces the following output:

{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
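To see why the negative lookahead works, it may help to run it on a single row and hand the result to `json.loads`. The sketch below also tries a hypothetical row with an empty string value (not something from the question), where this pattern leaves three quotes behind, so treat the regex as suitable only when no value is empty. `DROP_QUOTE` is just an illustrative name.

```python
import json
import re

# A double quote that is NOT followed by another double quote.
DROP_QUOTE = re.compile(r'"(?!")')

good = '"{""first_name"":""John"",""age"":30}"'
cleaned = DROP_QUOTE.sub('', good)
print(cleaned)
print(json.loads(cleaned)["age"])

# Hypothetical row with an empty string value: the escaped empty string
# is four quotes in a row, and only the last of them gets removed.
tricky = '"{""first_name"":"""",""age"":30}"'
broken = DROP_QUOTE.sub('', tricky)
print(broken)
```

In each doubled pair, the first quote is followed by a quote and survives, while the second is followed by something else and is dropped. The same rule removes the lone quotes wrapping the record.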

This should do it:

    import re

    # Stream line by line, dropping every quote not followed by another quote.
    with open('old.csv') as old, open('new.csv', 'w') as new:
        new.writelines(re.sub(r'"(?!")', '', line) for line in old)

I think you are overthinking the problem. Why not just replace the data?

    l = []
    with open('foo.txt') as f:
        for line in f:
            # Collapse doubled quotes, then drop the quotes wrapping the record.
            l.append(line.replace('""', '"').replace('"{', '{').replace('}"', '}'))
    s = ''.join(l)
    print(s)  # or save it to a file

It produces:

    {"first_name":"John","last_name":"Smith","age":30}
    {"first_name":"Tim","last_name":"Johnson","age":34}

Storing the intermediate lines in a list and then calling .join is done for performance, as explained in "Good way to append to a string".
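If the cleaned text is going to a file anyway, another option is to skip the intermediate list and pass a generator to `writelines`. A rough sketch, using `io.StringIO` objects as stand-ins for the real input and output files; `clean` is a hypothetical helper name.

```python
import io

def clean(line):
    # Hypothetical helper: the same replacements, applied to one line.
    return line.replace('""', '"').replace('"{', '{').replace('}"', '}')

# In-memory stand-ins for the real files (an assumption for this demo).
source = io.StringIO(
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)
target = io.StringIO()

# writelines consumes the generator lazily, one cleaned line at a time.
target.writelines(clean(line) for line in source)
print(target.getvalue(), end='')
```

With real files, `source` and `target` would be the handles returned by `open(...)`.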

You can actually use the csv module together with a regular expression to do this:

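    import csv, re

    st = '''\
    "{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
    "{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
    '''.replace('\n    ', '\n')  # undo the indentation of this listing

    data = []
    reader = csv.reader(st, dialect='excel')
    for line in reader:
        data.extend(line)

    # Put quotes around every word, which includes the integers.
    s = re.sub(r'(\w+)', r'"\1"', ''.join(data))
    # Start a new line after each closing brace.
    s = re.sub(r'({[^}]+})', r'\1\n', s).strip()
    print(s)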

Prints (note that the integers come out quoted as well):

    {"first_name":"John","last_name":"Smith","age":"30"}
    {"first_name":"Tim","last_name":"Johnson","age":"34"}
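If the integers must stay unquoted, as the question asks, one possible variation is to let `csv.reader` undo the doubled quotes and then pass each field straight to `json.loads`, with no regex at all. This is a sketch under the assumption that every row holds exactly one JSON object in its first column; the `io.StringIO` object stands in for the exported file.

```python
import csv
import io
import json

# In-memory stand-in for the exported CSV file (an assumption for this demo).
exported = io.StringIO(
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)

# csv.reader un-escapes the doubled quotes; each row is a one-field list here.
people = [json.loads(row[0]) for row in csv.reader(exported) if row]

# Re-serialise compactly, one record per line.
for person in people:
    print(json.dumps(person, separators=(',', ':')))
```

Because the CSV parser handles the quoting rules, this should also cope with empty strings and with values that contain commas.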