How To Export Dataframe To_json In Append Mode - Python Pandas?
Solution 1:
No, you can't append to a json file without re-writing the whole file using pandas or the json module. You might be able to modify the file "manually" by opening it in a read/write mode, seeking to the correct position, and inserting your data. I wouldn't recommend this, though. Better to just use a file format other than json if your file is going to be larger than your RAM.
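For illustration only (the answer doesn't recommend it), a minimal sketch of what that manual edit could look like, assuming the file holds a non-empty JSON array and ends with the closing ']' and no trailing newline:

import json
import os

def insert_record_manually(path, record):
    # Overwrite the closing ']' with ', <record>]'.
    # Assumes a non-empty JSON array that ends exactly with ']'.
    with open(path, "rb+") as fh:
        fh.seek(-1, os.SEEK_END)   # position of the final ']'
        fh.truncate()              # drop it
        fh.write(b", " + json.dumps(record).encode("utf-8") + b"]")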
This answer might also help. It doesn't create valid json files (instead, each line is a json string), but its goal is very similar to yours.
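That line-per-record (JSON Lines) idea is the one that appends cleanly. A rough sketch, assuming an illustrative file name records.jsonl:

import json
import pandas as pd

path = "records.jsonl"   # illustrative file name
new_records = [{"a": 1, "b": 3}, {"a": 2, "b": 4}]

# each line is an independent JSON object, so plain append mode works
with open(path, "a") as fh:
    for record in new_records:
        fh.write(json.dumps(record) + "\n")

# pandas reads the whole thing back in one call
df_all = pd.read_json(path, orient="records", lines=True)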
Solution 2:
Maybe you need to think in terms of orient='records':
import pandas as pd

def to_json_append(df, file):
    '''
    Load the file with
    pd.read_json(file, orient='records', lines=True)
    '''
    df.to_json('tmp.json', orient='records', lines=True)
    # append
    f = open('tmp.json', 'r')
    k = f.read()
    f.close()
    f = open(file, 'a')
    f.write('\n')  # Prepare next data entry
    f.write(k)
    f.close()

# Convert the existing file to the lines format first
df = pd.read_json('output.json')
# Save again as lines
df.to_json('output.json', orient='records', lines=True)

# new data
df = pd.DataFrame({'a': pd.Series([1, 2], index=list('CD')),
                   'b': pd.Series([3, 4], index=list('CD'))})

# append:
to_json_append(df, 'output.json')
To load the full data back:

pd.read_json('output.json', orient='records', lines=True)
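The temporary file isn't strictly required: when to_json is called without a path it returns the JSON text, which can be appended directly. A minimal variant of the same idea (untested sketch):

def to_json_append(df, file):
    # to_json returns a string when no path or buffer is given
    chunk = df.to_json(orient='records', lines=True)
    with open(file, 'a') as fh:
        fh.write('\n')  # prepare the next data entry, as above
        fh.write(chunk)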
Solution 3:
I've solved it just by using built-in pandas.DataFrame methods. Keep performance in mind in the case of huge dataframes (there are ways to deal with it). Code:
import os
import pandas as pd

if os.path.isfile(dir_to_json_file):
    # if the file exists, open and read it
    df_read = pd.read_json(dir_to_json_file, orient='index')
    # add the data that you want to save
    df_read = pd.concat([df_read, df_to_append], ignore_index=True)
    # in case you add too much unnecessary data (if you need it)
    df_read.drop_duplicates(inplace=True)
    # save it back to the json file
    df_read.to_json(dir_to_json_file, orient='index')
else:
    df_to_append.to_json(dir_to_json_file, orient='index')
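The snippet expects df_to_append (the new rows) and dir_to_json_file (the target path) to be defined already; a minimal setup-and-check sketch, with illustrative names and values:

import pandas as pd

dir_to_json_file = 'data.json'                            # illustrative path
df_to_append = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})   # new rows to add

# ... run the if/else block above ...

# verify the result
print(pd.read_json(dir_to_json_file, orient='index'))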
Solution 4:
Use case: writing a large amount of data to a JSON file with a small memory footprint.
Let's say we have 1,000 dataframes, and each dataframe is about 1,000,000 lines of json. If each dataframe needs 100MB, the total file size would be 1,000 * 100MB = 100GB.
Answer:
- use a buffer to store the content of each dataframe
- use pandas to dump it to text
- use append mode to write the text to the end of the file
import io
import pandas as pd
from pathlib_mate import Path

n_lines_per_df = 10
n_df = 3
columns = ["id", "value"]
value = "alice@example.com"

# write the output next to this script
f = Path(__file__).change(new_basename="big-json-file.json")

if not f.exists():
    for nth_df in range(n_df):
        data = list()
        for nth_line in range(nth_df * n_lines_per_df, (nth_df + 1) * n_lines_per_df):
            data.append((nth_line, value))
        df = pd.DataFrame(data, columns=columns)
        # dump each dataframe to an in-memory buffer first
        buffer = io.StringIO()
        df.to_json(
            buffer,
            orient="records",
            lines=True,
        )
        # then append the text to the end of the file
        with open(f.abspath, "a") as file:
            file.write(buffer.getvalue())
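To read the result back without holding all of it in memory at once, pandas can iterate over line-delimited JSON in chunks (a sketch; the chunksize value is illustrative):

reader = pd.read_json(f.abspath, orient="records", lines=True, chunksize=1000)
for chunk_df in reader:
    # process each chunk here while memory stays small
    print(chunk_df.shape)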