Google Meet Media API - Technical Analysis for Recording Application¶
CRITICAL FINDING: Wrong Repository¶
The /Users/maro/Projects/codeseed/Google-Meet-Bot repository is NOT the official Google Meet Media API. It's a Python Selenium-based screen automation bot that uses sounddevice to capture system audio - completely different approach.
The official Google Meet Media API is:

- GitHub: https://github.com/googleworkspace/meet-media-api-samples
- Language: TypeScript (web) and C++ implementations
- Status: Developer Preview (requires enrollment in the Google Workspace Developer Preview Program)
1. TypeScript Reference Client Structure¶
Repository Organization (web/samples directory):¶
web/
├── samples/ # TypeScript reference implementation
│ ├── src/ # Source TypeScript files
│ ├── app.yaml # Google App Engine deployment config
│ ├── webpack # Build tool for bundling
│ └── package.json # Node.js dependencies
├── external/ # WebRTC dependencies
└── cpp/ # C++ reference client (alternative)
Key TypeScript Modules/Classes:¶
MeetMediaApiClient (Primary Interface)¶
interface MeetMediaApiClient {
// Connection management
joinMeeting(config: MeetMediaClientRequiredConfiguration): Promise<void>
leaveMeeting(): Promise<void>
// Layout/video assignment
createMediaLayout(request: MediaLayoutRequest): MediaLayout
applyLayout(layout: MediaLayout): Promise<void>
// Observable properties (reactive data streams)
mediaEntries: Subscribable<MediaEntry[]>
meetStreamTracks: Subscribable<MeetStreamTrack[]>
participants: Subscribable<Participant[]>
presenter: Subscribable<MediaEntry | undefined>
screenshare: Subscribable<MediaEntry | undefined>
sessionStatus: Subscribable<MeetSessionStatus>
}
MeetMediaClientRequiredConfiguration¶
interface MeetMediaClientRequiredConfiguration {
accessToken: string // OAuth token
meetingSpaceId: string // Meeting ID from Meet REST API
enableAudioStreams: boolean // Enable audio capture
numberOfVideoStreams: number // 1-3 video streams
logsCallback: (logEvent: LogEvent) => void // Logging handler
}
MediaEntry (Participant Stream Data)¶
interface MediaEntry {
participant: BaseParticipant // Participant metadata
session: string // Session ID
sessionName: string
// Audio
audioMeetStreamTrack: MeetStreamTrack | undefined
audioMuted: boolean
// Video
videoMeetStreamTrack: MeetStreamTrack | undefined
videoMuted: boolean
// State
isPresenter: boolean
screenShare: boolean
mediaLayout: MediaLayout
}
MeetStreamTrack (Actual Media Stream)¶
interface MeetStreamTrack {
mediaEntry: MediaEntry
mediaStreamTrack: MediaStreamTrack // Standard WebRTC MediaStreamTrack
}
2. Audio/Video Stream Capture & Processing¶
WebRTC Architecture:¶
The API uses WebRTC with Virtual Streams managed by a Selective Forwarding Unit (SFU):
Audio Streams:¶
- Exactly 3 audio virtual streams (fixed requirement)
- Each stream has a static SSRC (Synchronization Source)
- CSRC headers identify the true speaker when streams switch
- Opus codec required (48kHz, with inband FEC)
- Dynamic speaker switching: SFU sends 3 loudest speakers
// Configure 3 audio transceivers in offer
pc.addTransceiver('audio', {'direction':'recvonly'});
pc.addTransceiver('audio', {'direction':'recvonly'});
pc.addTransceiver('audio', {'direction':'recvonly'});
Video Streams:¶
- 1-3 video virtual streams (configurable)
- VP8, VP9, and AV1 codecs required
- Dynamic participant switching based on relevance
- Can request specific participants via Video Assignment API
// Configure video transceivers
pc.addTransceiver('video', {'direction':'recvonly'});
pc.addTransceiver('video', {'direction':'recvonly'});
pc.addTransceiver('video', {'direction':'recvonly'});
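Requesting specific participants goes through the video-assignment data channel. A minimal sketch of building such a request, assuming an illustrative message shape — the types and field names (VideoCanvas, setAssignment, requestId) here are hypothetical and must be checked against the official data channel reference:

```typescript
// Hypothetical types: illustrative only, not the official proto shape.
interface VideoCanvas {
  id: number;
  dimensions: { width: number; height: number };
}

// Build a request asking the SFU to fill up to `count` canvases,
// one per requested video stream (capped at 3 per the stream limits above).
function buildVideoAssignmentRequest(count: number, width: number, height: number) {
  const canvases: VideoCanvas[] = [];
  for (let i = 0; i < Math.min(count, 3); i++) {
    canvases.push({ id: i + 1, dimensions: { width, height } });
  }
  return { requestId: Date.now(), setAssignment: { canvases } };
}
```

The cap at 3 mirrors the virtual-stream limit: asking for more canvases than video streams cannot succeed.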
Stream Processing Flow:¶
- WebRTC Offer/Answer: Client creates an SDP offer → sends it to connectActiveConference() → receives the SDP answer
- ICE Connection: Establish a peer connection to Meet's servers
- Data Channels: Open ordered data channels (session-control, media-stats, participants, media-entries, video-assignment)
- Media Arrival: Subscribe to meetStreamTracks for MediaStreamTrack objects
- Frame Extraction: Inspect tracks via MediaStreamTrack.getSettings(), attach them to a <video> element, or process them with MediaRecorder
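The data-channel step can be sketched as a small helper. The channel labels come from this section and the ordered requirement from the API's data channel rules; the helper itself and its shape are this document's sketch, not API code:

```typescript
// Data channels the client opens on the peer connection.
// Labels are from the Meet Media API docs; all channels must be ordered.
interface DataChannelSpec {
  label: string;
  options: { ordered: boolean }; // subset of the standard RTCDataChannelInit
}

function requiredDataChannels(includeVideoAssignment: boolean): DataChannelSpec[] {
  const labels = ['session-control', 'media-stats', 'participants', 'media-entries'];
  if (includeVideoAssignment) labels.push('video-assignment');
  return labels.map(label => ({ label, options: { ordered: true } }));
}

// Usage in a browser environment:
// requiredDataChannels(true).forEach(spec =>
//   pc.createDataChannel(spec.label, spec.options));
```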
3. Participant Tracking & Speaker Identification¶
Participant Data Structure:¶
interface BaseParticipant {
name: string // Resource name
participantKey: string // Unique participant ID
// User identity (one of):
signedInUser?: {
user: string // Email/user ID
displayName: string
}
anonymousUser?: {
displayName: string
}
phoneUser?: {
displayName: string
}
}
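Because the identity is a union of three optional fields, resolving a display name takes a small helper. A sketch against the interface above; the "Unknown participant" fallback is this document's choice, not the API's:

```typescript
// Mirrors the BaseParticipant shape documented above.
interface BaseParticipant {
  name: string;
  participantKey: string;
  signedInUser?: { user: string; displayName: string };
  anonymousUser?: { displayName: string };
  phoneUser?: { displayName: string };
}

// Pick whichever identity variant is present.
function displayNameOf(p: BaseParticipant): string {
  return (
    p.signedInUser?.displayName ??
    p.anonymousUser?.displayName ??
    p.phoneUser?.displayName ??
    'Unknown participant'
  );
}
```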
CSRC-Based Speaker Identification:¶
- CSRC (Contributing Source): 32-bit identifier in RTP packet headers
- Unique per participant, constant during their session
- Maps to the participant via MediaEntry.participant
// MediaEntry contains CSRC mapping
interface MediaEntry {
participant: BaseParticipant // Links to participant metadata
audioCsrc?: number // Audio CSRC value
videoCsrcs?: number[] // Video CSRC values
audioMuted: boolean
videoMuted: boolean
}
Real-time Tracking:¶
// Subscribe to participant updates
client.participants.subscribe((participants: Participant[]) => {
participants.forEach(p => {
console.log('Participant:', p.participant.displayName);
p.mediaEntries.forEach(entry => {
console.log(' CSRC:', entry.audioCsrc, 'Muted:', entry.audioMuted);
});
});
});
// Subscribe to media entries (active streams)
client.mediaEntries.subscribe((entries: MediaEntry[]) => {
entries.forEach(entry => {
if (entry.audioMeetStreamTrack) {
// This participant is currently speaking (one of 3 loudest)
console.log('Active speaker:', entry.participant.displayName);
}
});
});
4. WebRTC Data Channel Architecture¶
Required Data Channels (all must be ordered):¶
session-control¶
- Purpose: Session lifecycle management
- Messages: Leave requests/responses, session status updates
media-stats¶
- Purpose: Upload WebRTC statistics for diagnostics
- Configuration: Server sends upload interval and allowlist
participants (Server-initiated)¶
- Purpose: Receive participant join/leave events
- Messages: ParticipantsChannelToClient with participant resources
media-entries (Server-initiated)¶
- Purpose: Receive media stream assignments
- Messages: MediaEntriesChannelToClient with media entry resources
video-assignment (Optional)¶
- Purpose: Request specific participants' video
- Messages: SetVideoAssignmentRequest with canvas assignments
Data Channel Protocol:¶
- Format: JSON-encoded protobuf-like structures
- Bidirectional: Client sends requests, server sends resources
- Resource Snapshots: Full state + incremental deltas
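The snapshot-plus-delta pattern can be sketched as a generic resource cache. The message shape used here (a snapshot array, upserts, and deleted IDs) is an illustrative assumption, not the official wire format:

```typescript
// Hypothetical wire shape, for illustration only.
interface ResourceUpdate<T extends { id: string }> {
  snapshot?: T[];        // full state: replaces the entire cache
  updated?: T[];         // delta: upserts by id
  deletedIds?: string[]; // delta: removals
}

// Keeps the client's view of a resource collection consistent
// across full snapshots and incremental deltas.
class ResourceCache<T extends { id: string }> {
  private byId = new Map<string, T>();

  apply(msg: ResourceUpdate<T>): void {
    if (msg.snapshot) {
      this.byId = new Map(msg.snapshot.map(r => [r.id, r]));
      return;
    }
    msg.updated?.forEach(r => this.byId.set(r.id, r));
    msg.deletedIds?.forEach(id => this.byId.delete(id));
  }

  all(): T[] { return [...this.byId.values()]; }
}
```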
5. Incremental Recording vs Full Download¶
Current API Limitations:¶
The Meet Media API does NOT provide built-in incremental S3 upload. You must implement this yourself using:
Approach 1: MediaRecorder API (Browser)¶
const mediaRecorder = new MediaRecorder(mediaStream, {
mimeType: 'video/webm;codecs=vp9,opus',
videoBitsPerSecond: 2500000
});
let chunkNumber = 0;
mediaRecorder.ondataavailable = async (event) => {
if (event.data.size > 0) {
// Upload chunk to MinIO incrementally
await uploadToMinIO(event.data, `recording-chunk-${chunkNumber++}.webm`);
}
};
// Request data every 5 seconds
mediaRecorder.start(5000);
Approach 2: Custom Frame Processing¶
// Process raw video frames
const videoTrack = mediaStreamTrack as MediaStreamVideoTrack;
const processor = new MediaStreamTrackProcessor({track: videoTrack});
const reader = processor.readable.getReader();
while (true) {
const {value: frame, done} = await reader.read();
if (done) break;
// Extract frame data, encode, upload to MinIO
const imageData = await extractFrameData(frame);
await uploadFrameToMinIO(imageData);
frame.close();
}
Approach 3: Server-Side Recording (Recommended)¶
Run a Node.js/C++ server that:

1. Joins the meeting using the Meet Media API
2. Receives the WebRTC streams
3. Uses FFmpeg to mux streams into containers
4. Streams chunks to MinIO S3 using multipart upload
// Node.js sketch with fluent-ffmpeg and AWS SDK v3
import ffmpeg from 'fluent-ffmpeg';
import { S3Client, UploadPartCommand } from '@aws-sdk/client-s3';
// fluent-ffmpeg accepts at most one stream input, so audio and video
// are assumed to arrive pre-muxed from the WebRTC capture pipeline.
// Fragmented MP4 is used because FFmpeg's segment muxer cannot be piped.
const output = ffmpeg()
  .input(recordingStream)
  .outputOptions(['-movflags frag_keyframe+empty_moov', '-f mp4'])
  .on('end', () => console.log('Recording complete'))
  .pipe(); // returns a readable stream of encoded output
// Stream parts to MinIO. Every part except the last must be at least
// 5 MiB, so raw 'data' chunks need buffering before upload in practice.
let partNumber = 1;
output.on('data', async (chunk) => {
  await s3Client.send(new UploadPartCommand({
    Bucket: 'recordings',
    Key: `meeting-${meetingId}.mp4`,
    UploadId: uploadId, // from a prior CreateMultipartUploadCommand
    PartNumber: partNumber++,
    Body: chunk
  }));
});
6. Authentication Flow¶
OAuth 2.0 Requirements:¶
Scopes:
- https://www.googleapis.com/auth/meetings.conference.media.readonly
- https://www.googleapis.com/auth/meetings.space.readonly
Flow:¶
- Get OAuth Token: Use the implicit grant flow (web) or a service account (server)
- Create/Get Meeting Space: Use the Meet REST API to get the meeting space ID
- Join Conference: Call connectActiveConference() with the OAuth token in the request
- Maintain Token: Refresh before expiry (typically 1 hour)
TypeScript Example:¶
// 1. OAuth (implicit grant for web)
const oauth2Endpoint = 'https://accounts.google.com/o/oauth2/v2/auth';
const params = {
client_id: YOUR_CLIENT_ID,
redirect_uri: YOUR_REDIRECT_URI,
response_type: 'token',
scope: 'https://www.googleapis.com/auth/meetings.conference.media.readonly ' +
'https://www.googleapis.com/auth/meetings.space.readonly'
};
// 2. Get the meeting space (spaces.get accepts spaces/{meetingCode} as an alias)
const meetingCode = 'abc-defg-hij';
const response = await fetch(
  `https://meet.googleapis.com/v2/spaces/${meetingCode}`,
  {headers: {Authorization: `Bearer ${accessToken}`}}
);
const {name} = await response.json(); // e.g. spaces/XYZ
// 3. Connect to conference
const client = new MeetMediaApiClient();
await client.joinMeeting({
accessToken: accessToken,
meetingSpaceId: name,
enableAudioStreams: true,
numberOfVideoStreams: 3,
logsCallback: (log) => console.log(log)
});
7. Important Gotchas & Limitations¶
🚨 Critical Limitations:¶
- Developer Preview Only
  - Requires enrollment in the Google Workspace Developer Preview Program
  - All participants must also be enrolled
  - Not production-ready
- Consumer Meeting Restrictions
  - For Gmail (@gmail.com) meetings, the meeting organizer must be present to consent
  - The bot's connection is rejected if the organizer leaves
- Security Restrictions
  - Cannot join encrypted meetings
  - Cannot join meetings with watermarks
  - Cannot join if underage accounts are present
- Virtual Stream Caps
  - Audio: exactly 3 streams (no more, no less)
  - Video: 1-3 streams maximum
  - No access to all participants simultaneously in large meetings
- CORS Requirements
  - Web client must be deployed (localhost won't work)
  - Must use HTTPS
  - Recommended: deploy to Google App Engine
- Codec Requirements
  - Must support: Opus (audio), VP8/VP9/AV1 (video)
  - H.264 supported but not required
- No Built-in Recording
  - API provides raw streams only
  - Must implement your own recording/muxing/upload logic
  - No native support for multi-speaker unmixing
- Participant Switching
  - Active speakers change dynamically
  - CSRC values must be tracked to identify who is speaking
  - No guarantee all participants appear in streams
- Data Channel Ordering
  - All data channels must be ordered (unordered is not supported)
- Meeting Space Lifetime
  - Meeting codes expire after 365 days of inactivity
  - Must use the meeting space resource name, not just the code
8. Recommended Architecture for Your App¶
Based on your requirements (TypeScript, MinIO S3, speaker tracking, Docker):
Architecture:¶
┌─────────────────────────────────────────────────────────┐
│ Frontend (TypeScript/React) │
│ - OAuth authentication │
│ - Meeting code input │
│ - Recording controls │
│ - Real-time participant display │
└─────────────────┬───────────────────────────────────────┘
│ REST API
┌─────────────────▼───────────────────────────────────────┐
│ Backend (Node.js/TypeScript + Express) │
│ - Manage OAuth tokens │
│ - Create recording sessions │
│ - Route to recording workers │
└─────────────────┬───────────────────────────────────────┘
│ Job Queue
┌─────────────────▼───────────────────────────────────────┐
│ Recording Worker (Node.js + WebRTC) │
│ - Join meeting using Meet Media API │
│ - Capture 3 audio + 3 video streams │
│ - Track participants via CSRC │
│ - Mux streams using FFmpeg │
│ - Upload chunks to MinIO (multipart) │
└─────────────────┬───────────────────────────────────────┘
│ S3 Protocol
┌─────────────────▼───────────────────────────────────────┐
│ MinIO S3 Storage │
│ - Bucket: recordings │
│ - Structure: /{meetingId}/{timestamp}-chunk-{n}.mp4 │
│ - Metadata: speaker CSRC mappings │
└─────────────────────────────────────────────────────────┘
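The object-key layout above can be centralized in one helper so the recording worker and any readers agree on paths. A sketch; the bucket structure is just the layout proposed in this diagram:

```typescript
// Build the object key: {meetingId}/{timestamp}-chunk-{n}.mp4
// Colons and dots are replaced so the timestamp is safe in object keys.
function recordingChunkKey(meetingId: string, startedAt: Date, chunkIndex: number): string {
  const ts = startedAt.toISOString().replace(/[:.]/g, '-');
  return `${meetingId}/${ts}-chunk-${chunkIndex}.mp4`;
}
```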
Key TypeScript Classes:¶
// recording-worker.ts
class RecordingWorker {
private client: MeetMediaApiClient;
private recorder: StreamRecorder;
private s3Uploader: S3ChunkUploader;
private participantTracker: ParticipantTracker;
async startRecording(meetingSpaceId: string, accessToken: string) {
await this.client.joinMeeting({accessToken, meetingSpaceId, ...});
// Subscribe to streams
this.client.meetStreamTracks.subscribe(tracks => {
tracks.forEach(track => this.recorder.addTrack(track.mediaStreamTrack));
});
// Track speakers
this.client.mediaEntries.subscribe(entries => {
this.participantTracker.updateSpeakers(entries);
});
// Start recording and upload
this.recorder.start((chunk, metadata) => {
this.s3Uploader.uploadChunk(chunk, metadata);
});
}
}
// participant-tracker.ts
class ParticipantTracker {
private csrcToParticipant = new Map<number, BaseParticipant>();
updateSpeakers(entries: MediaEntry[]) {
entries.forEach(entry => {
if (entry.audioCsrc) {
this.csrcToParticipant.set(entry.audioCsrc, entry.participant);
}
});
}
getSpeaker(csrc: number): BaseParticipant | undefined {
return this.csrcToParticipant.get(csrc);
}
}
// s3-chunk-uploader.ts
import { S3Client, CreateMultipartUploadCommand, UploadPartCommand } from '@aws-sdk/client-s3';
class S3ChunkUploader {
  private s3Client = new S3Client({ /* MinIO endpoint/credentials */ });
  private uploadId = ''; // from a CreateMultipartUploadCommand response
  private partNumber = 0;
  async uploadChunk(chunk: Blob, metadata: RecordingMetadata) {
    // Per-object metadata (e.g. speaker CSRC mappings) belongs on
    // CreateMultipartUploadCommand; UploadPartCommand does not accept it.
    await this.s3Client.send(new UploadPartCommand({
      Bucket: 'recordings',
      Key: `${metadata.meetingId}/${metadata.timestamp}.mp4`,
      UploadId: this.uploadId,
      PartNumber: ++this.partNumber, // part numbers start at 1
      Body: Buffer.from(await chunk.arrayBuffer())
    }));
  }
}
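Each UploadPart response carries an ETag that must be echoed back when completing the multipart upload. A small tracker for that bookkeeping (pure logic; the surrounding calls would use the standard CreateMultipartUploadCommand and CompleteMultipartUploadCommand from @aws-sdk/client-s3):

```typescript
// Track {PartNumber, ETag} pairs so the upload can be completed with
// CompleteMultipartUploadCommand({ ..., MultipartUpload: { Parts } }).
interface CompletedPart { PartNumber: number; ETag: string }

class MultipartTracker {
  private parts: CompletedPart[] = [];
  private nextPart = 1;

  // Hand out monotonically increasing part numbers (S3 starts at 1).
  nextPartNumber(): number { return this.nextPart++; }

  // Record the ETag returned by an UploadPart response.
  record(partNumber: number, etag: string): void {
    this.parts.push({ PartNumber: partNumber, ETag: etag });
  }

  // S3 requires parts listed in ascending PartNumber order on completion.
  completedParts(): CompletedPart[] {
    return [...this.parts].sort((a, b) => a.PartNumber - b.PartNumber);
  }
}
```

Parts may finish uploading out of order (uploads run concurrently), which is why the completion list is sorted rather than appended in arrival order.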
Summary & Recommendations¶
✅ What You Can Do:¶
- Real-time access to 3 audio + 1-3 video streams
- Track active speakers via CSRC headers
- Implement incremental S3 upload using multipart upload
- Build TypeScript full-stack with Node.js backend
- Deploy in Docker containers
❌ What You Cannot Do (with current API):¶
- Access all participants simultaneously (limited to 3 audio, 3 video)
- Use in production without Developer Preview enrollment
- Join consumer meetings without organizer present
- Record without implementing custom muxing/storage
🎯 Recommended Next Steps:¶
- Enroll in the Google Workspace Developer Preview Program
- Clone the official repo: git clone https://github.com/googleworkspace/meet-media-api-samples
- Study the web/samples TypeScript reference client
- Build a POC with a single stream → MinIO upload
- Implement speaker tracking with CSRC mapping
- Add FFmpeg-based muxing for multi-stream recording
- Integrate with an STT service (feed audio chunks directly)
📚 Essential Resources:¶
- API Concepts: https://developers.google.com/workspace/meet/media-api/guides/concepts
- TypeScript Quickstart: https://developers.google.com/workspace/meet/media-api/guides/ts
- GitHub Samples: https://github.com/googleworkspace/meet-media-api-samples
- WebRTC Primer: https://webrtcforthecurious.com/
- Meet REST API: https://developers.google.com/workspace/meet/api/guides/overview
- Virtual Streams: https://developers.google.com/workspace/meet/media-api/guides/virtual-streams
- Data Channels Reference: https://developers.google.com/workspace/meet/media-api/reference/dc/media_api
Date¶
Created: December 16, 2025
Project Context¶
This analysis is for building a Google Meet recording application with:

- TypeScript full-stack architecture
- Incremental stream recording to MinIO S3
- Speaker identification tracking
- Docker deployment
- Future integration with Speech-to-Text (STT)